Ibis – A New Kid in Town
Developed by Wes McKinney, pandas is an efficient and powerful data analysis tool for data scientists working in Python. Like R, pandas reads data into memory. As a result, we often run out of memory when analyzing large data sets with pandas.
Similar to Blaze, ibis is a new data analysis framework in Python built on top of other back-end data engines, such as SQLite and Impala. Even better, ibis offers closer compatibility with pandas and better performance than Blaze.
In a previous blog post (https://statcompute.wordpress.com/2015/03/27/a-comparison-between-blaze-and-pandas), I showed the efficiency of Blaze through a simple example. However, the demonstration below shows that, when applied to the same data with the SQLite engine, ibis is 50% more efficient than Blaze in terms of “real” (elapsed) time.
import ibis as ibis
tbl = ibis.sqlite.connect('//home/liuwensui/Documents/data/flights.db').table('tbl2008')
exp = tbl[tbl.DayOfWeek > 1].group_by("DayOfWeek").aggregate(avg_AirTime = tbl.AirTime.mean())
pd = exp.execute()
print(pd)
#i  DayOfWeek  avg_AirTime
#0          2   103.214930
#1          3   103.058508
#2          4   103.467138
#3          5   103.557539
#4          6   107.400631
#5          7   104.864885
#
#real 0m10.346s
#user 0m9.585s
#sys  0m1.181s
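To make it clearer what the ibis expression asks the back end to compute, here is a minimal sketch of a roughly equivalent SQL query, run with Python's built-in sqlite3 module. The toy rows and the in-memory database are made up for illustration and are not the 2008 flights data. The ORDER BY clause is my own addition to keep the output stable, and the SQL that ibis actually generates may well differ.

```python
import sqlite3

# Hypothetical toy rows (DayOfWeek, AirTime); NOT the real 2008 flights data.
rows = [(1, 90.0), (2, 100.0), (2, 110.0), (3, 95.0), (3, None), (7, 120.0)]

# An in-memory database stands in for flights.db (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl2008 (DayOfWeek INTEGER, AirTime REAL)")
conn.executemany("INSERT INTO tbl2008 VALUES (?, ?)", rows)

# Filter, group and average inside SQLite, so only the small
# aggregated result is returned to Python.
query = """
SELECT DayOfWeek, AVG(AirTime) AS avg_AirTime
FROM tbl2008
WHERE DayOfWeek > 1
GROUP BY DayOfWeek
ORDER BY DayOfWeek
"""
result = conn.execute(query).fetchall()
conn.close()

for day, avg_air_time in result:
    print(day, avg_air_time)
```

Because the filtering and aggregation happen inside the database engine, only one row per day of the week comes back to Python, which is why this style of analysis sidesteps the memory limit described above.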