I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

## Data Import Efficiency

HDF5 stands for hierarchical data format and is a popular data model designed to store and organize large amounts of data with interfaces in multiple computing languages such as R, PYTHON, and MATLAB.

Below is a demonstration showing the efficiency of data import from HDF5 and its comparison with CSV and SQLITE. As shown in the result, the time of data import from HDF5 is the shortest, only ~50% of import time from CSV and ~25% of import time from SQLITE.

```In [1]: import pandas as pd

In [2]: import pandas.io.sql as pd_sql

In [3]: import sqlite3 as sql

Out[5]:
0         0    0  27.250000        4        0         0         0        0  1200.000000
1         0    0  40.833332      111        3         0         0        1  4000.000000
2         1    0  37.666668       54        3         0         0        1  3666.666667

SELFEMPL  INCPER   EXP_INC     SPENDING   LOGSPEND
0         0   18000  0.000667
1         0   13500  0.000222
2         0   11300  0.033270  121.9896773  4.8039364

In [6]: con = sql.connect('data.db') #WRITE DF INTO SQLITE DB

In [7]: pd_sql.write_frame(DF, "tbl", con)

In [8]: con.commit()

In [9]: h5 = pd.HDFStore('data.h5', 'w') #WRITE DF INTO HDF5

In [10]: h5['tbl'] = DF

In [11]: %timeit -n100 -r10 pd.read_csv('credit_count.csv')
100 loops, best of 10: 64.3 ms per loop

In [12]: %timeit -n100 -r10 pd_sql.read_frame("select * from tbl", con)
100 loops, best of 10: 114 ms per loop

In [13]: %timeit -n100 -r10 output = h5['tbl']
100 loops, best of 10: 26.3 ms per loop
```