I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

## A Light Touch on RPy2

For a statistical analyst, the first step in a data analysis project is usually to import the data and then to screen its descriptive statistics. In Python, this is easy with the pandas package.

```
In [1]: import pandas as pd

In [2]: # (not shown) load the data set into a DataFrame named `data`

In [3]: pd.set_option("display.precision", 4)  # older pandas: pd.set_printoptions(precision = 5)

In [4]: print(data.describe().to_string())
         LEV_LT3   TAX_NDEB    COLLAT1      SIZE1      PROF2    GROWTH2        AGE        LIQ      IND2A      IND3A      IND4A      IND5A
count  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000
mean      0.0908     0.8245     0.3174    13.5109     0.1446    13.6196    20.3664     0.2028     0.6116     0.1902     0.0269     0.0991
std       0.1939     2.8841     0.2272     1.6925     0.1109    36.5177    14.5390     0.2333     0.4874     0.3925     0.1619     0.2988
min       0.0000     0.0000     0.0000     7.7381     0.0000   -81.2476     6.0000     0.0000     0.0000     0.0000     0.0000     0.0000
25%       0.0000     0.3494     0.1241    12.3170     0.0721    -3.5632    11.0000     0.0348     0.0000     0.0000     0.0000     0.0000
50%       0.0000     0.5666     0.2876    13.5396     0.1203     6.1643    17.0000     0.1085     1.0000     0.0000     0.0000     0.0000
75%       0.0117     0.7891     0.4724    14.7511     0.1875    21.9516    25.0000     0.2914     1.0000     0.0000     0.0000     0.0000
max       0.9984   102.1495     0.9953    18.5866     1.5902   681.3542   210.0000     1.0002     1.0000     1.0000     1.0000     1.0000
```
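
Because the session skips the step that loads `data`, here is a self-contained sketch of the same call on a small DataFrame. The values are made up for illustration (only the column names echo the post). It uses `pd.set_option("display.precision", 4)`, which is how current pandas sets what `set_printoptions(precision = 5)` used to, and it shows that `describe()` returns an ordinary DataFrame whose statistics can be read by label as well as printed.

```python
import pandas as pd

# Hypothetical toy data standing in for the post's data set: the column names
# echo two of the original columns, but the values are made up.
data = pd.DataFrame({
    "LEV_LT3": [0.0, 0.0, 0.1, 0.3],
    "AGE": [6, 11, 17, 25],
})

# In current pandas, "display.precision" is the number of decimal places
# shown for floats; 4 matches the table above.
pd.set_option("display.precision", 4)

# describe() returns a regular DataFrame (rows: count, mean, std, min,
# 25%, 50%, 75%, max), so single statistics can be pulled out by label.
summary = data.describe()
print(summary.to_string())
print(summary.loc["mean", "AGE"])  # mean of the toy AGE column
```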

Tonight, I’d like to add some spice to my Python learning and do the same work in a different flavor with the rpy2 package, which lets me call R functions from Python.

```
In [5]: import rpy2.robjects as ro

In [6]: # (not shown) convert `data` into an R data frame named `rdata`

In [7]: print(ro.r.summary(rdata))
    LEV_LT3           TAX_NDEB           COLLAT1           SIZE1
Min.   :0.00000   Min.   :  0.0000   Min.   :0.0000   Min.   : 7.738
1st Qu.:0.00000   1st Qu.:  0.3494   1st Qu.:0.1241   1st Qu.:12.317
Median :0.00000   Median :  0.5666   Median :0.2876   Median :13.540
Mean   :0.09083   Mean   :  0.8245   Mean   :0.3174   Mean   :13.511
3rd Qu.:0.01169   3rd Qu.:  0.7891   3rd Qu.:0.4724   3rd Qu.:14.751
Max.   :0.99837   Max.   :102.1495   Max.   :0.9953   Max.   :18.587
      PROF2              GROWTH2             AGE              LIQ
Min.   :0.0000158   Min.   :-81.248   Min.   :  6.00   Min.   :0.00000
1st Qu.:0.0721233   1st Qu.: -3.563   1st Qu.: 11.00   1st Qu.:0.03483
Median :0.1203435   Median :  6.164   Median : 17.00   Median :0.10854
Mean   :0.1445929   Mean   : 13.620   Mean   : 20.37   Mean   :0.20281
3rd Qu.:0.1875148   3rd Qu.: 21.952   3rd Qu.: 25.00   3rd Qu.:0.29137
Max.   :1.5902009   Max.   :681.354   Max.   :210.00   Max.   :1.00018
     IND2A            IND3A            IND4A             IND5A
Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.00000
1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000
Median :1.0000   Median :0.0000   Median :0.00000   Median :0.00000
Mean   :0.6116   Mean   :0.1902   Mean   :0.02692   Mean   :0.09907
3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000
Max.   :1.0000   Max.   :1.0000   Max.   :1.00000   Max.   :1.00000
```
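
The quartiles in the pandas and R tables agree (the 1st quartile of TAX_NDEB is 0.3494 in both). That is expected: R's `summary()` takes its quartiles from `quantile()` with the default `type = 7`, which is the same linear interpolation that pandas' `quantile()` uses by default. The sketch below rebuilds R's six summary rows with pandas on made-up toy data and checks them against `describe()`. The helper name `r_style_summary` is hypothetical and not part of either library.

```python
import pandas as pd


def r_style_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rebuild the six rows of R's summary() with pandas.

    pandas' quantile() defaults to linear interpolation, which matches the
    default type = 7 of R's quantile(), so the quartiles agree.
    """
    numeric = df.select_dtypes(include="number")
    stats = {
        "Min.": numeric.min(),
        "1st Qu.": numeric.quantile(0.25),
        "Median": numeric.quantile(0.5),
        "Mean": numeric.mean(),
        "3rd Qu.": numeric.quantile(0.75),
        "Max.": numeric.max(),
    }
    # Each value is a Series indexed by column name; transpose so the
    # statistics become rows, as in R's printout.
    return pd.DataFrame(stats).T


# Made-up toy data; only the column names echo the post.
data = pd.DataFrame({
    "LEV_LT3": [0.0, 0.0, 0.1, 0.3],
    "AGE": [6, 11, 17, 25],
})

r_like = r_style_summary(data)
print(r_like.to_string())

# The quartile rows line up with describe()'s 25% / 50% / 75% rows.
described = data.describe()
for r_row, pd_row in [("1st Qu.", "25%"), ("Median", "50%"), ("3rd Qu.", "75%")]:
    assert (r_like.loc[r_row] - described.loc[pd_row]).abs().max() < 1e-12
```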

As shown above, the same analysis can be carried out by calling R functions from Python. This lets us extract and process the data efficiently in Python without giving up the graphical and statistical functionality of R.