Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

A Light Touch on RPy2

For a statistical analyst, the first step in a data analysis project is to import the data and then to review its descriptive statistics. In Python, we can easily do both with the pandas package.

In [1]: import pandas as pd

In [2]: data = pd.read_table("/home/liuwensui/Documents/data/csdata.txt", header = 0)

In [3]: pd.set_option("display.precision", 5)

In [4]: print(data.describe().to_string())
         LEV_LT3   TAX_NDEB    COLLAT1      SIZE1      PROF2    GROWTH2        AGE        LIQ      IND2A      IND3A      IND4A      IND5A
count  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000  4421.0000
mean      0.0908     0.8245     0.3174    13.5109     0.1446    13.6196    20.3664     0.2028     0.6116     0.1902     0.0269     0.0991
std       0.1939     2.8841     0.2272     1.6925     0.1109    36.5177    14.5390     0.2333     0.4874     0.3925     0.1619     0.2988
min       0.0000     0.0000     0.0000     7.7381     0.0000   -81.2476     6.0000     0.0000     0.0000     0.0000     0.0000     0.0000
25%       0.0000     0.3494     0.1241    12.3170     0.0721    -3.5632    11.0000     0.0348     0.0000     0.0000     0.0000     0.0000
50%       0.0000     0.5666     0.2876    13.5396     0.1203     6.1643    17.0000     0.1085     1.0000     0.0000     0.0000     0.0000
75%       0.0117     0.7891     0.4724    14.7511     0.1875    21.9516    25.0000     0.2914     1.0000     0.0000     0.0000     0.0000
max       0.9984   102.1495     0.9953    18.5866     1.5902   681.3542   210.0000     1.0002     1.0000     1.0000     1.0000     1.0000
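For readers who do not have csdata.txt at hand, here is a minimal sketch of the same import-and-describe step on an in-memory table. The column names are borrowed from the output above, but the values are made up for illustration, and I am assuming the file is tab-delimited because read_table splits on tabs by default. The helper name describe_table is hypothetical.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample: the column names come from the post,
# the values are invented purely for illustration.
SAMPLE = (
    "LEV_LT3\tSIZE1\tAGE\n"
    "0.00\t12.5\t11\n"
    "0.10\t13.5\t17\n"
    "0.25\t14.5\t25\n"
    "0.05\t13.0\t20\n"
)


def describe_table(source):
    """Read a tab-delimited table with a header row and summarize it."""
    data = pd.read_table(source, header=0)
    return data.describe()


summary = describe_table(io.StringIO(SAMPLE))

# Limit the displayed precision only for this print, not globally.
with pd.option_context("display.precision", 4):
    print(summary.to_string())
```

Passing a real file path instead of the StringIO object should work the same way, since read_table accepts either.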

Tonight, I’d like to add some spice to my Python learning experience and do the same work in a different flavor with the rpy2 package, which allows me to call R functions from Python.

 
In [5]: import rpy2.robjects as ro

In [6]: rdata = ro.packages.importr('utils').read_table("/home/liuwensui/Documents/data/csdata.txt", header = True)

In [7]: print(ro.r.summary(rdata))
    LEV_LT3           TAX_NDEB           COLLAT1           SIZE1       
 Min.   :0.00000   Min.   :  0.0000   Min.   :0.0000   Min.   : 7.738  
 1st Qu.:0.00000   1st Qu.:  0.3494   1st Qu.:0.1241   1st Qu.:12.317  
 Median :0.00000   Median :  0.5666   Median :0.2876   Median :13.540  
 Mean   :0.09083   Mean   :  0.8245   Mean   :0.3174   Mean   :13.511  
 3rd Qu.:0.01169   3rd Qu.:  0.7891   3rd Qu.:0.4724   3rd Qu.:14.751  
 Max.   :0.99837   Max.   :102.1495   Max.   :0.9953   Max.   :18.587  
     PROF2              GROWTH2             AGE              LIQ         
 Min.   :0.0000158   Min.   :-81.248   Min.   :  6.00   Min.   :0.00000  
 1st Qu.:0.0721233   1st Qu.: -3.563   1st Qu.: 11.00   1st Qu.:0.03483  
 Median :0.1203435   Median :  6.164   Median : 17.00   Median :0.10854  
 Mean   :0.1445929   Mean   : 13.620   Mean   : 20.37   Mean   :0.20281  
 3rd Qu.:0.1875148   3rd Qu.: 21.952   3rd Qu.: 25.00   3rd Qu.:0.29137  
 Max.   :1.5902009   Max.   :681.354   Max.   :210.00   Max.   :1.00018  
     IND2A            IND3A            IND4A             IND5A        
 Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.00000  
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000  
 Median :1.0000   Median :0.0000   Median :0.00000   Median :0.00000  
 Mean   :0.6116   Mean   :0.1902   Mean   :0.02692   Mean   :0.09907  
 3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000  
 Max.   :1.0000   Max.   :1.0000   Max.   :1.00000   Max.   :1.00000  

As shown above, a similar analysis can be conducted by calling R functions from Python. This lets us extract and process the data efficiently in Python without giving up the graphical and statistical functionality of R.
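The two summaries report the same quantities under different labels, so it can be handy to line them up when cross-checking. Below is a rough sketch that relabels the rows of pandas describe() in the order R's summary() prints them. The helper name r_style_summary and the demo values are my own and only illustrative. As far as I know, both tools default to linear interpolation for quartiles, which would explain why the numbers above agree, but please treat that as an assumption rather than a guarantee.

```python
import pandas as pd

# Map pandas describe() row labels to the labels printed by R's summary(),
# listed in the order R prints them.
R_LABELS = {
    "min": "Min.",
    "25%": "1st Qu.",
    "50%": "Median",
    "mean": "Mean",
    "75%": "3rd Qu.",
    "max": "Max.",
}


def r_style_summary(frame):
    """Return describe() rows reordered and relabeled like R's summary()."""
    stats = frame.describe()
    return stats.loc[list(R_LABELS)].rename(index=R_LABELS)


# Made-up values for illustration; only the column names come from the post.
demo = pd.DataFrame(
    {
        "AGE": [6, 11, 17, 25, 210],
        "LIQ": [0.0, 0.03, 0.11, 0.29, 1.0],
    }
)

result = r_style_summary(demo)
print(result.to_string())
```

The count and std rows are dropped on purpose, because R's summary() does not print them for numeric columns.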


Written by statcompute

November 24, 2012 at 12:31 am

Posted in PYTHON, S+/R, Statistics
