Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

Run R Code Within Python On The Fly

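Below is an example showing how to run R code from within Python with the rpy2 package, an extremely attractive feature for hardcore R programmers. Each call to ro.r() evaluates a string of R code in an embedded R session and returns the result to Python. Objects created along the way, such as data and sample below, stay in R's global environment between calls.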

In [1]: import rpy2.robjects as ro

In [2]: _null_ = ro.r('data <- read.table("/home/liuwensui/data/credit_count.txt", header = TRUE, sep = ",")')

In [3]: print(ro.r('str(data)'))
'data.frame':	13444 obs. of  14 variables:
 $ CARDHLDR: int  0 0 1 1 1 1 1 1 1 1 ...
 $ DEFAULT : int  0 0 0 0 0 0 0 0 0 0 ...
 $ AGE     : num  27.2 40.8 37.7 42.5 21.3 ...
 $ ACADMOS : int  4 111 54 60 8 78 25 6 20 162 ...
 $ ADEPCNT : int  0 3 3 3 0 1 1 0 3 7 ...
 $ MAJORDRG: int  0 0 0 0 0 0 0 0 0 0 ...
 $ MINORDRG: int  0 0 0 0 0 0 0 0 0 0 ...
 $ OWNRENT : int  0 1 1 1 0 0 1 0 0 1 ...
 $ INCOME  : num  1200 4000 3667 2000 2917 ...
 $ SELFEMPL: int  0 0 0 0 0 0 0 0 0 0 ...
 $ INCPER  : num  18000 13500 11300 17250 35000 ...
 $ EXP_INC : num  0.000667 0.000222 0.03327 0.048427 0.016523 ...
 $ SPENDING: num  NA NA 122 96.9 48.2 ...
 $ LOGSPEND: num  NA NA 4.8 4.57 3.88 ...
NULL

In [4]: _null_ = ro.r('sample <- data[data$CARDHLDR == 1,]')

In [5]: print(ro.r('summary(sample)'))
    CARDHLDR    DEFAULT             AGE           ACADMOS         ADEPCNT      
 Min.   :1   Min.   :0.00000   Min.   : 0.00   Min.   :  0.0   Min.   :0.0000  
 1st Qu.:1   1st Qu.:0.00000   1st Qu.:25.75   1st Qu.: 12.0   1st Qu.:0.0000  
 Median :1   Median :0.00000   Median :31.67   Median : 30.0   Median :0.0000  
 Mean   :1   Mean   :0.09487   Mean   :33.67   Mean   : 55.9   Mean   :0.9904  
 3rd Qu.:1   3rd Qu.:0.00000   3rd Qu.:39.75   3rd Qu.: 72.0   3rd Qu.:2.0000  
 Max.   :1   Max.   :1.00000   Max.   :88.67   Max.   :564.0   Max.   :9.0000  
    MAJORDRG         MINORDRG         OWNRENT           INCOME    
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :  50  
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:1750  
 Median :0.0000   Median :0.0000   Median :0.0000   Median :2292  
 Mean   :0.1433   Mean   :0.2207   Mean   :0.4791   Mean   :2606  
 3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:3042  
 Max.   :6.0000   Max.   :7.0000   Max.   :1.0000   Max.   :8333  
    SELFEMPL           INCPER          EXP_INC            SPENDING       
 Min.   :0.00000   Min.   :   700   Min.   :0.000096   Min.   :   0.111  
 1st Qu.:0.00000   1st Qu.: 12900   1st Qu.:0.025998   1st Qu.:  58.753  
 Median :0.00000   Median : 20000   Median :0.058957   Median : 139.992  
 Mean   :0.05362   Mean   : 22581   Mean   :0.090744   Mean   : 226.983  
 3rd Qu.:0.00000   3rd Qu.: 28337   3rd Qu.:0.116123   3rd Qu.: 284.440  
 Max.   :1.00000   Max.   :150000   Max.   :2.037728   Max.   :4810.309  
    LOGSPEND     
 Min.   :-2.197  
 1st Qu.: 4.073  
 Median : 4.942  
 Mean   : 4.729  
 3rd Qu.: 5.651  
 Max.   : 8.479  
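DEFAULT is coded 0/1, so its Mean of 0.09487 in the summary above is the default rate among cardholders. For an intercept-only logistic model the fitted probability is exactly that rate, which means this summary already determines the Null deviance that glm() reports further down. The following is a minimal sketch in plain Python 3. It assumes n = 10499 cardholders, inferred from the 10498 null degrees of freedom plus one, and 996 defaults, computed as 0.09487 × 10499 and rounded. Neither count is printed directly in the session.

```python
import math

def null_deviance(n_events: int, n_obs: int) -> float:
    """Deviance of an intercept-only logistic model on a 0/1 outcome.

    The fitted probability is the sample proportion p = k / n, and with a 0/1
    response the saturated log-likelihood is 0, so the deviance reduces to
    -2 * (k * log(p) + (n - k) * log(1 - p)).
    """
    p = n_events / n_obs
    return -2.0 * (n_events * math.log(p) + (n_obs - n_events) * math.log(1.0 - p))

# Assumed counts (not printed in the session):
n_obs = 10498 + 1                     # null degrees of freedom + 1
n_defaults = round(0.09487 * n_obs)   # mean of DEFAULT times n, i.e. 996

print(n_defaults, round(null_deviance(n_defaults, n_obs), 1))  # -> 996 6586.1
```

With these assumed counts the result agrees with the Null deviance of 6586.1 in the glm() output.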

In [6]: print(ro.r('summary(glm(DEFAULT ~ MAJORDRG + MINORDRG + OWNRENT + INCOME, data = sample, family = binomial))'))

Call:
glm(formula = DEFAULT ~ MAJORDRG + MINORDRG + OWNRENT + INCOME, 
    family = binomial, data = sample)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.9587  -0.5003  -0.4351  -0.3305   3.1928  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.204e+00  9.084e-02 -13.259  < 2e-16 ***
MAJORDRG     2.031e-01  6.926e-02   2.933  0.00336 ** 
MINORDRG     2.027e-01  4.798e-02   4.225 2.38e-05 ***
OWNRENT     -2.012e-01  7.163e-02  -2.809  0.00496 ** 
INCOME      -4.422e-04  4.044e-05 -10.937  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6586.1  on 10498  degrees of freedom
Residual deviance: 6376.2  on 10494  degrees of freedom
AIC: 6386.2

Number of Fisher Scoring iterations: 6
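The estimates are on the log-odds scale. A predicted default probability is therefore the inverse logit of the linear predictor, and each Pr(>|z|) is the two-sided standard normal tail of its z value. The rough sketch below reproduces both from the printed table. Because it uses the rounded estimates shown above, its results will differ slightly from predict(..., type = "response") in R. The applicant profile is hypothetical and chosen only for illustration.

```python
import math

# Estimates as printed by summary(glm(...)) above, rounded to 4 significant digits.
coef = {
    "(Intercept)": -1.204e+00,
    "MAJORDRG": 2.031e-01,
    "MINORDRG": 2.027e-01,
    "OWNRENT": -2.012e-01,
    "INCOME": -4.422e-04,
}

def predict_default(profile: dict) -> float:
    """Inverse logit of the linear predictor for one applicant."""
    eta = coef["(Intercept)"] + sum(coef[name] * value for name, value in profile.items())
    return 1.0 / (1.0 + math.exp(-eta))

def two_sided_p(z: float) -> float:
    """Two-sided p-value of a Wald z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical applicant profile, for illustration only.
applicant = {"MAJORDRG": 0, "MINORDRG": 0, "OWNRENT": 1, "INCOME": 2000}
print(round(predict_default(applicant), 3))  # -> 0.092

# Compare with the Pr(>|z|) column: 0.00336, 2.38e-05 and 0.00496.
for z in (2.933, 4.225, -2.809):
    print(z, f"{two_sided_p(z):.3g}")
```

Holding the other predictors fixed, each additional 1000 units of INCOME lowers the log-odds of default by about 0.44 (1000 × 4.422e-04). That multiplies the odds by roughly exp(-0.442) ≈ 0.64.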

Written by statcompute

November 24, 2012 at 11:19 pm
