Yet Another Blog in Statistical Computing

"Did you always know?" "No, I didn't. But I believed."

Rmagic, A Handy Interface Bridging Python and R

Rmagic (http://ipython.org/ipython-doc/dev/config/extensions/rmagic.html) is an IPython extension that uses rpy2 as its back end and provides a convenient interface for accessing R from IPython. Compared with using rpy2 directly, the rmagic extension makes it easier to exchange objects between IPython and R and to run a single R function or a block of R code.

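Below is a simple example showing how to push a pandas DataFrame into R, convert it to an R data.frame, and then pull it back into a new pandas DataFrame. Note that the pushed object arrives in R as a matrix, so its column names are assigned explicitly before the conversion to a data.frame.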

In [1]: import pandas as pd

In [2]: # READ DATA INTO PANDAS DATAFRAME

In [3]: pydf1 = pd.read_table('../data/csdata.txt', header = 0)

In [4]: print(pydf1.describe())
           LEV_LT3     TAX_NDEB      COLLAT1        SIZE1        PROF2  \
count  4421.000000  4421.000000  4421.000000  4421.000000  4421.000000   
mean      0.090832     0.824537     0.317354    13.510870     0.144593   
std       0.193872     2.884129     0.227150     1.692520     0.110908   
min       0.000000     0.000000     0.000000     7.738052     0.000016   
25%       0.000000     0.349381     0.124094    12.316970     0.072123   
50%       0.000000     0.566577     0.287613    13.539574     0.120344   
75%       0.011689     0.789128     0.472355    14.751119     0.187515   
max       0.998372   102.149483     0.995346    18.586632     1.590201   

           GROWTH2          AGE          LIQ        IND2A        IND3A  \
count  4421.000000  4421.000000  4421.000000  4421.000000  4421.000000   
mean     13.619633    20.366433     0.202813     0.611626     0.190228   
std      36.517739    14.538997     0.233256     0.487435     0.392526   
min     -81.247627     6.000000     0.000000     0.000000     0.000000   
25%      -3.563235    11.000000     0.034834     0.000000     0.000000   
50%       6.164303    17.000000     0.108544     1.000000     0.000000   
75%      21.951632    25.000000     0.291366     1.000000     0.000000   
max     681.354187   210.000000     1.000182     1.000000     1.000000   

             IND4A        IND5A  
count  4421.000000  4421.000000  
mean      0.026917     0.099073  
std       0.161859     0.298793  
min       0.000000     0.000000  
25%       0.000000     0.000000  
50%       0.000000     0.000000  
75%       0.000000     0.000000  
max       1.000000     1.000000  

In [5]: # CONVERT PANDAS DATAFRAME TO R DATA.FRAME

In [6]: %load_ext rmagic

In [7]: col = pydf1.columns

In [8]: %R -i pydf1,col colnames(pydf1) <- unlist(col); print(is.matrix(pydf1))
[1] TRUE

In [9]: %R rdf <- data.frame(pydf1); print(is.data.frame(rdf))
[1] TRUE

In [10]: %R print(summary(rdf))
    LEV_LT3           TAX_NDEB           COLLAT1           SIZE1       
 Min.   :0.00000   Min.   :  0.0000   Min.   :0.0000   Min.   : 7.738  
 1st Qu.:0.00000   1st Qu.:  0.3494   1st Qu.:0.1241   1st Qu.:12.317  
 Median :0.00000   Median :  0.5666   Median :0.2876   Median :13.540  
 Mean   :0.09083   Mean   :  0.8245   Mean   :0.3174   Mean   :13.511  
 3rd Qu.:0.01169   3rd Qu.:  0.7891   3rd Qu.:0.4724   3rd Qu.:14.751  
 Max.   :0.99837   Max.   :102.1495   Max.   :0.9953   Max.   :18.587  
     PROF2              GROWTH2             AGE              LIQ         
 Min.   :0.0000158   Min.   :-81.248   Min.   :  6.00   Min.   :0.00000  
 1st Qu.:0.0721233   1st Qu.: -3.563   1st Qu.: 11.00   1st Qu.:0.03483  
 Median :0.1203435   Median :  6.164   Median : 17.00   Median :0.10854  
 Mean   :0.1445929   Mean   : 13.620   Mean   : 20.37   Mean   :0.20281  
 3rd Qu.:0.1875148   3rd Qu.: 21.952   3rd Qu.: 25.00   3rd Qu.:0.29137  
 Max.   :1.5902009   Max.   :681.354   Max.   :210.00   Max.   :1.00018  
     IND2A            IND3A            IND4A             IND5A        
 Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.00000  
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000  
 Median :1.0000   Median :0.0000   Median :0.00000   Median :0.00000  
 Mean   :0.6116   Mean   :0.1902   Mean   :0.02692   Mean   :0.09907  
 3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000  
 Max.   :1.0000   Max.   :1.0000   Max.   :1.00000   Max.   :1.00000  

In [11]: # CONVERT R DATA.FRAME BACK TO PANDAS DATAFRAME

In [12]: %R -d rdf

In [13]: pydf2 = pd.DataFrame(rdf)

In [14]: print(pydf2.describe())
           LEV_LT3     TAX_NDEB      COLLAT1        SIZE1        PROF2  \
count  4421.000000  4421.000000  4421.000000  4421.000000  4421.000000   
mean      0.090832     0.824537     0.317354    13.510870     0.144593   
std       0.193872     2.884129     0.227150     1.692520     0.110908   
min       0.000000     0.000000     0.000000     7.738052     0.000016   
25%       0.000000     0.349381     0.124094    12.316970     0.072123   
50%       0.000000     0.566577     0.287613    13.539574     0.120344   
75%       0.011689     0.789128     0.472355    14.751119     0.187515   
max       0.998372   102.149483     0.995346    18.586632     1.590201   

           GROWTH2          AGE          LIQ        IND2A        IND3A  \
count  4421.000000  4421.000000  4421.000000  4421.000000  4421.000000   
mean     13.619633    20.366433     0.202813     0.611626     0.190228   
std      36.517739    14.538997     0.233256     0.487435     0.392526   
min     -81.247627     6.000000     0.000000     0.000000     0.000000   
25%      -3.563235    11.000000     0.034834     0.000000     0.000000   
50%       6.164303    17.000000     0.108544     1.000000     0.000000   
75%      21.951632    25.000000     0.291366     1.000000     0.000000   
max     681.354187   210.000000     1.000182     1.000000     1.000000   

             IND4A        IND5A  
count  4421.000000  4421.000000  
mean      0.026917     0.099073  
std       0.161859     0.298793  
min       0.000000     0.000000  
25%       0.000000     0.000000  
50%       0.000000     0.000000  
75%       0.000000     0.000000  
max       1.000000     1.000000
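
The session above compares the two describe() tables by eye. The sketch below mimics the two conversions in plain Python, without R, to illustrate why the column names have to be reassigned after the push but survive the pull. It rests on two assumptions about the rmagic of that time: that -i passes the object through numpy's asarray, and that -d returns the data.frame as a structured array. The small frame is a made-up stand-in for pydf1 (only the column names come from the post), and same_frame is a hypothetical helper, not part of rmagic or pandas.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for pydf1: column names borrowed from the post,
# values invented for illustration.
pydf1 = pd.DataFrame({
    "LEV_LT3": [0.0, 0.25, 0.5],
    "SIZE1": [12.0, 13.5, 14.75],
    "AGE": [6.0, 17.0, 25.0],
})

# Push side (assumption: -i converts with numpy.asarray).
# The result is a plain 2-D array, so the column labels are gone,
# which would explain why R sees a matrix and needs colnames().
as_matrix = np.asarray(pydf1)
print(as_matrix.shape)        # number of rows and columns only
print(as_matrix.dtype.names)  # None: no field names on a plain array

# Pull side (assumption: -d hands back a structured array).
# A structured array carries one named field per column.
rdf = pydf1.to_records(index=False)
print(rdf.dtype.names)

# pandas picks the field names up as column names.
pydf2 = pd.DataFrame(rdf)


def same_frame(a, b):
    """Hypothetical helper: True if both frames have the same
    column names, the same shape, and numerically close values."""
    return (
        list(a.columns) == list(b.columns)
        and a.shape == b.shape
        and np.allclose(a.values, b.values)
    )


print(same_frame(pydf1, pydf2))
```

In the real session, calling same_frame(pydf1, pydf2) after In [13] would be one way to confirm the round trip programmatically instead of reading the two summary tables side by side.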

Written by statcompute

May 31, 2013 at 10:25 pm

Posted in PYTHON, S+/R
