Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

Data Import Efficiency – A Case in R

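Below is an R snippet comparing data import efficiency across CSV, SQLite, and HDF5. Similar to the Python case posted yesterday, HDF5 shows the lowest user CPU time over 10 reads (0.164 s, versus 0.492 s for SQLite and 1.148 s for CSV), although its system and elapsed times are the highest of the three in this run.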

> library(RSQLite)
Loading required package: DBI
> library(rhdf5) 
> df <- read.csv('credit_count.csv')
> do.call(cat, list(nrow(df), ncol(df), '\n'))
13444 14 
> 
> # WRITE DF INTO SQLITE
> if(file.exists('data.db')) file.remove('data.db')
[1] TRUE
> con <- dbConnect(SQLite(), dbname = "data.db")
> dbWriteTable(con, "tbl", df)
[1] TRUE
> 
> # WRITE DF INTO HDF5
> if(file.exists('data.h5')) file.remove('data.h5')
[1] TRUE
> h5createFile("data.h5")
[1] TRUE
> h5write(df, 'data.h5', 'tbl')
> 
> # CALCULATE CPU TIMES
> system.time(for(i in 1:10) read.csv('credit_count.csv'))
   user  system elapsed 
  1.148   0.056   1.576 
> system.time(for(i in 1:10) dbReadTable(con, 'tbl'))
   user  system elapsed 
  0.492   0.024   0.649 
> system.time(for(i in 1:10) h5read('data.h5','tbl'))
   user  system elapsed 
  0.164   1.184   1.946 
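
The three system.time() calls above print their results separately, so a small wrapper can make the side-by-side comparison easier to read. The following is only a sketch under stated assumptions: time_reads is a hypothetical helper, not part of the original post; the data is a synthetic stand-in for credit_count.csv; and RDS is used as a second format purely because it needs no extra packages.

```r
# Hypothetical helper: evaluate a zero-argument reader function `n` times and
# return the user, system, and elapsed seconds as a named numeric vector.
time_reads <- function(reader, n = 10) {
  t <- system.time(for (i in seq_len(n)) reader())
  c(user = t[["user.self"]], system = t[["sys.self"]], elapsed = t[["elapsed"]])
}

# Synthetic stand-in for credit_count.csv (assumption: 14 numeric columns,
# 1000 rows; the real file is not needed to show the pattern).
set.seed(1)
toy <- data.frame(matrix(rnorm(1000 * 14), ncol = 14))
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
write.csv(toy, csv_path, row.names = FALSE)
saveRDS(toy, rds_path)

# One reader per format; each entry is a function taking no arguments.
readers <- list(
  csv = function() read.csv(csv_path),
  rds = function() readRDS(rds_path)
)

# One row per format, one column per timing component.
timings <- t(sapply(readers, time_reads, n = 10))
print(timings)
```

In the setting of this post, the list entries would instead wrap the calls already shown, for example function() dbReadTable(con, 'tbl') and function() h5read('data.h5', 'tbl'). Keeping user, system, and elapsed in separate columns matters here, since the HDF5 run above is fastest on user time but not on elapsed time.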

Written by statcompute

December 23, 2012 at 5:22 pm

Posted in S+/R

