Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

Test Drive of PyPy

PyPy (www.pypy.org) is an alternative Python interpreter to CPython. Today, I took PyPy for a test drive. The Python code below generates 1,000,000 rows of data and then calculates the sum of the values within each combination of the two categories.

import random
import pprint

random.seed(2013)
# 1,000,000 rows of [category 1 (1-2), category 2 (1-3), value in [-1, 1]]
# xrange is Python 2; use range on Python 3
rows = [[random.randint(1, 2), random.randint(1, 3), random.uniform(-1, 1)] for i in xrange(1, 1000001)]

summ = []

# sum the values for every combination of the two categories
for i in set(x[0] for x in rows):
  for j in set(x[1] for x in rows):
    total = sum([x[2] for x in rows if x[0] == i and x[1] == j])
    summ.append([i, j, round(total, 2)])

pprint.pprint(summ)
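
The nested loops above rescan the full list of rows once per category combination. As a point of comparison, here is a minimal sketch of a single-pass version that accumulates the totals in a dictionary. The function name aggregate is my own and not part of the original script. One difference to keep in mind: it only reports combinations that actually occur in the data, whereas the nested loops emit every pairing of the observed category values.

```python
from collections import defaultdict


def aggregate(rows):
    """Sum the third column of each row, grouped by the first two columns.

    `aggregate` is a hypothetical helper name used only for this sketch.
    """
    totals = defaultdict(float)
    # one pass over the data instead of one scan per category combination
    for i, j, value in rows:
        totals[(i, j)] += value
    # sort by (i, j) so the output order is deterministic
    return [[i, j, round(total, 2)] for (i, j), total in sorted(totals.items())]


if __name__ == "__main__":
    sample = [[1, 2, 0.5], [1, 1, 0.25], [1, 2, 0.25], [2, 3, -1.0]]
    print(aggregate(sample))
```

This version runs unchanged under Python 2 and Python 3. Since it does less work in pure Python loops, the gap between the two interpreters may look different from the one measured with the original script.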

First, I run it with CPython and get the following outcome.

liuwensui@ubuntu64:~/Documents/code$ time python test_pypy.py 
[[1, 1, 258.53],
 [1, 2, -209.49],
 [1, 3, -225.78],
 [2, 1, 202.2],
 [2, 2, 58.63],
 [2, 3, 188.99]]

real    0m4.491s
user    0m4.448s
sys     0m0.040s

However, when I run it with PyPy, the run time is only about one third of that with the CPython interpreter (1.35s versus 4.49s of real time).

liuwensui@ubuntu64:~/Documents/code$ time pypy test_pypy.py 
[[1, 1, 258.53],
 [1, 2, -209.49],
 [1, 3, -225.78],
 [2, 1, 202.2],
 [2, 2, 58.63],
 [2, 3, 188.99]]

real    0m1.351s
user    0m1.228s
sys     0m0.120s
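
The time command above measures the whole process, including interpreter start-up. To see where the time goes, the two stages could also be timed inside the script. The following is only a rough sketch under my own assumptions: the helper names build_rows, summarize and timed are hypothetical, and the row count is reduced to 100,000 to keep a trial run short (the test above uses 1,000,000).

```python
import platform
import random
import time


def build_rows(n, seed=2013):
    # n rows of [category 1 (1-2), category 2 (1-3), value in [-1, 1]]
    random.seed(seed)
    return [[random.randint(1, 2), random.randint(1, 3), random.uniform(-1, 1)]
            for _ in range(n)]


def summarize(rows):
    # same nested-loop aggregation as in the post, with sorted categories
    summ = []
    for i in sorted(set(x[0] for x in rows)):
        for j in sorted(set(x[1] for x in rows)):
            total = sum(x[2] for x in rows if x[0] == i and x[1] == j)
            summ.append([i, j, round(total, 2)])
    return summ


def timed(func, *args):
    # return the function result together with the elapsed wall-clock seconds
    start = time.time()
    result = func(*args)
    return result, time.time() - start


if __name__ == "__main__":
    N = 100000  # assumption: smaller than the 1,000,000 rows used in the post
    rows, t_build = timed(build_rows, N)
    summ, t_sum = timed(summarize, rows)
    print("%s %s: build %.3fs, summarize %.3fs" % (
        platform.python_implementation(), platform.python_version(),
        t_build, t_sum))
```

Running the same file once with python and once with pypy prints the interpreter name next to each pair of timings, which makes the two runs easy to tell apart.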

Written by statcompute

September 15, 2013 at 4:57 pm

Posted in Big Data, PYTHON
