Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

Archive for March 2014

Estimating GLM with Julia

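using CSV, DataFrames, GLM

# Read the credit data (assumed comma-delimited)
df1 = CSV.read("credit_count.txt", DataFrame);

# Keep card holders only, with the response and the four predictors
df2 = df1[df1.CARDHLDR .== 1, [:DEFAULT, :MAJORDRG, :MINORDRG, :INCOME, :OWNRENT]];

# Logistic regression: binomial family with the logit link
mdl = glm(@formula(DEFAULT ~ MAJORDRG + MINORDRG + INCOME + OWNRENT), df2, Binomial(), LogitLink());

print(mdl);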
# Coefficients:
#                  Estimate  Std.Error  z value Pr(>|z|)
# (Intercept)      -1.20444  0.0908218 -13.2616  < eps()
# MAJORDRG         0.203135  0.0692537  2.93319   0.0034
# MINORDRG         0.202727  0.0479741  4.22575   2.4e-5
# INCOME       -0.000442229 4.04222e-5 -10.9402  < eps()
# OWNRENT         -0.201223  0.0716217 -2.80953    0.005

print(deviance(mdl));
# 6376.220859525586
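A binomial GLM with the logit link is the classic setting for iteratively reweighted least squares (IRLS). The sketch below is a from-scratch IRLS loop using only Julia's standard libraries, not GLM.jl's own code. It returns the estimates, the standard errors, the z values (estimate divided by standard error, the same ratio reported in the table above) and the binomial deviance. The function name `irls_logit` and the simulated data are hypothetical; they are not the credit data used above.

```julia
using LinearAlgebra, Random

# Hypothetical helper: logistic regression by IRLS.
# X is an n×p design matrix whose first column is all ones (intercept);
# y holds 0/1 responses.
function irls_logit(X::AbstractMatrix{<:Real}, y::AbstractVector{<:Real};
                    tol = 1e-10, maxiter = 50)
    β = zeros(size(X, 2))
    for _ in 1:maxiter
        η = X * β                         # linear predictor
        μ = 1 ./ (1 .+ exp.(-η))          # inverse logit link
        w = μ .* (1 .- μ)                 # Bernoulli variance = IRLS weight
        z = η .+ (y .- μ) ./ w            # working response
        # Weighted least squares step: solve (X'WX) β = X'Wz
        βnew = (X' * (w .* X)) \ (X' * (w .* z))
        done = maximum(abs.(βnew .- β)) < tol
        β = βnew
        done && break
    end
    μ = 1 ./ (1 .+ exp.(-(X * β)))
    w = μ .* (1 .- μ)
    se = sqrt.(diag(inv(X' * (w .* X))))  # SEs from the inverse Fisher information
    # For 0/1 data the saturated log-likelihood is 0, so deviance = -2 log L
    dev = -2 * sum(y .* log.(μ) .+ (1 .- y) .* log.(1 .- μ))
    return β, se, β ./ se, dev
end

# Simulated (hypothetical) data with known coefficients
Random.seed!(2014)
n = 2000
X = hcat(ones(n), randn(n), randn(n))
βtrue = [-1.0, 0.5, -0.8]
prob = 1 ./ (1 .+ exp.(-(X * βtrue)))
y = Float64.(rand(n) .< prob)

β, se, zval, dev = irls_logit(X, y)
println("estimates: ", β)
println("z values:  ", zval)
println("deviance:  ", dev)
```

Each step is a weighted least squares fit on the working response, so the same weights that drive the iterations also give the standard errors at convergence.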

Written by statcompute

March 8, 2014 at 7:13 pm

Learning Pig Latin on 2014-03-01

grunt> sh head -4 2008.csv;
Year,Month,DayofMonth,DayOfWeek,DepTime,CRSDepTime,ArrTime,CRSArrTime,UniqueCarrier,FlightNum,TailNum,ActualElapsedTime,CRSElapsedTime,AirTime,ArrDelay,DepDelay,Origin,Dest,Distance,TaxiIn,TaxiOut,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay
2008,1,3,4,2003,1955,2211,2225,WN,335,N712SW,128,150,116,-14,8,IAD,TPA,810,4,8,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,754,735,1002,1000,WN,3231,N772SW,128,145,113,2,19,IAD,TPA,810,5,10,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,628,620,804,750,WN,448,N428WN,96,90,76,14,8,IND,BWI,515,3,17,0,,0,NA,NA,NA,NA,NA

grunt> sh sed '1d' 2008.csv > 2008.csv2;

grunt> sh head -4 2008.csv2;            
2008,1,3,4,2003,1955,2211,2225,WN,335,N712SW,128,150,116,-14,8,IAD,TPA,810,4,8,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,754,735,1002,1000,WN,3231,N772SW,128,145,113,2,19,IAD,TPA,810,5,10,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,628,620,804,750,WN,448,N428WN,96,90,76,14,8,IND,BWI,515,3,17,0,,0,NA,NA,NA,NA,NA
2008,1,3,4,926,930,1054,1100,WN,1746,N612SW,88,90,78,-6,-4,IND,BWI,515,3,7,0,,0,NA,NA,NA,NA,NA
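The `sed '1d'` call only removes the header line so that `PigStorage(',')` does not load it as a data row. A small sketch of the same step in Julia follows; `drop_header` is a hypothetical helper name, and the three-line input file is a made-up stand-in for `2008.csv`.

```julia
# Hypothetical helper: copy every line except the first (the header),
# mirroring `sed '1d' infile > outfile`.
function drop_header(infile::AbstractString, outfile::AbstractString)
    open(outfile, "w") do io
        for line in Iterators.drop(eachline(infile), 1)
            println(io, line)
        end
    end
end

# Made-up stand-in for 2008.csv, written to a temporary directory
dir = mktempdir()
src = joinpath(dir, "2008.csv")
dst = joinpath(dir, "2008.csv2")
write(src, "Year,Month,DayofMonth\n2008,1,3\n2008,1,4\n")

drop_header(src, dst)
print(read(dst, String))   # the data rows, without the header
```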

grunt> A = LOAD '2008.csv2' USING PigStorage(',');                

grunt> B = GROUP A BY ($0, $1); 

grunt> C = FOREACH B GENERATE group, COUNT(A);

grunt> D = FILTER C BY $0.$1 IN (1, 2, 3);

grunt> SPLIT D INTO D1 IF $0.$1 == 1, D2 IF $0.$1 == 2, D3 IF $0.$1 == 3;

grunt> dump D3;
((2008,3),616090)
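The Grunt session groups rows by (Year, Month), counts each group, keeps months 1 to 3, and splits the result into one relation per month. A rough in-memory analogue in Julia is sketched below; `count_by_year_month` is a hypothetical helper, the first four rows are the ones printed by `head` above, and the last two rows are made up only so that the filter and the split have something to act on. Pig's SPLIT tests each condition independently (a tuple may land in several relations or in none), which is why each output here is a separate `filter`.

```julia
# Hypothetical helper: count rows per (Year, Month) key,
# the analogue of GROUP A BY ($0, $1) followed by COUNT(A)
function count_by_year_month(lines)
    counts = Dict{Tuple{Int,Int},Int}()
    for line in lines
        f = split(line, ',')
        key = (parse(Int, f[1]), parse(Int, f[2]))
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end

rows = [
    # first four data rows of 2008.csv2, as printed by `head` above
    "2008,1,3,4,2003,1955,2211,2225,WN,335,N712SW,128,150,116,-14,8,IAD,TPA,810,4,8,0,,0,NA,NA,NA,NA,NA",
    "2008,1,3,4,754,735,1002,1000,WN,3231,N772SW,128,145,113,2,19,IAD,TPA,810,5,10,0,,0,NA,NA,NA,NA,NA",
    "2008,1,3,4,628,620,804,750,WN,448,N428WN,96,90,76,14,8,IND,BWI,515,3,17,0,,0,NA,NA,NA,NA,NA",
    "2008,1,3,4,926,930,1054,1100,WN,1746,N612SW,88,90,78,-6,-4,IND,BWI,515,3,7,0,,0,NA,NA,NA,NA,NA",
    # made-up rows, only to exercise the filter and the split
    "2008,3,1",
    "2008,4,1",
]

C = count_by_year_month(rows)
D = filter(kv -> kv.first[2] in (1, 2, 3), C)   # FILTER C BY $0.$1 IN (1, 2, 3)
D1 = filter(kv -> kv.first[2] == 1, D)          # SPLIT D INTO D1 IF $0.$1 == 1,
D2 = filter(kv -> kv.first[2] == 2, D)          #   D2 IF $0.$1 == 2,
D3 = filter(kv -> kv.first[2] == 3, D)          #   D3 IF $0.$1 == 3
println(D3)
```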

Written by statcompute

March 1, 2014 at 11:05 pm

Posted in Big Data, Pig Latin