GNU Octave (www.gnu.org/software/octave) is a matrix-based computing language most compatible with Matlab and is gaining popularity in Machine Learning after Dr Ng’s open course (www.coursera.org/course/ml). It is not a general-purpose programming language and is primarily designed for numerical computation. As a result, it is usually not convenient to do dirty works such as data munging within octave. In this case, python becomes a good compliment.
Oct2Py is a python interface to octave, which allows users to run octave commands and m-files dynamically within python and to exchange data between two computing systems seamlessly.
In the example below, I will show how to do simple data wrangling with oct2py package in python, convert python numpy array to octave binary *.mat file, load the data in *.mat file into octave, and then estimate a logistic regression through IRLS (iteratively reweighted least squares) method. As shown in octave session below, the vectorized implementation makes octave very efficient in prototyping machine learning algorithm, i.e. logistic regression in this case.
PYTHON SESSION
In [1]: import pandas as pd In [2]: import numpy as np In [3]: import oct2py as op In [4]: # READ DATA FROM CSV In [5]: df = pd.read_csv('credit_count.csv') In [6]: # SCRUB RAW DATA In [7]: df2 = df.ix[df.CARDHLDR == 1, ['AGE', 'ADEPCNT', 'MAJORDRG', 'MINORDRG', 'INCOME', 'OWNRENT', 'BAD']] In [8]: # CONVERT DATAFRAME TO NUMPY ARRAY In [9]: arr = np.array(df2, dtype = np.double) In [10]: # INVOKE AN OCTAVE SESSION In [11]: octave = op.Oct2Py() In [12]: # PUT NUMPY ARRAY INTO OCTAVE WORKSPACE In [13]: octave.put('m', arr) In [14]: print octave.run('whos') Variables in the current scope: Attr Name Size Bytes Class ==== ==== ==== ===== ===== ans 1x70 763 cell m 10499x7 587944 double Total is 73563 elements using 588707 bytes In [15]: # SAVE MAT DATA TO BE USED IN OCTAVE In [16]: octave.run('save data.mat m') Out[16]: ''
OCTAVE SESSION
octave:1> % LOAD DATA octave:1> load data.mat octave:2> whos Variables in the current scope: Attr Name Size Bytes Class ==== ==== ==== ===== ===== ans 1x70 763 cell m 10499x7 587944 double Total is 73563 elements using 588707 bytes octave:3> x = [ones(size(m, 1), 1) m(:, 1:6)]; octave:4> y = m(:, 7); octave:5> % SET INITIAL VALUES OF PARAMETERS octave:5> b = zeros(size(x, 2), 1); octave:6> % MAX ITERATIONS octave:6> maxiter = 100; octave:7> % SOLVE FOR BETA THROUGH IRLS octave:7> for i = 1:maxiter > xb = x * b; > p = 1 ./ (1 + exp(-xb)); > w = diag(p .* (1 - p)); > z = x * b + w \ (y - p); > b = (x' * w * x) \ (x' * w) * z; > new_ll = y' * log(p) + (1 - y)' * log(1 - p); > disp(sprintf('log likelihood for iteration %0.0f: %0.8f', i, new_ll)); > if i == 1 > old_ll = new_ll; > elseif new_ll > old_ll > old_ll = new_ll; > else > break > endif > end log likelihood for iteration 1: -7277.35224870 log likelihood for iteration 2: -3460.32494800 log likelihood for iteration 3: -3202.91405878 log likelihood for iteration 4: -3176.95671906 log likelihood for iteration 5: -3175.85136112 log likelihood for iteration 6: -3175.84836320 log likelihood for iteration 7: -3175.84836317 log likelihood for iteration 8: -3175.84836317 log likelihood for iteration 9: -3175.84836317 octave:8> % CALCULATE STANDARD ERRORS AND Z-VALUES octave:8> stder = sqrt(diag(inv(x' * w * x))); octave:9> z = b ./ stder; octave:10> result = [b, stder, z]; octave:11> format short g; octave:12> disp(result); -0.96483 0.13317 -7.2453 -0.0094622 0.0038629 -2.4495 0.13384 0.02875 4.6554 0.21028 0.069724 3.0159 0.20073 0.048048 4.1776 -0.0004632 4.1893e-05 -11.057 -0.22627 0.077392 -2.9237
You must be logged in to post a comment.