Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

Decision Tree with Python

Below is a Python code snippet in which I built a decision tree classifier with the sklearn package (http://scikit-learn.org). While the classifier is easy to specify, it took me a while to figure out how to tweak DOT language scripts (http://en.wikipedia.org/wiki/DOT_language) and visualize the tree diagram in a presentable way. Anyhow, here it is.

import pandas as pd
from sklearn import tree

# Load the credit data and keep only card holders (CARDHLDR == 1)
data = pd.read_csv('/home/liuwensui/Documents/data/credit_count.txt', sep=',')
cardholders = data[data.CARDHLDR == 1]
Y = cardholders.BAD
X = cardholders[['AGE', 'ADEPCNT', 'MAJORDRG', 'MINORDRG', 'INCOME', 'OWNRENT']]

# Require at least 500 observations in every leaf to keep the tree small
clf = tree.DecisionTreeClassifier(min_samples_leaf=500)
clf = clf.fit(X, Y)

# With out_file=None, export_graphviz returns the DOT script as a string
dot_data = tree.export_graphviz(clf, out_file=None)
# OUTPUT DOT LANGUAGE SCRIPTS
print(dot_data)
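On a current scikit-learn release, several presentation tweaks can be requested directly from export_graphviz instead of editing the DOT script by hand. The sketch below is only an illustration under stated assumptions: because the credit file above is not public, it swaps in the iris data bundled with scikit-learn, and it uses min_samples_leaf=10 (not 500) simply because iris has only 150 rows. The parameters feature_names, class_names, filled, rounded and rotate are real export_graphviz options.

```python
from sklearn import tree
from sklearn.datasets import load_iris

# Stand-in data (assumption): iris ships with scikit-learn, so no file path is needed
iris = load_iris()
clf = tree.DecisionTreeClassifier(min_samples_leaf=10, random_state=0)
clf = clf.fit(iris.data, iris.target)

# out_file=None makes export_graphviz return the DOT script as a string
dot_data = tree.export_graphviz(
    clf,
    out_file=None,
    feature_names=iris.feature_names,     # split labels show column names, not X[i]
    class_names=list(iris.target_names),  # each node shows its majority class
    filled=True,                          # color nodes by majority class
    rounded=True,                         # draw node boxes with rounded corners
    rotate=False,                         # set True to lay the tree out left to right
)
print(dot_data)
```

Saving the string to a file such as tree.dot and running Graphviz with `dot -Tpng tree.dot -o tree.png` renders the diagram; tree.plot_tree is a matplotlib-based alternative that avoids Graphviz altogether.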

[Figure: decision tree diagram]
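When the built-in options are not enough, the DOT script is plain text and can be edited before it is rendered. Here is a minimal sketch, assuming a digraph shaped like the one export_graphviz prints: the hypothetical helper add_graph_attributes (not part of scikit-learn) inserts graph-level attributes such as rankdir (orientation) or size (in inches) right after the opening brace, so they are set before any node is declared.

```python
def add_graph_attributes(dot_source, **attrs):
    """Return a copy of a DOT graph with graph-level attributes
    (e.g. rankdir, size) inserted right after its opening brace.

    Hypothetical helper for illustration; it is not part of scikit-learn.
    """
    head, brace, body = dot_source.partition("{")
    if not brace:
        raise ValueError("no opening '{' found; input does not look like a DOT graph")
    # Quote every value so entries such as size="8,6" stay valid DOT
    # (assumes the values themselves contain no double quotes).
    statements = "".join(f'\n{key}="{value}" ;' for key, value in attrs.items())
    return head + brace + statements + body


if __name__ == "__main__":
    # Toy DOT text in the shape that export_graphviz produces
    toy = 'digraph Tree {\n0 [label="X[0] <= 0.5"] ;\n}'
    print(add_graph_attributes(toy, rankdir="LR", size="8,6"))
```

For the toy input, the printed script gains the two lines rankdir="LR" ; and size="8,6" ; directly after digraph Tree {, and the rest of the graph is left unchanged.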


Written by statcompute

December 5, 2012 at 12:02 am
