Decision Tree with Python
Below is a Python code snippet in which I build a decision tree classifier with the scikit-learn package (http://scikit-learn.org). While the classifier is easy to specify, it took me a while to figure out how to tweak the DOT language script (http://en.wikipedia.org/wiki/DOT_language) so that the tree diagram looks presentable. Anyhow, here it is.
from io import StringIO

from pandas import read_table
from sklearn import tree

# Load the comma-separated credit data
data = read_table('/home/liuwensui/Documents/data/credit_count.txt', sep = ',')

# Keep cardholders only; BAD is the target and the six columns below are the predictors
cardholders = data[data.CARDHLDR == 1]
Y = cardholders.BAD
X = cardholders[['AGE', 'ADEPCNT', 'MAJORDRG', 'MINORDRG', 'INCOME', 'OWNRENT']]

# Require at least 500 observations in every leaf
clf = tree.DecisionTreeClassifier(min_samples_leaf = 500)
clf = clf.fit(X, Y)

# OUTPUT DOT LANGUAGE SCRIPTS
out = StringIO()
tree.export_graphviz(clf, out_file = out)
print(out.getvalue())
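Since most of the effort went into tweaking the DOT script, here is a minimal sketch of one way to do that kind of tweak in plain Python: inserting default graph and node attributes right after the opening brace of the script. The helper name `style_dot`, the attribute choices, and the `SAMPLE_DOT` text are all my own illustrative assumptions; the sample is hand-written in the general shape of a DOT tree script and is not real output from the classifier above.

```python
# Hypothetical helper (not part of scikit-learn): add default styling to a DOT script.

# Hand-written sample in the rough shape of a tree exported to DOT (illustrative only).
SAMPLE_DOT = r"""digraph Tree {
0 [label="INCOME <= 2000.0\nsamples = 1500"] ;
1 [label="samples = 700"] ;
0 -> 1 ;
2 [label="samples = 800"] ;
0 -> 2 ;
}
"""


def _format_attrs(attrs):
    """Render a dict as a DOT attribute list, e.g. shape="box", style="filled"."""
    return ", ".join('{}="{}"'.format(key, value) for key, value in attrs.items())


def style_dot(dot_text, graph_attrs=None, node_attrs=None):
    """Return dot_text with default graph/node attributes inserted after the first '{'."""
    if graph_attrs is None:
        graph_attrs = {"rankdir": "TB"}
    if node_attrs is None:
        node_attrs = {
            "shape": "box",
            "style": "rounded,filled",
            "fillcolor": "lightyellow",
            "fontname": "Helvetica",
        }
    head, brace, tail = dot_text.partition("{")
    if not brace:
        raise ValueError("no opening brace found; is this a DOT script?")
    header = "\n  graph [{}] ;\n  node [{}] ;".format(
        _format_attrs(graph_attrs), _format_attrs(node_attrs)
    )
    return head + brace + header + tail


styled = style_dot(SAMPLE_DOT)
print(styled)
```

In DOT, a `node [...]` statement sets defaults for the nodes declared after it, which is why the attributes go at the top of the graph. If the script already contains its own `node [...]` statement further down, that later statement takes precedence for the attributes it sets. Once the styled script is saved to a file, say tree.dot, Graphviz can render it with `dot -Tpng tree.dot -o tree.png`.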