Today, I spent more time on how to specify and visualize a decision tree classifier in the scikit-learn package and finally have a better understanding. With some tweaking, the sklearn.tree module works pretty well with the pandas package that I am actively learning. Below is a revised piece of code that is close to what we could use in real-world problems.
In [1]: # LOAD PACKAGES

In [2]: from sklearn import tree

In [3]: from pandas import read_table, DataFrame

In [4]: from os import system

In [5]: # IMPORT DATA

In [6]: data = read_table('/home/liuwensui/Documents/data/credit_count.txt', sep = ',')

In [7]: # DEFINE THE RESPONSE

In [8]: Y = data[data.CARDHLDR == 1].BAD

In [9]: # DEFINE PREDICTORS

In [10]: X = data.ix[data.CARDHLDR == 1, "AGE":"EXP_INC"]

In [11]: # SPECIFY TREE CLASSIFIER

In [12]: dtree = tree.DecisionTreeClassifier(criterion = "entropy", min_samples_leaf = 500, compute_importances = True)

In [13]: dtree = dtree.fit(X, Y)

In [14]: # PRINT OUT VARIABLE IMPORTANCE

In [15]: print DataFrame(dtree.feature_importances_, columns = ["Imp"], index = X.columns).sort(['Imp'], ascending = False)
               Imp
INCOME    0.509823
INCPER    0.174509
AGE       0.099996
EXP_INC   0.086134
ACADMOS   0.070118
MINORDRG  0.059420
ADEPCNT   0.000000
MAJORDRG  0.000000
OWNRENT   0.000000
SELFEMPL  0.000000

In [16]: # OUTPUT DOT LANGUAGE SCRIPT

In [17]: dotfile = open("/home/liuwensui/Documents/code/dtree2.dot", 'w')

In [18]: dotfile = tree.export_graphviz(dtree, out_file = dotfile, feature_names = X.columns)

In [19]: dotfile.close()

In [20]: # CALL SYSTEM TO DRAW THE GRAPH

In [21]: system("dot -Tpng /home/liuwensui/Documents/code/dtree2.dot -o /home/liuwensui/Documents/code/dtree2.png")
Out[21]: 0
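A note for readers on current library versions: several calls in the session above have since been removed (`DataFrame.ix`, `DataFrame.sort`, the `compute_importances` argument, and the Python 2 `print` statement), and `export_graphviz` can now return the dot source as a string instead of writing to an open file handle. Below is a minimal sketch of the same workflow under modern scikit-learn and pandas; since the credit_count.txt file is not reproduced here, it uses a small synthetic dataset with hypothetical column names in place of the original data.

```python
# Modern rewrite of the workflow above, on synthetic stand-in data.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Synthetic predictors and a binary response driven mostly by INCOME
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "AGE":     rng.integers(18, 70, 1000).astype(float),
    "INCOME":  rng.normal(3000, 800, 1000),
    "EXP_INC": rng.uniform(0, 1, 1000),
})
Y = (X["INCOME"] + rng.normal(0, 200, 1000) > 3000).astype(int)

# criterion and min_samples_leaf as in the original call;
# compute_importances is gone -- feature_importances_ is always available
dtree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=50)
dtree = dtree.fit(X, Y)

# Variable importance, with sort_values() replacing the removed sort()
imp = pd.DataFrame(dtree.feature_importances_,
                   columns=["Imp"], index=X.columns)
print(imp.sort_values("Imp", ascending=False))

# export_graphviz with out_file=None (the default) returns the dot
# source as a string, which can be written out or rendered directly
dot_source = export_graphviz(dtree, feature_names=X.columns)
```

The string returned by `export_graphviz` can still be saved to a .dot file and rendered with `dot -Tpng`, exactly as in the original session.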