## The Power of Decision Stumps

A decision stump is a weak classifier with the simplest possible tree structure: a single split, which makes it a one-level decision tree. Because of this simplicity, a stump often shows low predictive performance. As shown in the example below, the AUC of a stump on a separate testing dataset is even lower than the AUC obtained by using the single most predictive attribute directly as a score.

```r
pkgs <- c('pROC', 'RWeka')
lapply(pkgs, require, character.only = TRUE)
df1 <- read.csv("credit_count.txt")
df2 <- df1[df1$CARDHLDR == 1, ]

# Split the data 50/50 into training and testing sets
set.seed(2016)
n <- nrow(df2)
idx <- sample(seq(n), size = n / 2, replace = FALSE)
train <- df2[idx, ]
test <- df2[-idx, ]
x <- "AGE + ACADMOS + ADEPCNT + MAJORDRG + MINORDRG + OWNRENT + INCOME + SELFEMPL + INCPER + EXP_INC"
fml <- as.formula(paste("as.factor(DEFAULT) ~ ", x))

### IDENTIFY THE MOST PREDICTIVE ATTRIBUTE ###
imp <- InfoGainAttributeEval(fml, data = train)
imp_x <- test[, names(imp[imp == max(imp)])]
roc(as.factor(test$DEFAULT), imp_x)
# Area under the curve: 0.6243

### CONSTRUCT A WEAK CLASSIFIER OF DECISION STUMP ###
stump <- DecisionStump(fml, data = train)
print(stump)
roc(as.factor(test$DEFAULT), predict(stump, newdata = test, type = "probability")[, 2])
# Area under the curve: 0.5953
```
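To make the "one split" idea concrete, here is a minimal base-R sketch of a stump fitted on a single numeric predictor. It is not Weka's `DecisionStump` implementation: the helper names (`entropy`, `fit_stump`, `predict_stump`) and the toy data are hypothetical, and the split rule (lowest weighted entropy over midpoint thresholds) is an assumption chosen for illustration. Note that a stump like this one emits only two distinct scores, one per leaf, which is one plausible reason its AUC can fall below that of a continuous attribute.

```r
# Binary entropy of a 0/1 vector (0 when the leaf is pure)
entropy <- function(y) {
  p <- mean(y)
  if (p == 0 || p == 1) return(0)
  -(p * log2(p) + (1 - p) * log2(1 - p))
}

# Hypothetical helper: pick the threshold with the lowest weighted leaf entropy
fit_stump <- function(x, y) {
  vals <- sort(unique(x))
  # Candidate thresholds are midpoints between consecutive distinct values
  cuts <- (head(vals, -1) + tail(vals, -1)) / 2
  score <- sapply(cuts, function(cut) {
    left <- y[x <= cut]
    right <- y[x > cut]
    (length(left) * entropy(left) + length(right) * entropy(right)) / length(y)
  })
  best <- cuts[which.min(score)]
  # Each leaf predicts the observed event rate among its training cases
  list(cut = best, p_left = mean(y[x <= best]), p_right = mean(y[x > best]))
}

# Score new cases with the event rate of the leaf they fall into
predict_stump <- function(stump, x) {
  ifelse(x <= stump$cut, stump$p_left, stump$p_right)
}

# Toy data, made up for illustration only
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 1, 0, 1, 1)
s <- fit_stump(x, y)
p <- predict_stump(s, x)
print(s$cut)      # 4.5
print(unique(p))  # 0.00 0.75
```

On the toy data, the split lands at 4.5 and every case receives one of two scores, so the ROC curve of the stump has a single interior point.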

Although weak on its own, the decision stump can serve as the base model in many machine learning ensemble methods, such as bagging and boosting. For instance, a bagging classifier that combines 1,000 stumps outperforms the single stump by ~7% in relative AUC terms (0.6346 vs. 0.5953). AdaBoost with stumps improves on the single stump even further, by ~11% (0.6585 vs. 0.5953), and also beats the logistic regression benchmark by ~2% (0.6585 vs. 0.6473).

```r
### BUILD A BAGGING CLASSIFIER WITH 1,000 STUMPS IN PARALLEL ###
bagging <- Bagging(fml, data = train, control = Weka_control("num-slots" = 0, I = 1000, W = "DecisionStump", S = 2016, P = 50))
roc(as.factor(test$DEFAULT), predict(bagging, newdata = test, type = "probability")[, 2])
# Area under the curve: 0.6346

### BUILD A BOOSTING CLASSIFIER WITH STUMPS ###
boosting <- AdaBoostM1(fml, data = train, control = Weka_control(I = 100, W = "DecisionStump", S = 2016))
roc(as.factor(test$DEFAULT), predict(boosting, newdata = test, type = "probability")[, 2])
# Area under the curve: 0.6585

### DEVELOP A LOGIT MODEL FOR THE BENCHMARK ###
logit <- Logistic(fml, data = train)
roc(as.factor(test$DEFAULT), predict(logit, newdata = test, type = "probability")[, 2])
# Area under the curve: 0.6473
```
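The percentage gains quoted for the ensembles are relative differences in AUC. As a quick check, the short sketch below recomputes them from the four AUC values reported in the code comments; the helper `rel_gain` is a hypothetical name introduced here for illustration.

```r
# AUC values copied from the results reported above
auc <- c(stump = 0.5953, bagging = 0.6346, boosting = 0.6585, logit = 0.6473)

# Relative improvement of `new` over `base`, in percent
rel_gain <- function(new, base) 100 * (new / base - 1)

gain_vs_stump <- round(rel_gain(auc[c("bagging", "boosting")], auc[["stump"]]), 1)
gain_vs_logit <- round(rel_gain(auc[["boosting"]], auc[["logit"]]), 1)

print(gain_vs_stump)  # bagging 6.6, boosting 10.6
print(gain_vs_logit)  # 1.7
```

These round to the ~7%, ~11%, and ~2% figures cited in the text.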