The Power of Decision Stumps

A decision stump is the weak classification model with the simple tree structure consisting of one split, which can also be considered a one-level decision tree. Due to its simplicity, the stump often demonstrates a low predictive performance. As shown in the example below, the AUC measure of a stump is even lower than the one of a single attribute in a separate testing dataset.

pkgs <- c('pROC', 'RWeka')
lapply(pkgs, require, character.only = T)
df1 <- read.csv("credit_count.txt")
df2 <- df1[df1$CARDHLDR == 1, ]
set.seed(2016)
n <- nrow(df2)
sample <- sample(seq(n), size = n / 2, replace = FALSE)
train <- df2[sample, ]
test <- df2[-sample, ]
x <- paste("AGE + ACADMOS + ADEPCNT + MAJORDRG + MINORDRG + OWNRENT + INCOME + SELFEMPL + INCPER + EXP_INC")
fml <- as.formula(paste("as.factor(DEFAULT) ~ ", x))

### IDENTIFY THE MOST PREDICTIVE ATTRIBUTE ###
imp <- InfoGainAttributeEval(fml, data = train)
imp_x <- test[, names(imp[imp == max(imp)])]
roc(as.factor(test$DEFAULT), imp_x)
# Area under the curve: 0.6243

### CONSTRUCT A WEAK CLASSIFIER OF DECISION STUMP ###
stump <- DecisionStump(fml, data = train)
print(stump)
roc(as.factor(test$DEFAULT), predict(stump, newdata = test, type = "probability")[, 2])
# Area under the curve: 0.5953

Albeit weak by itself, the decision stump can be used as a base model in many machine learning ensemble methods, such as bagging and boosting. For instance, the bagging classifier with 1,000 stumps combined outperforms the single stump by ~7% in terms of AUC (0.6346 vs. 0.5953). Moreover, AdaBoost with stumps can further improve the predictive performance over the single stump by ~11% (0.6585 vs. 0.5953) and also over the logistic regression benchmark by ~2% (0.6585 vs. 0.6473).

### BUILD A BAGGING CLASSIFIER WITH 1,000 STUMPS IN PARALLEL ###
bagging <- Bagging(fml, data = train, control = Weka_control("num-slots" = 0, I = 1000, W = "DecisionStump", S = 2016, P = 50))
roc(as.factor(test$DEFAULT), predict(bagging, newdata = test, type = "probability")[, 2])
# Area under the curve: 0.6346

### BUILD A BOOSTING CLASSIFIER WITH STUMPS ###
boosting <- AdaBoostM1(fml, data = train, control = Weka_control(I = 100, W = "DecisionStump", S = 2016))
roc(as.factor(test$DEFAULT), predict(boosting, newdata = test, type = "probability")[, 2])
# Area under the curve: 0.6585
 
### DEVELOP A LOGIT MODEL FOR THE BENCHMARK ###
logit <- Logistic(fml, data = train)
roc(as.factor(test$DEFAULT), predict(logit, newdata = test, type = "probability")[, 2])
# Area under the curve: 0.6473