A decision stump is a very simple but effective rule-based supervised learning algorithm, similar to CART (Classification and Regression Trees). Unlike a full tree, however, a stump is a 1-level decision tree with a single split and 2 terminal nodes.
Simple as it is, the decision stump has been used successfully in several ways. For instance, as a weak classifier, it has proved to be an excellent base learner in ensemble learning algorithms such as bagging and boosting. Moreover, a single decision stump can also be used for feature screening in predictive modeling and for cut-point searching in continuous features. The following example shows a SAS implementation of a decision stump and illustrates its predictive power.
First of all, a test data set is simulated with 1 binary response variable Y and 3 continuous features X1 to X3. X1 is the feature most strongly related to Y, with a single cut-point at 5. X2 is also related to Y, but with 2 different cut-points at 1.5 and 7.5. X3 is pure noise.
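The simulation step itself is not shown here, so the DATA step below is only a minimal sketch of how data with this structure could be generated. The data set name one, the seed, the sample size, the uniform(0, 10) feature range, the unit weight column w, and the probabilities are all assumptions made for illustration, not the values behind the results reported later.

```sas
/* Hypothetical simulation: names, seed, sample size and probabilities are assumptions */
data one;
  call streaminit(2024);
  do i = 1 to 1000;
    x1 = rand("uniform") * 10;   /* single cut-point at 5 */
    x2 = rand("uniform") * 10;   /* two cut-points at 1.5 and 7.5 */
    x3 = rand("uniform") * 10;   /* pure noise: never enters p */
    /* P(Y = 1): larger shift from X1, smaller shift from the X2 band */
    p = 0.1 + 0.6 * (x1 > 5) + 0.2 * (1.5 < x2 <= 7.5);
    y = rand("bernoulli", p);
    w = 1;                       /* unit weight for each observation */
    output;
  end;
  drop i p;
run;
```

Giving X1 a larger shift in the event probability than X2 is what would make X1 the stronger feature in such a sketch, and X3 is unrelated to Y by construction.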
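The SAS macro below shows how to program a single decision stump. For each feature in xlist, it evaluates every distinct value as a candidate cut-point and keeps the one with the lowest Gini impurity. The macro is then used to search for the simulated cut-point in each continuous feature.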
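For reference, the split criterion computed in the macro's PROC SQL step can be written out as follows. The symbols are notation introduced here for illustration: c is a candidate cut-point, w_i and y_i are the weight and response of observation i, n_L and n_R are the row counts on each side of the cut, and n is the total row count. The two rates correspond to the columns p1_1 and p1_2, the two shares to ppn1 and ppn2.

```latex
\hat{p}_L(c) = \frac{\sum_{i:\, x_i \le c} w_i y_i}{\sum_{i:\, x_i \le c} w_i},
\qquad
\hat{p}_R(c) = \frac{\sum_{i:\, x_i > c} w_i y_i}{\sum_{i:\, x_i > c} w_i}

G(c) = 2\,\hat{p}_L(c)\bigl(1 - \hat{p}_L(c)\bigr)\frac{n_L}{n}
     + 2\,\hat{p}_R(c)\bigl(1 - \hat{p}_R(c)\bigr)\frac{n_R}{n}
```

A node containing only one class contributes 0, and a node with an event rate of 0.5 contributes half of its share, so G(c) lies between 0 and 0.5 and lower values indicate a cleaner split.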
%macro stump(data = , w = , y = , xlist = );
  /* declare the loop variables as local before assigning them */
  %local i x;
  %let i = 1;

  /* output table: best cut-point and Gini for each feature */
  proc sql;
    create table _out (
      variable char(32),
      gt_value num,
      gini     num
    );
  quit;

  %do %while (%scan(&xlist, &i) ne %str());
    %let x = %scan(&xlist, &i);

    /* keep only rows with a valid binary response */
    data _tmp1(keep = &w &y &x);
      set &data;
      where &y in (0, 1);
    run;

    proc sql;
      /* evaluate every distinct value of &x as a candidate cut-point */
      create table _tmp2 as
      select
        b.&x as gt_value,
        /* weighted event rate in each node */
        sum(case when a.&x <= b.&x then &w * &y else 0 end) /
          sum(case when a.&x <= b.&x then &w else 0 end) as p1_1,
        sum(case when a.&x > b.&x then &w * &y else 0 end) /
          sum(case when a.&x > b.&x then &w else 0 end) as p1_2,
        /* share of rows in each node (row counts, not weights) */
        sum(case when a.&x <= b.&x then 1 else 0 end) / count(*) as ppn1,
        sum(case when a.&x > b.&x then 1 else 0 end) / count(*) as ppn2,
        /* Gini impurity of the split */
        2 * calculated p1_1 * (1 - calculated p1_1) * calculated ppn1 +
        2 * calculated p1_2 * (1 - calculated p1_2) * calculated ppn2 as gini
      from
        _tmp1 as a, (select distinct &x from _tmp1) as b
      group by
        b.&x;

      /* keep the cut-point with the smallest Gini */
      insert into _out
      select "&x", gt_value, gini
      from _tmp2
      having gini = min(gini);

      drop table _tmp1;
    quit;

    %let i = %eval(&i + 1);
  %end;

  /* rank the features by the Gini of their best split */
  proc sort data = _out;
    by gini;
  run;

  proc report data = _out box spacing = 1 split = "*" nowd;
    column("DECISION STUMP SUMMARY" variable gt_value gini);
    define variable / "VARIABLE" width = 30 center;
    define gt_value / "CUTOFF VALUE*(GREATER THAN)" width = 15 center;
    define gini     / "GINI" width = 10 center format = 9.4;
  run;
%mend stump;
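A call to the macro might look like the sketch below. It assumes an input data set named one (a hypothetical name) that holds the response y and the features x1, x2 and x3. The data set one_w and the weight column w are also assumptions, added only because the macro expects a weight variable.

```sas
/* Hypothetical input: data set ONE holding y and x1-x3 */
data one_w;
  set one;
  w = 1;   /* unit weights reduce the weighted rates to plain event rates */
run;

/* search each feature for its best single cut-point */
%stump(data = one_w, w = w, y = y, xlist = x1 x2 x3);
```

With unit weights the macro behaves like an ordinary unweighted stump. Passing non-uniform weights, for example from a boosting round, changes the event rates in each node, while the node shares are still based on row counts as the macro is written.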
As shown in the table below, the decision stump did a fairly good job both in identifying predictive features and in locating cut-points. Both related features, X1 and X2, were identified and correctly ranked by their association with Y. For X1, the cut-point found is 4.97, extremely close to the simulated value of 5. For X2, the cut-point found is 7.46, close to 7.5, one of the 2 simulated cut-points. Because a stump makes only one split per feature, it can recover at most one of the two cut-points in X2.