
## Decision Stump with the Implementation in SAS

A decision stump is a very simple yet effective rule-based supervised learning algorithm similar to CART (Classification and Regression Trees). Unlike a full tree, however, a stump is a one-level decision tree: a single split on one feature that produces two terminal nodes.

Simple as it is, the decision stump has been used successfully in several settings. As a weak classifier, it has proven to be a strong base learner in ensemble learning algorithms such as bagging and boosting. A single decision stump can also be used for feature screening in predictive modeling and for cut-point searching in continuous features. The following example shows a SAS implementation of a decision stump and illustrates its predictive power.

First, a test data set is simulated with one binary response variable Y and three continuous features X1 to X3. X1 is the feature most strongly related to Y, with a single cut-point at 5. X2 is also related to Y, but with two different cut-points at 1.5 and 7.5. X3 is pure noise.
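The simulation code itself is not reproduced here, so the data step below is only a rough sketch of how such a data set could be generated. The data set name `one`, the seed, the sample size, the uniform(0, 10) feature distributions, the event probabilities, and the unit weight column `w` are all assumptions made for illustration; only the cut-points (5 for X1, 1.5 and 7.5 for X2) come from the description above.

```
data one;
  /* fixed seed so the sketch is reproducible (arbitrary value) */
  call streaminit(2024);
  do i = 1 to 1000;
    /* assumed: all three features are uniform on (0, 10) */
    x1 = rand("uniform") * 10;
    x2 = rand("uniform") * 10;
    x3 = rand("uniform") * 10;
    /* assumed event probabilities: one large step in X1 at 5, */
    /* two smaller steps in X2 at 1.5 and 7.5, no X3 effect    */
    p = 0.1 + 0.5 * (x1 > 5) + 0.1 * (x2 > 1.5) + 0.2 * (x2 > 7.5);
    y = rand("bernoulli", p);
    /* unit weight, since the stump macro expects a weight variable */
    w = 1;
    output;
  end;
  drop i p;
run;
```

With unit weights the stump treats every record equally. In a boosting setting the weight column would instead carry the case weights updated at each iteration.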

The SAS macro below implements a single decision stump. It is then used to search for the simulated cut-point in each continuous feature.

```
%macro stump(data = , w = , y = , xlist = );

  /* declare the loop variables as local before assigning them, */
  /* so that global variables with the same names are untouched */
  %local i x;
  %let i = 1;

  /* empty table that collects the best cut-point of each feature */
  proc sql;
    create table _out
    (
      variable   char(32),
      gt_value   num,
      gini       num
    );
  quit;

  /* loop over the space-delimited feature list */
  %do %while (%scan(&xlist, &i) ne %str());
    %let x = %scan(&xlist, &i);

    /* keep only the records with a valid binary response */
    data _tmp1(keep = &w &y &x);
      set &data;
      where &y in (0, 1);
    run;

    proc sql;
      /* treat every distinct value of &x as a candidate cut-point: */
      /* p1_1 / p1_2 are the weighted event rates at or below /     */
      /* above the cut, ppn1 / ppn2 the shares of records per node  */
      create table
        _tmp2 as
      select
        b.&x                                                          as gt_value,
        sum(case when a.&x <= b.&x then &w * &y else 0 end) /
        sum(case when a.&x <= b.&x then &w else 0 end)                as p1_1,
        sum(case when a.&x >  b.&x then &w * &y else 0 end) /
        sum(case when a.&x >  b.&x then &w else 0 end)                as p1_2,
        sum(case when a.&x <= b.&x then 1 else 0 end) / count(*)      as ppn1,
        sum(case when a.&x >  b.&x then 1 else 0 end) / count(*)      as ppn2,
        2 * calculated p1_1 * (1 - calculated p1_1) * calculated ppn1 +
        2 * calculated p1_2 * (1 - calculated p1_2) * calculated ppn2 as gini
      from
        _tmp1 as a,
        (select distinct &x from _tmp1) as b
      group by
        b.&x;

      /* keep the cut-point(s) with the smallest Gini for this feature */
      insert into _out
      select
        "&x",
        gt_value,
        gini
      from
        _tmp2
      having
        gini = min(gini);

      drop table _tmp1;
    quit;

    %let i = %eval(&i + 1);
  %end;

  /* rank the features by the impurity of their best split */
  proc sort data = _out;
    by gini;
  run;

  proc report data = _out box spacing = 1 split = "*" nowd;
    column("DECISION STUMP SUMMARY"
           variable gt_value gini);
    define variable / "VARIABLE"                     width = 30 center;
    define gt_value / "CUTOFF VALUE*(GREATER THAN)"  width = 15 center;
    define gini     / "GINI"                         width = 10 center format = 9.4;
  run;

%mend stump;
```
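For reference, the split criterion computed inside the macro can be written out as follows. The symbols are notation introduced here rather than names from the code: $c$ is a candidate cut-point, $p_L$ and $p_R$ correspond to `p1_1` and `p1_2`, and $\pi_L$ and $\pi_R$ correspond to `ppn1` and `ppn2`, the unweighted shares of records at or below and above $c$.

$$
G(c) = 2\,p_L(1 - p_L)\,\pi_L + 2\,p_R(1 - p_R)\,\pi_R,
\qquad
p_L = \frac{\sum_{i:\,x_i \le c} w_i\,y_i}{\sum_{i:\,x_i \le c} w_i},
\quad
p_R = \frac{\sum_{i:\,x_i > c} w_i\,y_i}{\sum_{i:\,x_i > c} w_i}
$$

For each feature, the macro keeps the value of $c$ that minimizes $G(c)$, and the final report sorts the features by that minimum. A call might look like the sketch below, assuming the simulated data sit in a data set named `one` with a unit weight column `w` (both names are hypothetical):

```
/* search X1, X2 and X3 for their best single cut-points */
%stump(data = one, w = w, y = y, xlist = x1 x2 x3);
```

Because every distinct value is compared against every record, the search grows roughly with the square of the number of rows per feature, which is fine for a small simulated sample but may be slow on large data.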

As shown in the table below, the decision stump did a fairly good job both of identifying the predictive features and of locating cut-points. Both related features, X1 and X2, were identified and correctly ranked by their association with Y. For X1, the cut-point found is 4.97, extremely close to the simulated value of 5. For X2, the cut-point found is 7.46, close to 7.5, one of the two simulated cut-points.