I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

## Bumping: A Stochastic Search for the Best Model

Breiman (1996) showed how bootstrap sampling can be used to improve prediction accuracy through bagging (bootstrap aggregating), which has been applied successfully to subset selection and decision trees. However, a major drawback of bagging is that it destroys the simple structure of the original model: an ensemble of bagged decision trees, for instance, can no longer be presented as a single tree. As a result, bagging improves prediction accuracy at the cost of interpretability.

Tibshirani and Knight (1997) proposed another use of bootstrap sampling, named bumping. In the bumping algorithm, a candidate model is estimated on each bootstrap sample, every candidate is evaluated on the original data, and the best one is kept. Bootstrap sampling therefore serves as a stochastic search through the model space for a single best model rather than as a way to average models, so the simple structure of the original model, such as that of a decision tree, is preserved.
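The idea can be illustrated with a minimal Python sketch. It is not a translation of the SAS macro below: it assumes a single numeric predictor, a one-split tree (the hypothetical helper `fit_stump`) as the model, and misclassification error on the original data as the selection criterion, whereas the macro grows full trees with PROC SPLIT and keeps the tree with the highest KS. As is customary for bumping, the fit to the original sample is included among the candidates, so the chosen model never has a higher training error than the original fit. Each bootstrap sample is drawn from its own seed, which is what makes every candidate reproducible.

```python
import random


def gini(labels):
    """Gini impurity 2p(1 - p) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def fit_stump(xs, ys):
    """Fit a one-split tree by minimizing the size-weighted Gini impurity.

    Returns (cut, p_left, p_right): x < cut goes left, and each side
    predicts its observed share of y = 1.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2.0
        left = [y for x, y in zip(xs, ys) if x < cut]
        right = [y for x, y in zip(xs, ys) if x >= cut]
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[0]:
            best = (score, cut, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # only one distinct x value: no split is possible
        p = sum(ys) / len(ys)
        return (float("inf"), p, p)
    return best[1:]


def predict(stump, x):
    cut, p_left, p_right = stump
    return p_left if x < cut else p_right


def error_rate(stump, xs, ys):
    """Misclassification rate, classifying a prediction >= 0.5 as y = 1."""
    wrong = sum((predict(stump, x) >= 0.5) != (y == 1) for x, y in zip(xs, ys))
    return wrong / len(ys)


def bootstrap(xs, ys, seed):
    """Draw n observations with replacement, reproducibly from one seed."""
    rng = random.Random(seed)
    idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
    return [xs[i] for i in idx], [ys[i] for i in idx]


def bump(xs, ys, n_boot=50, master_seed=1):
    """Return (error, seed, stump) for the candidate with the lowest error
    on the ORIGINAL data; seed None marks the fit to the original sample."""
    master = random.Random(master_seed)
    seeds = [None] + [master.randrange(10 ** 8) for _ in range(n_boot)]
    best = None
    for seed in seeds:
        bx, by = (xs, ys) if seed is None else bootstrap(xs, ys, seed)
        stump = fit_stump(bx, by)
        err = error_rate(stump, xs, ys)  # always judged on the original data
        if best is None or err < best[0]:  # a tie keeps the earlier candidate
            best = (err, seed, stump)
    return best
```

For example, `bump(list(range(20)), [0] * 10 + [1] * 10)` keeps the original fit (seed `None`), a split at 9.5 with zero error, because no bootstrap candidate can do better on perfectly separated data. With noisy data, a bootstrap candidate may win instead, and its seed is enough to regenerate it.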

The SAS macro below shows one way to implement the bumping algorithm, growing each candidate tree with PROC SPLIT and keeping the tree with the highest KS statistic.

```
%macro bumping(data = , y = , numx = , catx = x, ntrees = 100);
***********************************************************;
* THIS SAS MACRO IS AN ATTEMPT TO IMPLEMENT BUMPING       *;
* PROPOSED BY TIBSHIRANI AND KNIGHT (1997)                *;
* ======================================================= *;
* PARAMETERS:                                             *;
*  DATA   : INPUT SAS DATA TABLE                          *;
*  Y      : RESPONSE VARIABLE WITH 0/1 VALUE              *;
*  NUMX   : A LIST OF NUMERIC ATTRIBUTES                  *;
*  CATX   : A LIST OF CATEGORICAL ATTRIBUTES              *;
*  NTREES : # OF TREES TO DO THE BUMPING SEARCH           *;
* ======================================================= *;
* OUTPUTS:                                                *;
*  BESTTREE.TXT: A TEXT FILE USED TO SCORE THE BEST TREE  *;
*                THROUGH THE BUMPING SEARCH               *;
* ======================================================= *;
* CONTACT:                                                *;
*  WENSUI.LIU@53.COM                                      *;
***********************************************************;

options mprint mlogic nocenter nodate nonumber;

*** a random seed value subject to change ***;
%let seed = 1;

data _null_;
do i = 1 to &ntrees;
random = put(ranuni(&seed) * (10 ** 8), 8.);
name   = compress("random"||put(i, 3.), ' ');
call symput(name, random);
end;
run;

proc datasets library = data nolist;
delete catalog / memtype = catalog;
run;
quit;

proc sql noprint;
select count(*) into :nobs from &data where &y in (1, 0);
quit;

%do i = 1 %to &ntrees;
%put &&random&i;

proc surveyselect data = &data method = urs n = &nobs seed = &&random&i
out = sample&i(rename = (NumberHits = _hits)) noprint;
run;

proc dmdb data = sample&i out = db_sample&i dmdbcat = cl_sample&i;
class &y &catx;
var &numx;
target &y;
freq _hits;
run;

filename out_tree catalog "data.catalog.out_tree.source";

proc split data = db_sample&i dmdbcat = cl_sample&i
criterion    = gini
assess       = impurity
maxbranch    = 2
splitsize    = 100
subtree      = assessment
exhaustive   = 0
nsurrs       = 0;
code file    = out_tree;
input &numx   / level = interval;
input &catx   / level = nominal;
target &y     / level = binary;
freq _hits;
run;

filename in_tree catalog "data.catalog.tree&i..source";

data _null_;
infile out_tree;
input;
file in_tree;
if _n_ > 3 then put _infile_;
run;

data _tmp1(keep = p_&y.1 p_&y.0 &y);
set &data;
%include in_tree;
run;

proc printto new print = lst_out;
run;

ods output kolsmir2stats = _kstmp(where = (label1 = 'KS'));
proc npar1way wilcoxon edf data = _tmp1;
class &y;
var p_&y.1;
run;

proc printto;
run;

%if &i = 1 %then %do;
data _ks;
set _kstmp (keep = nvalue2);
tree_id = &i;
seed    = &&random&i;
ks      = round(nvalue2 * 100, 0.0001);
run;
%end;
%else %do;
data _ks;
set _ks _kstmp(in = a keep = nvalue2);
if a then do;
tree_id = &i;
seed    = &&random&i;
ks      = round(nvalue2 * 100, 0.0001);
end;
run;
%end;
%end;

proc sql noprint;
select max(ks) into :ks from _ks;

select tree_id into :best from _ks where round(ks, 0.0001) = round(&ks., 0.0001);
quit;

filename best catalog "data.catalog.tree%trim(&best).source";
filename output "BestTree.txt";

data _null_;
infile best;
input;
file output;
if _n_ = 1 then do;
put " ******************************************************; ";
put " ***** BEST TREE: TREE %trim(&best) WITH KS = &KS *****; ";
put " ******************************************************; ";
end;
put _infile_;
run;

data _out;
set _ks;

if round(ks, 0.0001) = round(&ks., 0.0001) then flag = '***';
run;

proc print data = _out noobs;
var tree_id seed ks flag;
run;

%mend bumping;

libname data 'D:\SAS_CODE\bagging';

%let x1 = tot_derog tot_tr age_oldest_tr tot_open_tr tot_rev_tr tot_rev_debt
tot_rev_line rev_util bureau_score ltv tot_income;

%let x2 = purpose;

%bumping(data = data.accepts, y = bad, numx = &x1, catx = &x2, ntrees = 50);
```

The table below shows the result of a stochastic search for the best decision tree among 50 trees estimated from bootstrap samples. The best tree, i.e. the one with the highest KS, is flagged with “***”. With its seed value, any bootstrap sample and the corresponding decision tree can be replicated.

```
tree_id      seed         ks      flag
1      18496257    41.1210
2      97008872    41.2568
3      39982431    39.2714
4      25939865    38.7901
5      92160258    40.9343
6      96927735    40.6441
7      54297917    41.2460
8      53169172    40.5881
9       4979403    40.9662
10       6656655    41.2006
11      81931857    41.1540
12      52387052    40.1930
13      85339431    36.7912
14       6718458    39.9277
15      95702386    39.0264
16      29719396    39.8790
17      27261179    40.1256
18      68992963    40.7699
19      97676486    37.7472
20      22650752    39.6255
21      68823655    40.3759
22      41276387    41.2282
23      55855411    41.5945    ***
24      28722561    40.6127
25      47578931    40.2973
26      84498698    38.6929
27      63452412    41.0329
28      59036467    39.1822
29      58258153    40.5223
30      37701337    40.2190
31      72836156    40.1872
32      50660353    38.5086
33      93121359    39.9043
34      92912005    40.0265
35      58966034    38.8403
36      29722285    39.7879
37      39104243    38.4006
38      47242918    39.5534
39      67952575    39.2817
40      16808835    40.4024
41      16652610    40.5237
42      87110489    39.9251
43      29878953    39.6106
44      93464176    40.5942
45      90047083    40.4422
46      56878347    40.6057
47       4954566    39.7689
48      13558826    38.7292
49      51131788    41.0891
50      43320456    41.0566
```
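The ks column is the two-sample Kolmogorov-Smirnov statistic D between the score distributions of bads (y = 1) and goods (y = 0), multiplied by 100. As far as I can tell, the macro reads D from the `nValue2` column of the KolSmir2Stats row labelled KS, which PROC NPAR1WAY produces with the EDF option. The Python sketch below computes the same quantity directly from the two empirical CDFs; the function name `ks_statistic` is illustrative only.

```python
def ks_statistic(scores, labels):
    """Two-sample KS statistic D between the score distributions of
    y = 1 (bads) and y = 0 (goods), scaled by 100."""
    bads = sorted(s for s, y in zip(scores, labels) if y == 1)
    goods = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not bads or not goods:
        raise ValueError("need observations with both y = 1 and y = 0")
    d, i, j = 0.0, 0, 0
    # Evaluate both empirical CDFs at every distinct score (ties included)
    # and keep the largest absolute gap.
    for v in sorted(set(scores)):
        while i < len(bads) and bads[i] <= v:
            i += 1
        while j < len(goods) and goods[j] <= v:
            j += 1
        d = max(d, abs(i / len(bads) - j / len(goods)))
    return 100.0 * d
```

Because a tree assigns one score per leaf, its scores take only a few distinct values; evaluating the CDFs at each distinct value handles those ties. For instance, `ks_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` returns 50.0.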

Finally, the text file used to score the best tree is shown below. It reveals the structure of the best decision tree selected from the 50 candidates.

```
******************************************************;
***** BEST TREE: TREE 23 WITH KS =  41.5945 *****;
******************************************************;

******         LENGTHS OF NEW CHARACTER VARIABLES         ******;
LENGTH _WARN_  $    4;

******              LABELS FOR NEW VARIABLES              ******;
LABEL _NODE_  = 'Node' ;
LABEL _LEAF_  = 'Leaf' ;
LABEL _WARN_  = 'Warnings' ;

******      TEMPORARY VARIABLES FOR FORMATTED VALUES      ******;
LENGTH _ARBFMT_2 $     12; DROP _ARBFMT_2;
_ARBFMT_2 = ' '; /* Initialize to avoid warning. */
LENGTH _ARBFMT_15 $      5; DROP _ARBFMT_15;
_ARBFMT_15 = ' '; /* Initialize to avoid warning. */

******             ASSIGN OBSERVATION TO NODE             ******;
IF  NOT MISSING(bureau_score ) AND
662.5 <= bureau_score  THEN DO;
IF  NOT MISSING(bureau_score ) AND
721.5 <= bureau_score  THEN DO;
IF  NOT MISSING(tot_derog ) AND
tot_derog  <                  3.5 THEN DO;
IF  NOT MISSING(ltv ) AND
99.5 <= ltv  THEN DO;
IF  NOT MISSING(tot_rev_line ) AND
tot_rev_line  <                16671 THEN DO;
_ARBFMT_15 = PUT( purpose , $5.);
%DMNORMIP( _ARBFMT_15);
IF _ARBFMT_15 IN ('LEASE' ) THEN DO;
_NODE_  =                   62;
_LEAF_  =                   29;
END;
ELSE DO;
_NODE_  =                   63;
_LEAF_  =                   30;
END;
END;
ELSE DO;
IF  NOT MISSING(ltv ) AND
131.5 <= ltv  THEN DO;
_NODE_  =                   65;
_LEAF_  =                   32;
END;
ELSE DO;
_NODE_  =                   64;
_LEAF_  =                   31;
END;
END;
END;
ELSE DO;
IF  NOT MISSING(tot_open_tr ) AND
tot_open_tr  <                  1.5 THEN DO;
_NODE_  =                   42;
_LEAF_  =                   26;
END;
ELSE DO;
IF  NOT MISSING(tot_derog ) AND
1.5 <= tot_derog  THEN DO;
_NODE_  =                   61;
_LEAF_  =                   28;
END;
ELSE DO;
_NODE_  =                   60;
_LEAF_  =                   27;
END;
END;
END;
END;
ELSE DO;
_NODE_  =                   15;
_LEAF_  =                   33;
END;
END;
ELSE DO;
IF  NOT MISSING(tot_rev_line ) AND
6453.5 <= tot_rev_line  THEN DO;
IF  NOT MISSING(ltv ) AND
122.5 <= ltv  THEN DO;
_NODE_  =                   27;
_LEAF_  =                   25;
END;
ELSE DO;
IF  NOT MISSING(tot_derog ) AND
9.5 <= tot_derog  THEN DO;
_NODE_  =                   41;
_LEAF_  =                   24;
END;
ELSE DO;
IF  NOT MISSING(tot_rev_line ) AND
tot_rev_line  <                16694 THEN DO;
_NODE_  =                   58;
_LEAF_  =                   22;
END;
ELSE DO;
_NODE_  =                   59;
_LEAF_  =                   23;
END;
END;
END;
END;
ELSE DO;
IF  NOT MISSING(ltv ) AND
ltv  <                133.5 THEN DO;
IF  NOT MISSING(tot_income ) AND
tot_income  <               2377.2 THEN DO;
IF  NOT MISSING(age_oldest_tr ) AND
age_oldest_tr  <                   57 THEN DO;
_NODE_  =                   54;
_LEAF_  =                   17;
END;
ELSE DO;
_NODE_  =                   55;
_LEAF_  =                   18;
END;
END;
ELSE DO;
IF  NOT MISSING(ltv ) AND
ltv  <                 94.5 THEN DO;
_NODE_  =                   56;
_LEAF_  =                   19;
END;
ELSE DO;
_NODE_  =                   57;
_LEAF_  =                   20;
END;
END;
END;
ELSE DO;
_NODE_  =                   25;
_LEAF_  =                   21;
END;
END;
END;
END;
ELSE DO;
IF  NOT MISSING(ltv ) AND
ltv  <                 97.5 THEN DO;
IF  NOT MISSING(bureau_score ) AND
bureau_score  <                639.5 THEN DO;
IF  NOT MISSING(tot_open_tr ) AND
3.5 <= tot_open_tr  THEN DO;
IF  NOT MISSING(tot_income ) AND
tot_income  <             2604.165 THEN DO;
_NODE_  =                   32;
_LEAF_  =                    3;
END;
ELSE DO;
IF  NOT MISSING(tot_income ) AND
7375 <= tot_income  THEN DO;
_NODE_  =                   47;
_LEAF_  =                    5;
END;
ELSE DO;
_NODE_  =                   46;
_LEAF_  =                    4;
END;
END;
END;
ELSE DO;
IF  NOT MISSING(tot_rev_line ) AND
tot_rev_line  <                 2460 THEN DO;
_NODE_  =                   30;
_LEAF_  =                    1;
END;
ELSE DO;
_NODE_  =                   31;
_LEAF_  =                    2;
END;
END;
END;
ELSE DO;
IF  NOT MISSING(tot_income ) AND
tot_income  <             9291.835 THEN DO;
IF  NOT MISSING(tot_tr ) AND
13.5 <= tot_tr  THEN DO;
_NODE_  =                   35;
_LEAF_  =                    8;
END;
ELSE DO;
IF  NOT MISSING(bureau_score ) AND
bureau_score  <                646.5 THEN DO;
_NODE_  =                   48;
_LEAF_  =                    6;
END;
ELSE DO;
_NODE_  =                   49;
_LEAF_  =                    7;
END;
END;
END;
ELSE DO;
_NODE_  =                   19;
_LEAF_  =                    9;
END;
END;
END;
ELSE DO;
IF  NOT MISSING(tot_rev_line ) AND
tot_rev_line  <               1218.5 THEN DO;
IF  NOT MISSING(age_oldest_tr ) AND
115 <= age_oldest_tr  THEN DO;
_NODE_  =                   21;
_LEAF_  =                   11;
END;
ELSE DO;
_NODE_  =                   20;
_LEAF_  =                   10;
END;
END;
ELSE DO;
IF  NOT MISSING(bureau_score ) AND
bureau_score  <                  566 THEN DO;
_NODE_  =                   22;
_LEAF_  =                   12;
END;
ELSE DO;
IF  NOT MISSING(tot_rev_line ) AND
13717.5 <= tot_rev_line  THEN DO;
IF  NOT MISSING(tot_rev_debt ) AND
tot_rev_debt  <                11884 THEN DO;
_NODE_  =                   52;
_LEAF_  =                   15;
END;
ELSE DO;
_NODE_  =                   53;
_LEAF_  =                   16;
END;
END;
ELSE DO;
IF  NOT MISSING(ltv ) AND
ltv  <                 99.5 THEN DO;
_NODE_  =                   50;
_LEAF_  =                   13;
END;
ELSE DO;
_NODE_  =                   51;
_LEAF_  =                   14;
END;
END;
END;
END;
END;
END;

****************************************************************;
******          END OF DECISION TREE SCORING CODE         ******;
****************************************************************;
```

Written by statcompute

June 30, 2012 at 1:32 pm

## A SAS Macro Implementing Monotonic WOE Transformation in Scorecard Development

This SAS macro is designed to help model developers perform univariate variable importance ranking and monotonic weight of evidence (WOE) transformation for potentially hundreds of predictors in scorecard development. Please feel free to use or distribute it at your own risk. I would really appreciate it if you could share with me your success stories using this macro in model development.
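For readers new to WOE, the per-bin quantities that the macro reports can be sketched in Python as follows. The formulas follow the macro's DATA step: WOE = ln(bad% / good%), IV = (bad% - good%) × WOE summed over bins, and KS = the largest gap between the cumulative good% and bad%, times 100. The input format, a list of (frequency, bads) pairs in bin order, and the function name `woe_table` are assumptions for illustration.

```python
import math


def woe_table(bins):
    """Per-bin WOE, IV and running KS from (freq, bads) pairs in bin order.

    WOE = ln(bad% / good%) and IV = (bad% - good%) * WOE, as in the macro.
    Every bin needs at least one bad and one good, or the WOE is undefined.
    """
    total = sum(freq for freq, _ in bins)
    total_bads = sum(bads for _, bads in bins)
    total_goods = total - total_bads
    rows = []
    cum_bpct = cum_gpct = iv_sum = ks_max = 0.0
    for freq, bads in bins:
        bpct = bads / total_bads            # share of all bads in this bin
        gpct = (freq - bads) / total_goods  # share of all goods in this bin
        woe = math.log(bpct / gpct)
        iv = (bpct - gpct) * woe
        cum_bpct += bpct
        cum_gpct += gpct
        ks = abs(cum_gpct - cum_bpct) * 100
        iv_sum += iv
        ks_max = max(ks_max, ks)
        rows.append({"freq": freq, "pct": freq / total,
                     "woe": woe, "iv": iv, "ks": ks})
    return rows, iv_sum, ks_max
```

For two equal-sized bins with 10 and 40 bads out of 50 each, `woe_table([(50, 10), (50, 40)])` gives WOEs of -ln 4 and +ln 4, a total IV of 1.2 ln 4 (about 1.66) and a maximum KS of 60. In the macro, a variable is written to the recode files only if its total IV is at least &miniv (0.03 by default).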

```
%macro num_woe(data = , y = , x = );
***********************************************************;
* THE SAS MACRO IS TO PERFORM UNIVARIATE IMPORTANCE RANK  *;
* ORDER AND MONOTONIC WEIGHT OF EVIDENCE TRANSFORMATION   *;
* FOR NUMERIC ATTRIBUTES IN PRE-MODELING DATA PROCESSING  *;
* (IT IS RECOMMENDED TO RUN THIS MACRO IN THE BATCH MODE) *;
* ======================================================= *;
* PARAMETERS:                                             *;
*  DATA: INPUT SAS DATA TABLE                             *;
*  Y   : RESPONSE VARIABLE WITH 0/1 VALUE                 *;
*  X   : A LIST OF NUMERIC ATTRIBUTES                     *;
* ======================================================= *;
* OUTPUTS:                                                *;
*  MONO_WOE.WOE: A FILE OF WOE TRANSFORMATION RECODING    *;
*  MONO_WOE.FMT: A FILE OF BINNING FORMAT                 *;
*  MONO_WOE.PUT: A FILE OF PUT STATEMENTS FOR *.FMT FILE  *;
*  MONO_WOE.SUM: A FILE WITH PREDICTABILITY SUMMARY       *;
*  MONO_WOE.OUT: A FILE WITH STATISTICAL DETAILS          *;
*  MONO_WOE.IMP: A FILE OF MISSING IMPUTATION RECODING    *;
* ======================================================= *;
* CONTACT:                                                *;
*  WENSUI.LIU@53.COM                                      *;
***********************************************************;

options nocenter nonumber nodate mprint mlogic symbolgen
orientation = landscape ls = 150;

*** DEFAULT PARAMETERS ***;

%let maxbin = 100;

%let miniv  = 0.03;

%let bignum = 1e300;

***********************************************************;
***         DO NOT CHANGE CODES BELOW THIS LINE         ***;
***********************************************************;

*** DEFAULT OUTPUT FILES ***;

* WOE RECODING FILE                     *;
filename woefile "MONO_WOE.WOE";

* FORMAT FOR BINNING                    *;
filename fmtfile "MONO_WOE.FMT";

* PUT STATEMENT TO USE FORMAT           *;
filename binfile "MONO_WOE.PUT";

* KS SUMMARY                            *;
filename sumfile "MONO_WOE.SUM";

* STATISTICAL SUMMARY FOR EACH VARIABLE *;
filename outfile "MONO_WOE.OUT";

* IMPUTE RECODING FILE                  *;
filename impfile "MONO_WOE.IMP";

*** A MACRO TO DELETE FILE ***;
%macro dfile(file = );
data _null_;
rc = fdelete("&file");
if rc = 0 then do;
put @1 50 * "+";
put "THE EXISTING OUTPUT FILE HAS BEEN DELETED.";
put @1 50 * "+";
end;
run;
%mend dfile;

*** CLEAN UP FILES ***;
%dfile(file = woefile);

%dfile(file = fmtfile);

%dfile(file = binfile);

%dfile(file = sumfile);

%dfile(file = outfile);

%dfile(file = impfile);

*** PARSING THE STRING OF NUMERIC PREDICTORS ***;
ods listing close;
ods output position = _pos1;
proc contents data = &data varnum;
run;

proc sql noprint;
select
upcase(variable) into :x2 separated by ' '
from
_pos1
where
compress(upcase(type), ' ') = 'NUM' and
index("%upcase(%sysfunc(compbl(&x)))", compress(upcase(variable), ' ')) > 0;

select
count(variable) into :xcnt
from
_pos1
where
compress(upcase(type), ' ') = 'NUM' and
index("%upcase(%sysfunc(compbl(&x)))", compress(upcase(variable), ' ')) > 0;
quit;

data _tmp1;
retain &x2 &y;
set &data;
where &Y in (1, 0);
keep &x2 &y;
run;

ods output position = _pos2;
proc contents data = _tmp1 varnum;
run;

*** LOOP THROUGH EACH PREDICTOR ***;
%do i = 1 %to &xcnt;

proc sql noprint;
select
upcase(variable) into :var
from
_pos2
where
num= &i;

select
count(distinct &var) into :xflg
from
_tmp1
where
&var ~= .;
quit;

proc summary data = _tmp1 nway;
output out  = _med(drop = _type_ _freq_)
median(&var) = med nmiss(&var) = mis;
run;

proc sql;
select
med into :median
from
_med;

select
mis into :nmiss
from
_med;

select
case when count(&y) = sum(&y) then 1 else 0 end into :mis_flg1
from
_tmp1
where
&var = .;

select
case when sum(&y) = 0 then 1 else 0 end into :mis_flg2
from
_tmp1
where
&var = .;
quit;

%let nbin = %sysfunc(min(&maxbin, &xflg));

*** CHECK IF THE NUMBER OF DISTINCT VALUES > 1 ***;
%if &xflg > 1 %then %do;

*** IMPUTE MISS VALUE WHEN WOE CANNOT BE CALCULATED ***;
%if &mis_flg1 = 1 | &mis_flg2 = 1 %then %do;
data _null_;
file impfile mod;
put " ";
put @3 "*** MEDIAN IMPUTATION OF %TRIM(%UPCASE(&VAR)) (NMISS = %trim(&nmiss)) ***;";
put @3 "IF %TRIM(%UPCASE(&VAR)) = . THEN %TRIM(%UPCASE(&VAR)) = &MEDIAN;";
run;

data _tmp1;
set _tmp1;
if &var = . then &var = &median;
run;
%end;

*** LOOP THROUGH THE NUMBER OF BINS ***;
%do j = &nbin %to 2 %by -1;
proc rank data = _tmp1 groups = &j out = _tmp2(keep = &y &var rank);
var &var;
ranks rank;
run;

proc summary data = _tmp2 nway missing;
class rank;
output out = _tmp3(drop = _type_ rename = (_freq_ = freq))
sum(&y) = bads   mean(&y) = bad_rate
min(&var) = minx   max(&var) = maxx;
run;

*** CREATE FLAGS FOR MULTIPLE CRITERION ***;
proc sql noprint;
select
/* ASSUMED CRITERION: EVERY BIN CONTAINS AT LEAST ONE BAD */
case when min(bads) > 0 then 1 else 0 end into :badflg
from
_tmp3;

select
case when min(bad_rate) > 0 then 1 else 0 end into :minflg
from
_tmp3;

select
case when max(bad_rate) < 1 then 1 else 0 end into :maxflg
from
_tmp3;
quit;

*** CHECK IF SPEARMAN CORRELATION = 1 ***;
%if &badflg = 1 & &minflg = 1 & &maxflg = 1 %then %do;
ods output spearmancorr = _corr(rename = (minx = cor));
proc corr data = _tmp3 spearman;
var minx;
with bad_rate; /* SPEARMAN CORRELATION OF BIN LOWER LIMIT WITH BAD RATE */
run;

proc sql noprint;
select
case when abs(cor) = 1 then 1 else 0 end into :cor
from
_corr;
quit;

*** IF SPEARMAN CORR = 1 THEN BREAK THE LOOP ***;
%if &cor = 1 %then %goto loopout;
%end;
%else %if &j = 2 %then %goto exit;
%end;

%loopout:

*** CALCULATE STATISTICAL SUMMARY ***;
proc sql noprint;
select
sum(freq) into :freq
from
_tmp3;

select
sum(bads) into :bads
from
_tmp3;
quit;

proc sort data = _tmp3 sortsize = max;
by rank;
run;

data _tmp4;
set _tmp3 end = eof;
by rank;

if rank = . then bin = 0;
else do;
retain b 0;
bin + 1;
end;

pct  = freq / &freq;
bpct = bads / &bads;
gpct = (freq - bads) / (&freq - &bads);
woe  = log(bpct / gpct);
iv   = (bpct - gpct) * woe;

retain cum_bpct cum_gpct;
cum_bpct + bpct;
cum_gpct + gpct;
ks = abs(cum_gpct - cum_bpct) * 100;

retain iv_sum ks_max;
iv_sum + iv;
ks_max = max(ks_max, ks);
if eof then do;
call symput("bin", put(bin, 4.));
call symput("ks", put(ks_max, 10.4));
call symput("iv", put(iv_sum, 10.4));
end;

keep bin minx maxx freq pct bads bad_rate
gpct bpct woe iv cum_gpct cum_bpct ks;
run;

*** REPORT STATISTICAL SUMMARY ***;
proc printto print = outfile;
run;

title;
ods listing;
proc report data = _tmp4 spacing = 1 split = "*" headline nowindows;
column(" * MONOTONIC WEIGHT OF EVIDENCE TRANSFORMATION FOR %upcase(%trim(&var))"
bin minx maxx freq pct woe iv ks);

define bin      /"BIN*LEVEL"   width = 5  format = z3. order order = data;
define minx     /"LOWER*LIMIT" width = 15 format = 14.4;
define maxx     /"UPPER*LIMIT" width = 15 format = 14.4;
define freq     /"#FREQ"       width = 10 format = 9.;
define pct      /"PERCENT"     width = 8  format = percent8.2;
define woe      /"WOE"         width = 10 format = 9.4;
define iv       /"INFO.*VALUE" width = 10 format = 9.4;
define ks       /"KS"          width = 10 format = 9.4;
compute after;
line @1 110 * "-";
line @1 "MAX. KS = %trim(&ks), INFO. VALUE = %trim(&iv).";
line @1 110 * "-";
endcomp;
run;
ods listing close;

proc printto;
run;

proc sql noprint;
select
case when sum(iv) >= &miniv then 1 else 0 end into :ivflg
from
_tmp4;
quit;

*** OUTPUT RECODING FILES IF IV >= &miniv BY DEFAULT ***;
%if &ivflg = 1 %then %do;
data _tmp5;
length upper $20 lower $20;
lower = compress(put(maxx, 20.4), ' ');

set _tmp4 end = eof;
upper = compress(put(maxx, 20.4), ' ');
if bin = 1 then lower = "-%trim(&bignum)";
if eof then upper = "%trim(&bignum)";
w%trim(&var) = compress(put(woe, 12.8), ' ');
run;

*** OUTPUT WOE RECODE FILE ***;
data _null_;
set _tmp5 end = eof;
file woefile mod;

if bin = 0 and _n_ = 1 then do;
put " ";
put @3 3 * "*"
" WOE RECODE OF %upcase(%trim(&var)) (KS = %trim(&ks), IV = %trim(&iv))"
+ 1 3 * "*" ";";
put @3  "if %trim(&var) = . then w%trim(&var) = " + 1 w%trim(&var) ";";
end;
if bin = 1 and _n_ = 1 then do;
put " ";
put @3 3 * "*"
" WOE RECODE OF %upcase(%trim(&var)) (KS = %trim(&ks), IV = %trim(&iv))"
+ 1 3 * "*" ";";
put @3 "if " + 1 lower " < %trim(&var) <= " upper
" then w%trim(&var) = " + 1 w%trim(&var) ";";
end;
if _n_ > 1 then do;
put @5 "else if " + 1 lower " < %trim(&var) <= " upper
" then w%trim(&var) = " + 1 w%trim(&var) ";";
end;
if eof then do;
put @5 "else w%trim(&var) = 0;";
end;
run;

*** OUTPUT BINNING FORMAT FILE ***;
data _null_;
set _tmp5 end = eof;
file fmtfile mod;

if bin = 1 then lower = "LOW";
if eof then upper = "HIGH";

if bin = 0 and _n_ = 1 then do;
put " ";
put @3 3 * "*"
" BINNING FORMAT OF %trim(&var) (KS = %trim(&ks), IV = %trim(&IV))"
+ 1 3 * "*" ";";
put @3 "value %trim(&var)_fmt";
put @5 ". " @40 " = '" bin: z3.
". MISSINGS'";
end;

if bin = 1 and _n_ = 1 then do;
put " ";
put @3 3 * "*"
" BINNING FORMAT OF %trim(&var) (KS = %trim(&ks), IV = %trim(&IV))"
+ 1 3 * "*" ";";
put @3 "value %trim(&var)_fmt";
put @5 lower @15 " - " + 1 upper  @40 " = '" bin: z3.
". " + 1 lower " - " + 1 upper "'";
end;

if _n_ > 1 then do;
put @5 lower @15 "<- " + 1 upper @40 " = '" bin: z3.
". " + 1 lower "<- " + 1 upper "'";
end;
if eof then do;
put @5 "OTHER" @40 " = '999 .  OTHERS';";
end;
run;

*** OUTPUT BINNING RECODE FILE ***;
data _null_;
file binfile mod;
put " ";
put @3 "*** BINNING RECODE of %trim(&var) ***;";
put @3 "c%trim(&var) = put(%trim(&var), %trim(&var)_fmt.);";
run;

*** SAVE SUMMARY OF EACH VARIABLE INTO A TABLE ***;
%if %sysfunc(exist(work._result)) %then %do;
data _result;
format variable $32. bin 3. ks 10.4 iv 10.4;
if _n_ = 1 then do;
variable = "%trim(&var)";
bin      = &bin;
ks       = &ks;
iv       = &iv;
output;
end;
set _result;
output;
run;
%end;
%else %do;
data _result;
format variable $32. bin 3. ks 10.4 iv 10.4;
variable = "%trim(&var)";
bin      = &bin;
ks       = &ks;
iv       = &iv;
run;
%end;
%end;

%exit:

*** CLEAN UP TEMPORARY TABLES ***;
proc datasets library = work nolist;
delete _tmp2 - _tmp5 _corr / memtype = data;
run;
quit;
%end;
%end;

*** SORT VARIABLES BY KS AND OUTPUT RESULTS ***;
proc sort data = _result sortsize = max;
by descending ks descending iv;
run;

data _null_;
set _result end = eof;
file sumfile;

if _n_ = 1 then do;
put @1 80 * "-";
put @1  "| RANK" @10 "| VARIABLE RANKED BY KS" @45 "| # BINS"
@55 "|  KS"  @66 "| INFO. VALUE" @80 "|";
put @1 80 * "-";
end;
put @1  "| " @4  _n_ z3. @10 "| " @12 variable @45 "| " @50 bin
@55 "| " @57 ks      @66 "| " @69 iv       @80 "|";
if eof then do;
put @1 80 * "-";
end;
run;

proc datasets library = work nolist;
delete _result (mt = data);
run;
quit;

*********************************************************;
*           END OF NUM_WOE MACRO                        *;
*********************************************************;
%mend num_woe;

libname data 'D:\SAS_CODE\woe';

%let x =
tot_derog
tot_tr
age_oldest_tr
tot_open_tr
tot_rev_tr
tot_rev_debt
tot_rev_line
rev_util
bureau_score
ltv
tot_income
;

%num_woe(data = data.accepts, y = bad, x = &x);
```
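The core of the macro is the `%do j` loop that lowers the number of PROC RANK groups until the bin bad rates are monotonic. The Python sketch below mimics that search for a single numeric predictor without missing values, an assumption made to keep it short (the macro gives missing values their own bin). Group numbers follow the FLOOR(rank × k / (n + 1)) rule that PROC RANK applies with GROUPS=, and a strict monotonicity check stands in for the Spearman test: with bins ordered by x, the Spearman correlation between the bin lower limit and the bad rate is ±1 exactly when the bad rates move strictly in one direction. The helper names `rank_groups` and `monotone_bins` are hypothetical.

```python
import math


def rank_groups(xs, k):
    """Group number FLOOR(rank * k / (n + 1)) for each value, with tied
    values sharing their average rank (modeled on PROC RANK GROUPS=)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        avg_rank = (start + end) / 2.0 + 1  # 1-based average rank of the tie run
        for p in range(start, end + 1):
            ranks[order[p]] = avg_rank
        start = end + 1
    return [math.floor(r * k / (n + 1)) for r in ranks]


def monotone_bins(xs, ys, maxbin=100):
    """Lower the number of groups from min(maxbin, #distinct x) to 2 and
    return the first binning whose bad rates are strictly monotonic, as a
    list of (freq, bads, minx, maxx); return None if no grouping qualifies."""
    nbin = min(maxbin, len(set(xs)))
    for k in range(nbin, 1, -1):
        stats = {}
        for g, x, y in zip(rank_groups(xs, k), xs, ys):
            freq, bads, lo, hi = stats.get(g, (0, 0, x, x))
            stats[g] = (freq + 1, bads + y, min(lo, x), max(hi, x))
        bins = [stats[g] for g in sorted(stats)]
        rates = [bads / freq for freq, bads, _, _ in bins]
        # Need two or more bins, and no pure bin (its WOE would be undefined).
        if len(bins) < 2 or min(rates) <= 0 or max(rates) >= 1:
            continue
        steps = [b - a for a, b in zip(rates, rates[1:])]
        # Bins are ordered by x, so Spearman(minx, bad rate) is +/-1 exactly
        # when the bad rates move strictly in one direction.
        if all(s > 0 for s in steps) or all(s < 0 for s in steps):
            return bins
    return None
```

For example, `monotone_bins([1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 1, 0, 1, 1, 1])` ends at two bins, `[(4, 1, 1, 4), (4, 3, 5, 8)]`, because every finer grouping contains a bin with no bads. A perfectly separated predictor such as `monotone_bins([1, 2, 3, 4], [0, 0, 1, 1])` returns None, since every grouping leaves a bin that is all good or all bad.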

The %num_woe macro automatically generates six standard output files, each serving a different purpose throughout the scorecard development process.

1) “MONO_WOE.WOE” contains the WOE transformation recoding.

```
*** WOE RECODE OF TOT_DEROG (KS = 20.0442, IV = 0.2480) ***;
if TOT_DEROG = . then wTOT_DEROG =  0.64159782 ;
else if  -1e300  < TOT_DEROG <= 0.0000  then wTOT_DEROG =  -0.55591373 ;
else if  0.0000  < TOT_DEROG <= 2.0000  then wTOT_DEROG =  0.14404414 ;
else if  2.0000  < TOT_DEROG <= 4.0000  then wTOT_DEROG =  0.50783799 ;
else if  4.0000  < TOT_DEROG <= 1e300  then wTOT_DEROG =  0.64256014 ;
else wTOT_DEROG = 0;

*** WOE RECODE OF TOT_TR (KS = 16.8344, IV = 0.1307) ***;
if TOT_TR = . then wTOT_TR =  0.64159782 ;
else if  -1e300  < TOT_TR <= 7.0000  then wTOT_TR =  0.40925900 ;
else if  7.0000  < TOT_TR <= 12.0000  then wTOT_TR =  0.26386662 ;
else if  12.0000  < TOT_TR <= 18.0000  then wTOT_TR =  -0.13512611 ;
else if  18.0000  < TOT_TR <= 25.0000  then wTOT_TR =  -0.40608173 ;
else if  25.0000  < TOT_TR <= 1e300  then wTOT_TR =  -0.42369090 ;
else wTOT_TR = 0;

*** WOE RECODE OF AGE_OLDEST_TR (KS = 19.6163, IV = 0.2495) ***;
if AGE_OLDEST_TR = . then wAGE_OLDEST_TR =  0.66280002 ;
else if  -1e300  < AGE_OLDEST_TR <= 46.0000  then wAGE_OLDEST_TR =  0.66914925 ;
else if  46.0000  < AGE_OLDEST_TR <= 77.0000  then wAGE_OLDEST_TR =  0.36328349 ;
else if  77.0000  < AGE_OLDEST_TR <= 114.0000  then wAGE_OLDEST_TR =  0.15812827 ;
else if  114.0000  < AGE_OLDEST_TR <= 137.0000  then wAGE_OLDEST_TR =  0.01844301 ;
else if  137.0000  < AGE_OLDEST_TR <= 164.0000  then wAGE_OLDEST_TR =  -0.04100445 ;
else if  164.0000  < AGE_OLDEST_TR <= 204.0000  then wAGE_OLDEST_TR =  -0.32667232 ;
else if  204.0000  < AGE_OLDEST_TR <= 275.0000  then wAGE_OLDEST_TR =  -0.79931317 ;
else if  275.0000  < AGE_OLDEST_TR <= 1e300  then wAGE_OLDEST_TR =  -0.89926463 ;
else wAGE_OLDEST_TR = 0;

*** WOE RECODE OF TOT_REV_TR (KS = 9.0779, IV = 0.0757) ***;
if TOT_REV_TR = . then wTOT_REV_TR =  0.69097090 ;
else if  -1e300  < TOT_REV_TR <= 1.0000  then wTOT_REV_TR =  0.00269270 ;
else if  1.0000  < TOT_REV_TR <= 3.0000  then wTOT_REV_TR =  -0.14477602 ;
else if  3.0000  < TOT_REV_TR <= 1e300  then wTOT_REV_TR =  -0.15200275 ;
else wTOT_REV_TR = 0;

*** WOE RECODE OF TOT_REV_DEBT (KS = 8.5317, IV = 0.0629) ***;
if TOT_REV_DEBT = . then wTOT_REV_DEBT =  0.68160936 ;
else if  -1e300  < TOT_REV_DEBT <= 3009.0000  then wTOT_REV_DEBT =  0.04044249 ;
else if  3009.0000  < TOT_REV_DEBT <= 1e300  then wTOT_REV_DEBT =  -0.19723686 ;
else wTOT_REV_DEBT = 0;

*** WOE RECODE OF TOT_REV_LINE (KS = 25.5174, IV = 0.3970) ***;
if TOT_REV_LINE = . then wTOT_REV_LINE =  0.68160936 ;
else if  -1e300  < TOT_REV_LINE <= 1477.0000  then wTOT_REV_LINE =  0.73834416 ;
else if  1477.0000  < TOT_REV_LINE <= 4042.0000  then wTOT_REV_LINE =  0.34923628 ;
else if  4042.0000  < TOT_REV_LINE <= 8350.0000  then wTOT_REV_LINE =  0.11656236 ;
else if  8350.0000  < TOT_REV_LINE <= 14095.0000  then wTOT_REV_LINE =  0.03996934 ;
else if  14095.0000  < TOT_REV_LINE <= 23419.0000  then wTOT_REV_LINE =  -0.49492745 ;
else if  23419.0000  < TOT_REV_LINE <= 38259.0000  then wTOT_REV_LINE =  -0.94090721 ;
else if  38259.0000  < TOT_REV_LINE <= 1e300  then wTOT_REV_LINE =  -1.22174118 ;
else wTOT_REV_LINE = 0;

*** WOE RECODE OF REV_UTIL (KS = 14.3262, IV = 0.0834) ***;
if  -1e300  < REV_UTIL <= 29.0000  then wREV_UTIL =  -0.31721190 ;
else if  29.0000  < REV_UTIL <= 1e300  then wREV_UTIL =  0.26459777 ;
else wREV_UTIL = 0;

*** WOE RECODE OF BUREAU_SCORE (KS = 34.1481, IV = 0.7251) ***;
if BUREAU_SCORE = . then wBUREAU_SCORE =  0.66280002 ;
else if  -1e300  < BUREAU_SCORE <= 653.0000  then wBUREAU_SCORE =  0.93490359 ;
else if  653.0000  < BUREAU_SCORE <= 692.0000  then wBUREAU_SCORE =  0.07762676 ;
else if  692.0000  < BUREAU_SCORE <= 735.0000  then wBUREAU_SCORE =  -0.58254635 ;
else if  735.0000  < BUREAU_SCORE <= 1e300  then wBUREAU_SCORE =  -1.61790566 ;
else wBUREAU_SCORE = 0;

*** WOE RECODE OF LTV (KS = 16.3484, IV = 0.1625) ***;
if  -1e300  < LTV <= 82.0000  then wLTV =  -0.84674934 ;
else if  82.0000  < LTV <= 91.0000  then wLTV =  -0.43163689 ;
else if  91.0000  < LTV <= 97.0000  then wLTV =  -0.14361551 ;
else if  97.0000  < LTV <= 101.0000  then wLTV =  0.08606320 ;
else if  101.0000  < LTV <= 107.0000  then wLTV =  0.18554122 ;
else if  107.0000  < LTV <= 115.0000  then wLTV =  0.22405397 ;
else if  115.0000  < LTV <= 1e300  then wLTV =  0.51906325 ;
else wLTV = 0;
```

2) “MONO_WOE.FMT” contains the binning formats.

```
*** BINNING FORMAT OF TOT_DEROG (KS = 20.0442, IV = 0.2480) ***;
value TOT_DEROG_fmt
.                                   = '000 . MISSINGS'
LOW       <-  0.0000                = '001 .  LOW <-  0.0000 '
0.0000    <-  2.0000                = '002 .  0.0000 <-  2.0000 '
2.0000    <-  4.0000                = '003 .  2.0000 <-  4.0000 '
4.0000    <-  HIGH                  = '004 .  4.0000 <-  HIGH '
OTHER                               = '999 .  OTHERS';

*** BINNING FORMAT OF TOT_TR (KS = 16.8344, IV = 0.1307) ***;
value TOT_TR_fmt
.                                   = '000 . MISSINGS'
LOW       <-  7.0000                = '001 .  LOW <-  7.0000 '
7.0000    <-  12.0000               = '002 .  7.0000 <-  12.0000 '
12.0000   <-  18.0000               = '003 .  12.0000 <-  18.0000 '
18.0000   <-  25.0000               = '004 .  18.0000 <-  25.0000 '
25.0000   <-  HIGH                  = '005 .  25.0000 <-  HIGH '
OTHER                               = '999 .  OTHERS';

*** BINNING FORMAT OF AGE_OLDEST_TR (KS = 19.6163, IV = 0.2495) ***;
value AGE_OLDEST_TR_fmt
.                                   = '000 . MISSINGS'
LOW       <-  46.0000               = '001 .  LOW <-  46.0000 '
46.0000   <-  77.0000               = '002 .  46.0000 <-  77.0000 '
77.0000   <-  114.0000              = '003 .  77.0000 <-  114.0000 '
114.0000  <-  137.0000              = '004 .  114.0000 <-  137.0000 '
137.0000  <-  164.0000              = '005 .  137.0000 <-  164.0000 '
164.0000  <-  204.0000              = '006 .  164.0000 <-  204.0000 '
204.0000  <-  275.0000              = '007 .  204.0000 <-  275.0000 '
275.0000  <-  HIGH                  = '008 .  275.0000 <-  HIGH '
OTHER                               = '999 .  OTHERS';

*** BINNING FORMAT OF TOT_REV_TR (KS = 9.0779, IV = 0.0757) ***;
value TOT_REV_TR_fmt
.                                   = '000 . MISSINGS'
LOW       <-  1.0000                = '001 .  LOW <-  1.0000 '
1.0000    <-  3.0000                = '002 .  1.0000 <-  3.0000 '
3.0000    <-  HIGH                  = '003 .  3.0000 <-  HIGH '
OTHER                               = '999 .  OTHERS';

*** BINNING FORMAT OF TOT_REV_DEBT (KS = 8.5317, IV = 0.0629) ***;
value TOT_REV_DEBT_fmt
.                                   = '000 . MISSINGS'
LOW       <-  3009.0000             = '001 .  LOW <-  3009.0000 '
3009.0000 <-  HIGH                  = '002 .  3009.0000 <-  HIGH '
OTHER                               = '999 .  OTHERS';

*** BINNING FORMAT OF TOT_REV_LINE (KS = 25.5174, IV = 0.3970) ***;
value TOT_REV_LINE_fmt
.                                   = '000 . MISSINGS'
LOW       <-  1477.0000             = '001 .  LOW <-  1477.0000 '
1477.0000 <-  4042.0000             = '002 .  1477.0000 <-  4042.0000 '
4042.0000 <-  8350.0000             = '003 .  4042.0000 <-  8350.0000 '
8350.0000 <-  14095.0000            = '004 .  8350.0000 <-  14095.0000 '
14095.0000<-  23419.0000            = '005 .  14095.0000 <-  23419.0000 '
23419.0000<-  38259.0000            = '006 .  23419.0000 <-  38259.0000 '
38259.0000<-  HIGH                  = '007 .  38259.0000 <-  HIGH '
OTHER                               = '999 .  OTHERS';

**BINNING FORMAT OF REV_UTIL (KS = 14.3262, IV = 0.0834) ***;
value REV_UTIL_fmt
LOW        -  29.0000               = '001 .  LOW  -  29.0000 '
29.0000   <-  HIGH                  = '002 .  29.0000 <-  HIGH '
OTHER                               = '999 .  OTHERS';

*** BINNING FORMAT OF BUREAU_SCORE (KS = 34.1481, IV = 0.7251) ***;
value BUREAU_SCORE_fmt
.                                   = '000 . MISSINGS'
LOW       <-  653.0000              = '001 .  LOW <-  653.0000 '
653.0000  <-  692.0000              = '002 .  653.0000 <-  692.0000 '
692.0000  <-  735.0000              = '003 .  692.0000 <-  735.0000 '
735.0000  <-  HIGH                  = '004 .  735.0000 <-  HIGH '
OTHER                               = '999 .  OTHERS';

*** BINNING FORMAT OF LTV (KS = 16.3484, IV = 0.1625) ***;
value LTV_fmt
LOW        -  82.0000               = '001 .  LOW  -  82.0000 '
82.0000   <-  91.0000               = '002 .  82.0000 <-  91.0000 '
91.0000   <-  97.0000               = '003 .  91.0000 <-  97.0000 '
97.0000   <-  101.0000              = '004 .  97.0000 <-  101.0000 '
101.0000  <-  107.0000              = '005 .  101.0000 <-  107.0000 '
107.0000  <-  115.0000              = '006 .  107.0000 <-  115.0000 '
115.0000  <-  HIGH                  = '007 .  115.0000 <-  HIGH '
OTHER                               = '999 .  OTHERS';
```

3) “MONO_WOE.PUT” is a file of PUT statements that recode each variable into its bins using the formats defined in the *.FMT file above.

```
*** BINNING RECODE of TOT_DEROG ***;
cTOT_DEROG = put(TOT_DEROG, TOT_DEROG_fmt.);

*** BINNING RECODE of TOT_TR ***;
cTOT_TR = put(TOT_TR, TOT_TR_fmt.);

*** BINNING RECODE of AGE_OLDEST_TR ***;
cAGE_OLDEST_TR = put(AGE_OLDEST_TR, AGE_OLDEST_TR_fmt.);

*** BINNING RECODE of TOT_REV_TR ***;
cTOT_REV_TR = put(TOT_REV_TR, TOT_REV_TR_fmt.);

*** BINNING RECODE of TOT_REV_DEBT ***;
cTOT_REV_DEBT = put(TOT_REV_DEBT, TOT_REV_DEBT_fmt.);

*** BINNING RECODE of TOT_REV_LINE ***;
cTOT_REV_LINE = put(TOT_REV_LINE, TOT_REV_LINE_fmt.);

*** BINNING RECODE of REV_UTIL ***;
cREV_UTIL = put(REV_UTIL, REV_UTIL_fmt.);

*** BINNING RECODE of BUREAU_SCORE ***;
cBUREAU_SCORE = put(BUREAU_SCORE, BUREAU_SCORE_fmt.);

*** BINNING RECODE of LTV ***;
cLTV = put(LTV, LTV_fmt.);
```

4) “MONO_WOE.SUM” is a file summarizing the predictability of the numeric variables kept in the FMT and PUT files. It ranks them by KS statistic and also reports the number of bins and the information value.

```
--------------------------------------------------------------------------------
| RANK   | VARIABLE RANKED BY KS            | # BINS  |  KS      | INFO. VALUE |
--------------------------------------------------------------------------------
|  001   | BUREAU_SCORE                     |    4    | 34.1481  |  0.7251     |
|  002   | TOT_REV_LINE                     |    7    | 25.5174  |  0.3970     |
|  003   | TOT_DEROG                        |    4    | 20.0442  |  0.2480     |
|  004   | AGE_OLDEST_TR                    |    8    | 19.6163  |  0.2495     |
|  005   | TOT_TR                           |    5    | 16.8344  |  0.1307     |
|  006   | LTV                              |    7    | 16.3484  |  0.1625     |
|  007   | REV_UTIL                         |    2    | 14.3262  |  0.0834     |
|  008   | TOT_REV_TR                       |    3    | 9.0779   |  0.0757     |
|  009   | TOT_REV_DEBT                     |    2    | 8.5317   |  0.0629     |
--------------------------------------------------------------------------------
```

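5) “MONO_WOE.OUT” is a file providing detailed statistical summaries of all numeric variables. For each bin, it shows the lower and upper limits, the frequency and percentage of records, the number of bads (Y=1), the bad rate, the WOE, the bin's contribution to the information value, and the cumulative KS.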

```
                          MONOTONIC WEIGHT OF EVIDENCE TRANSFORMATION FOR TOT_DEROG
LEVEL           LIMIT           LIMIT      #FREQ  PERCENT    (Y=1)     RATE        WOE      VALUE         KS
--------------------------------------------------------------------------------------------------------------
000           .               .            213   3.65%        70  32.86%      0.6416     0.0178     2.7716
001          0.0000          0.0000       2850  48.83%       367  12.88%     -0.5559     0.1268    20.0442
002          1.0000          2.0000       1369  23.45%       314  22.94%      0.1440     0.0051    16.5222
003          3.0000          4.0000        587  10.06%       176  29.98%      0.5078     0.0298    10.6623
004          5.0000         32.0000        818  14.01%       269  32.89%      0.6426     0.0685     0.0000
--------------------------------------------------------------------------------------------------------------
# TOTAL = 5837, # BADs(Y=1) = 1196, OVERALL BAD RATE = 20.49%, MAX. KS = 20.0442, INFO. VALUE = 0.2480.
--------------------------------------------------------------------------------------------------------------

MONOTONIC WEIGHT OF EVIDENCE TRANSFORMATION FOR TOT_TR
LEVEL           LIMIT           LIMIT      #FREQ  PERCENT    (Y=1)     RATE        WOE      VALUE         KS
--------------------------------------------------------------------------------------------------------------
000           .               .            213   3.65%        70  32.86%      0.6416     0.0178     2.7716
001          0.0000          7.0000       1159  19.86%       324  27.96%      0.4093     0.0372    11.8701
002          8.0000         12.0000       1019  17.46%       256  25.12%      0.2639     0.0131    16.8344
003         13.0000         18.0000       1170  20.04%       215  18.38%     -0.1351     0.0035    14.2335
004         19.0000         25.0000       1126  19.29%       165  14.65%     -0.4061     0.0281     7.3227
005         26.0000         77.0000       1150  19.70%       166  14.43%     -0.4237     0.0310     0.0000
--------------------------------------------------------------------------------------------------------------
# TOTAL = 5837, # BADs(Y=1) = 1196, OVERALL BAD RATE = 20.49%, MAX. KS = 16.8344, INFO. VALUE = 0.1307.
--------------------------------------------------------------------------------------------------------------

MONOTONIC WEIGHT OF EVIDENCE TRANSFORMATION FOR AGE_OLDEST_TR
LEVEL           LIMIT           LIMIT      #FREQ  PERCENT    (Y=1)     RATE        WOE      VALUE         KS
--------------------------------------------------------------------------------------------------------------
000           .               .            216   3.70%        72  33.33%      0.6628     0.0193     2.9173
001          1.0000         46.0000        708  12.13%       237  33.47%      0.6691     0.0647    12.5847
002         47.0000         77.0000        699  11.98%       189  27.04%      0.3633     0.0175    17.3983
003         78.0000        114.0000        703  12.04%       163  23.19%      0.1581     0.0032    19.3917
004        115.0000        137.0000        707  12.11%       147  20.79%      0.0184     0.0000    19.6163
005        138.0000        164.0000        706  12.10%       140  19.83%     -0.0410     0.0002    19.1263
006        165.0000        204.0000        689  11.80%       108  15.67%     -0.3267     0.0114    15.6376
007        205.0000        275.0000        703  12.04%        73  10.38%     -0.7993     0.0597     8.1666
008        276.0000        588.0000        706  12.10%        67   9.49%     -0.8993     0.0734     0.0000
--------------------------------------------------------------------------------------------------------------
# TOTAL = 5837, # BADs(Y=1) = 1196, OVERALL BAD RATE = 20.49%, MAX. KS = 19.6163, INFO. VALUE = 0.2495.
--------------------------------------------------------------------------------------------------------------

MONOTONIC WEIGHT OF EVIDENCE TRANSFORMATION FOR TOT_OPEN_TR
LEVEL           LIMIT           LIMIT      #FREQ  PERCENT    (Y=1)     RATE        WOE      VALUE         KS
--------------------------------------------------------------------------------------------------------------
000           .               .           1416  24.26%       354  25.00%      0.2573     0.0173     6.7157
001          0.0000          4.0000       1815  31.09%       353  19.45%     -0.0651     0.0013     4.7289
002          5.0000          6.0000       1179  20.20%       226  19.17%     -0.0831     0.0014     3.0908
003          7.0000         26.0000       1427  24.45%       263  18.43%     -0.1315     0.0041     0.0000
--------------------------------------------------------------------------------------------------------------
# TOTAL = 5837, # BADs(Y=1) = 1196, OVERALL BAD RATE = 20.49%, MAX. KS = 6.7157, INFO. VALUE = 0.0240.
--------------------------------------------------------------------------------------------------------------

MONOTONIC WEIGHT OF EVIDENCE TRANSFORMATION FOR TOT_REV_TR
LEVEL           LIMIT           LIMIT      #FREQ  PERCENT    (Y=1)     RATE        WOE      VALUE         KS
--------------------------------------------------------------------------------------------------------------
000           .               .            636  10.90%       216  33.96%      0.6910     0.0623     9.0104
001          0.0000          1.0000       1461  25.03%       300  20.53%      0.0027     0.0000     9.0779
002          2.0000          3.0000       2002  34.30%       365  18.23%     -0.1448     0.0069     4.3237
003          4.0000         24.0000       1738  29.78%       315  18.12%     -0.1520     0.0066     0.0000
--------------------------------------------------------------------------------------------------------------
# TOTAL = 5837, # BADs(Y=1) = 1196, OVERALL BAD RATE = 20.49%, MAX. KS = 9.0779, INFO. VALUE = 0.0757.
--------------------------------------------------------------------------------------------------------------

MONOTONIC WEIGHT OF EVIDENCE TRANSFORMATION FOR TOT_REV_DEBT
LEVEL           LIMIT           LIMIT      #FREQ  PERCENT    (Y=1)     RATE        WOE      VALUE         KS
--------------------------------------------------------------------------------------------------------------
000           .               .            477   8.17%       161  33.75%      0.6816     0.0453     6.6527
001          0.0000       3009.0000       2680  45.91%       567  21.16%      0.0404     0.0008     8.5317
002       3010.0000      96260.0000       2680  45.91%       468  17.46%     -0.1972     0.0168     0.0000
--------------------------------------------------------------------------------------------------------------
# TOTAL = 5837, # BADs(Y=1) = 1196, OVERALL BAD RATE = 20.49%, MAX. KS = 8.5317, INFO. VALUE = 0.0629.
--------------------------------------------------------------------------------------------------------------

MONOTONIC WEIGHT OF EVIDENCE TRANSFORMATION FOR TOT_REV_LINE
LEVEL           LIMIT           LIMIT      #FREQ  PERCENT    (Y=1)     RATE        WOE      VALUE         KS
--------------------------------------------------------------------------------------------------------------
000           .               .            477   8.17%       161  33.75%      0.6816     0.0453     6.6527
001          0.0000       1477.0000        765  13.11%       268  35.03%      0.7383     0.0864    18.3518
002       1481.0000       4042.0000        766  13.12%       205  26.76%      0.3492     0.0176    23.4043
003       4044.0000       8350.0000        766  13.12%       172  22.45%      0.1166     0.0018    24.9867
004       8360.0000      14095.0000        766  13.12%       162  21.15%      0.0400     0.0002    25.5174
005      14100.0000      23419.0000        766  13.12%       104  13.58%     -0.4949     0.0276    19.9488
006      23427.0000      38259.0000        766  13.12%        70   9.14%     -0.9409     0.0860    10.8049
007      38300.0000     205395.0000        765  13.11%        54   7.06%     -1.2217     0.1320     0.0000
--------------------------------------------------------------------------------------------------------------
# TOTAL = 5837, # BADs(Y=1) = 1196, OVERALL BAD RATE = 20.49%, MAX. KS = 25.5174, INFO. VALUE = 0.3970.
--------------------------------------------------------------------------------------------------------------

MONOTONIC WEIGHT OF EVIDENCE TRANSFORMATION FOR REV_UTIL
LEVEL           LIMIT           LIMIT      #FREQ  PERCENT    (Y=1)     RATE        WOE      VALUE         KS
--------------------------------------------------------------------------------------------------------------
001          0.0000         29.0000       2905  49.77%       459  15.80%     -0.3172     0.0454    14.3262
002         30.0000        100.0000       2932  50.23%       737  25.14%      0.2646     0.0379     0.0000
--------------------------------------------------------------------------------------------------------------
# TOTAL = 5837, # BADs(Y=1) = 1196, OVERALL BAD RATE = 20.49%, MAX. KS = 14.3262, INFO. VALUE = 0.0834.
--------------------------------------------------------------------------------------------------------------

MONOTONIC WEIGHT OF EVIDENCE TRANSFORMATION FOR BUREAU_SCORE
LEVEL           LIMIT           LIMIT      #FREQ  PERCENT    (Y=1)     RATE        WOE      VALUE         KS
--------------------------------------------------------------------------------------------------------------
000           .               .            315   5.40%       105  33.33%      0.6628     0.0282     4.2544
001        443.0000        653.0000       1393  23.86%       552  39.63%      0.9349     0.2621    32.2871
002        654.0000        692.0000       1368  23.44%       298  21.78%      0.0776     0.0014    34.1481
003        693.0000        735.0000       1383  23.69%       174  12.58%     -0.5825     0.0670    22.6462
004        736.0000        848.0000       1378  23.61%        67   4.86%     -1.6179     0.3664     0.0000
--------------------------------------------------------------------------------------------------------------
# TOTAL = 5837, # BADs(Y=1) = 1196, OVERALL BAD RATE = 20.49%, MAX. KS = 34.1481, INFO. VALUE = 0.7251.
--------------------------------------------------------------------------------------------------------------

MONOTONIC WEIGHT OF EVIDENCE TRANSFORMATION FOR LTV
LEVEL           LIMIT           LIMIT      #FREQ  PERCENT    (Y=1)     RATE        WOE      VALUE         KS
--------------------------------------------------------------------------------------------------------------
001          0.0000         82.0000        814  13.95%        81   9.95%     -0.8467     0.0764     9.0214
002         83.0000         91.0000        837  14.34%       120  14.34%     -0.4316     0.0234    14.4372
003         92.0000         97.0000        811  13.89%       148  18.25%     -0.1436     0.0027    16.3484
004         98.0000        101.0000        830  14.22%       182  21.93%      0.0861     0.0011    15.0935
005        102.0000        107.0000        870  14.90%       206  23.68%      0.1855     0.0054    12.1767
006        108.0000        115.0000        808  13.84%       197  24.38%      0.2241     0.0074     8.8704
007        116.0000        176.0000        867  14.85%       262  30.22%      0.5191     0.0460     0.0000
--------------------------------------------------------------------------------------------------------------
# TOTAL = 5837, # BADs(Y=1) = 1196, OVERALL BAD RATE = 20.49%, MAX. KS = 16.3484, INFO. VALUE = 0.1625.
--------------------------------------------------------------------------------------------------------------

MONOTONIC WEIGHT OF EVIDENCE TRANSFORMATION FOR TOT_INCOME
LEVEL           LIMIT           LIMIT      #FREQ  PERCENT    (Y=1)     RATE        WOE      VALUE         KS
--------------------------------------------------------------------------------------------------------------
000           .               .              5   0.09%         1  20.00%     -0.0303     0.0000     0.0026
001          0.0000       3397.0000       2913  49.91%       669  22.97%      0.1457     0.0111     7.5822
002       3400.0000    8147166.6600       2919  50.01%       526  18.02%     -0.1591     0.0121     0.0000
--------------------------------------------------------------------------------------------------------------
# TOTAL = 5837, # BADs(Y=1) = 1196, OVERALL BAD RATE = 20.49%, MAX. KS = 7.5822, INFO. VALUE = 0.0231.
--------------------------------------------------------------------------------------------------------------
```

6) “MONO_WOE.IMP” is a file of statements that impute missing values with the median when the missing group does not have enough bads or goods to calculate a WOE.

```
*** MEDIAN IMPUTATION OF LTV (NMISS = 1) ***;
IF LTV = . THEN LTV =      100;
```
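
The recoding, summary statistics and imputation described in items 3) to 6) can be mimicked outside SAS. The Python sketch below is an illustration and is not the macro's code. All function names in it are hypothetical. It rests on several assumptions:

* a range `a <- b` in the formats means a < x <= b;
* WOE = ln(%bads / %goods), which matches the sign of the WOE column, where riskier bins are positive;
* the VALUE column is each bin's (%bads - %goods) * WOE;
* KS is the absolute gap between the cumulative bad and good distributions, in percent.

The post does not say how many bads or goods in the missing group count as "enough", so `min_count=1` is also an assumption. Run on the TOT_DEROG counts, the sketch should reproduce INFO. VALUE = 0.2480 and MAX. KS = 20.0442, up to rounding.

```python
import bisect
import math
import statistics


def bin_label(value, cuts):
    """Mimic a binning format such as BUREAU_SCORE_fmt: '000' for missing,
    otherwise the 1-based bin whose right-closed interval contains value;
    values above the last cut point fall into the open-ended top bin."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "000"
    # bisect_left counts cut points strictly below value, so a value equal
    # to a cut point stays in the lower bin ('a <- b' includes b)
    return f"{bisect.bisect_left(cuts, value) + 1:03d}"


def woe_table(freqs, bads):
    """(WOE, IV contribution, cumulative KS) per bin from #FREQ and #BADS."""
    total_bad = sum(bads)
    total_good = sum(freqs) - total_bad
    cum_bad = cum_good = 0.0
    rows = []
    for n, b in zip(freqs, bads):
        pct_bad = b / total_bad          # share of all bads in this bin
        pct_good = (n - b) / total_good  # share of all goods in this bin
        # undefined (error) if the bin has no bads or no goods
        woe = math.log(pct_bad / pct_good)
        cum_bad += pct_bad
        cum_good += pct_good
        rows.append((woe, (pct_bad - pct_good) * woe, 100 * abs(cum_bad - cum_good)))
    return rows


def is_monotonic(values):
    """True if values never decrease or never increase."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def impute_missing_with_median(x, y, min_count=1):
    """Hypothetical rule: if the missing group has fewer than min_count bads
    (y == 1) or goods (y == 0), replace missings by the non-missing median."""
    missing_y = [yi for xi, yi in zip(x, y) if xi is None]
    n_bad = sum(missing_y)
    n_good = len(missing_y) - n_bad
    if missing_y and min(n_bad, n_good) < min_count:
        med = statistics.median(xi for xi in x if xi is not None)
        return [med if xi is None else xi for xi in x], med
    return list(x), None


# Recode BUREAU_SCORE with the cut points of BUREAU_SCORE_fmt
print([bin_label(s, [653.0, 692.0, 735.0]) for s in (None, 600, 653, 700, 800)])

# TOT_DEROG counts from MONO_WOE.OUT (bin 000 = missings first)
rows = woe_table([213, 2850, 1369, 587, 818], [70, 367, 314, 176, 269])
print(round(sum(r[1] for r in rows), 4), round(max(r[2] for r in rows), 4))
print(is_monotonic([r[0] for r in rows[1:]]))  # WOE of the non-missing bins

print(impute_missing_with_median([80, None, 100, 120], [0, 0, 1, 0]))
```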

Written by statcompute

June 10, 2012 at 2:15 am

Posted in SAS, Scorecard

## Generalized Regression Neural Networks and the Implementation with Matlab

A Generalized Regression Neural Network (GRNN) is a special case of the Radial Basis Network (RBN). Compared with its competitors, e.g. the standard feedforward neural network, GRNN has several advantages. First of all, the structure of a GRNN is relatively simple and static, with two layers: the pattern layer and the summation layer. Each training observation is stored in its own unit of the pattern layer, so the relationship between the input and the response is “memorized” by the network. As a result, the number of units in the pattern layer equals the number of observations in the training sample. In each pattern unit, a Gaussian kernel is applied to the network input such that

$$\Theta = \exp\left[-\frac{(X - u)^{\top}(X - u)}{2\,\sigma^{2}}\right]$$

where Theta is the output of the pattern unit, X is the input vector, u is the training vector stored in the unit, and Sigma is a positive constant known as the “spread” or “smoothing parameter”. Once Theta has been computed in every pattern unit, the outputs are passed to the summation layer. That layer calculates Y|X = SUM(Y * Theta) / SUM(Theta), where Y|X is the prediction conditional on X, Y is the response in the training sample, and the sums run over all pattern units. In addition to the above, Specht (1991) claims the following benefits of GRNN:

1) The network learns from the training data in a single pass (“1-pass” training), which takes a fraction of the time needed to train a standard feedforward network.

2) The spread, Sigma, is the only free parameter in the network, and it can often be chosen by V-fold or split-sample cross-validation.

3) Unlike standard feedforward networks, GRNN estimation always converges to a global solution and cannot be trapped in a local minimum.
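
The two layers can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions and is not Matlab's `newgrnn`. The function names are hypothetical, and observations are stored in rows rather than columns. Sigma is chosen by comparing the sum of squared errors on a held-out sample, following the split-sample idea in point 2). A very small Sigma can make every Theta underflow to zero, which would produce NaN predictions.

```python
import numpy as np


def grnn_predict(train_X, train_y, test_X, sigma):
    """Predict with a GRNN whose pattern units are the rows of train_X."""
    # Pattern layer: squared distances (X - u)'(X - u) between every test
    # row and every stored training row, shape (n_test, n_train)
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    theta = np.exp(-0.5 * d2 / sigma ** 2)
    # Summation layer: Theta-weighted average of the training responses
    return (theta @ train_y) / theta.sum(axis=1)


def select_sigma(train_X, train_y, test_X, test_y, sigmas):
    """Split-sample cross-validation: return the spread with the smallest
    sum of squared errors on the test sample, plus all the SSE values."""
    sse = [float(np.sum((test_y - grnn_predict(train_X, train_y, test_X, s)) ** 2))
           for s in sigmas]
    return sigmas[int(np.argmin(sse))], sse


# Usage on simulated data: 0-based rows 1, 3, 5, ... (Matlab's 2:2:m) train
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
# standardize with the training-sample mean and standard deviation
X = (X - X[1::2].mean(axis=0)) / X[1::2].std(axis=0)
sigmas = list(np.arange(0.1, 2.01, 0.1))
best, errs = select_sigma(X[1::2], y[1::2], X[0::2], y[0::2], sigmas)
print("best sigma:", best)
```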

As for implementation, Matlab is, in my limited experience, probably the best computing engine for GRNN in terms of ease of use and speed. The demo below shows how to use Matlab to develop a GRNN and to identify an optimal value of Sigma using split-sample cross-validation.

```
load credit

% RESPONSE IS COLUMN 2 OF data; EVERY 2ND OBSERVATION GOES TO TRAINING
Y = transpose(data(:, 2));
[n, m] = size(Y);
train_index = 2:2:m;

% SPLIT THE RESPONSE VECTOR INTO TRAINING AND TESTING
train_Y = Y(train_index);
test_Y = Y;
test_Y(train_index) = [];

% SPLIT X MATRIX INTO TRAINING AND TESTING
X = transpose(data(:, 3:10));
train_X = X(:, train_index);
test_X = X;
test_X(:, train_index) = [];

% STANDARDIZE X MATRIX IN TRAINING SET
[train_X2, map] = mapstd(train_X);

% STANDARDIZE X MATRIX IN TESTING SET
test_X2 = mapstd('apply', test_X, map);

% CHECK IF VARIANCE == 1
var(transpose(train_X2))
var(transpose(test_X2))

j = 0;
% GRID SEARCH OVER THE SPREAD (SIGMA) FROM 1 TO 2 IN STEPS OF 0.02
for i = 1:0.02:2
    % TRAIN A GRNN
    grnn = newgrnn(train_X2, train_Y, i);

    % CALCULATE THE PREDICTION FOR TESTING SET
    test_P = sim(grnn, test_X2);

    % COLLECT THE PERFORMANCE (SUM OF SQUARED ERRORS)
    if j == 0
        perf = sse(test_Y - test_P);
    else
        perf = [perf, sse(test_Y - test_P)];
    end
    j = j + 1;
end