I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

## How to Score Outcomes from Count Models

When scoring a count model, many people use the expected mean directly. From the business standpoint, however, it might be more appealing to calculate the probability of a specific count outcome. For instance, in retail banking, it is often of interest to know the probability that an account incurs one or more delinquencies and then to convert this probability to a score point. A widely accepted practice is to develop a logistic regression predicting the delinquent accounts, e.g. Y = 1 for delinquencies >= 1. However, it is also possible to develop a count model, e.g. negative binomial, predicting the number of delinquencies and then to estimate the probability of one or more delinquencies given the expected mean.
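The conversion from the expected mean to P(delinquencies >= 1) needs only the fitted mean and the dispersion parameter. As a cross-check outside of SAS, here is a minimal Python sketch using scipy (the helper name is mine; the dispersion value would be taken from the fitted model output):

```python
from scipy.stats import nbinom

def prob_one_or_more(mu, alpha):
    """P(Y >= 1) for a negative binomial outcome with mean mu and
    dispersion alpha (variance = mu + alpha * mu ** 2)."""
    n = 1.0 / alpha          # scipy's size parameter
    p = n / (n + mu)         # scipy's success probability
    return 1.0 - nbinom.pmf(0, n, p)   # P(Y >= 1) = 1 - P(Y = 0)
```

This is the same computation as the `pdf('negbinomial', 0, ...)` line in the SAS code below, with size n = 1 / alpha and success probability p = n / (n + mu).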

The demonstration below shows a scoring scheme for count models. From the output, it is clear that the predictiveness of the negative binomial model is comparable to that of the logistic model in terms of the KS and ROC statistics.
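For reference, the KS and AUC statistics reported in the separation output can be computed directly from a binary outcome and a score. A minimal numpy sketch (the function name is mine; ties in the score are not handled):

```python
import numpy as np

def ks_and_auc(y, score):
    """KS statistic and AUC for a binary outcome against a risk score.

    KS is the maximum gap between the cumulative distributions of the
    score among bads (y = 1) and goods (y = 0); AUC comes from the
    Mann-Whitney rank-sum identity.
    """
    y = np.asarray(y, dtype=float)
    score = np.asarray(score, dtype=float)
    order = np.argsort(score)                       # ascending score
    y_sorted = y[order]
    cum_bad = np.cumsum(y_sorted) / y_sorted.sum()
    cum_good = np.cumsum(1.0 - y_sorted) / (1.0 - y_sorted).sum()
    ks = np.max(np.abs(cum_bad - cum_good))
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)     # 1-based ranks
    n_bad = y.sum()
    n_good = len(y) - n_bad
    auc = (ranks[y == 1].sum() - n_bad * (n_bad + 1) / 2) / (n_bad * n_good)
    return ks, auc
```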

```sas
options nocenter nonumber nodate mprint mlogic symbolgen
orientation = landscape ls = 125 formchar = "|----|+|---+=|-/\<>*";

libname data 'C:\Users\liuwensui\projects\data';

%include 'C:\Users\liuwensui\projects\code\ks_macro.sas';

data tmp1;
set data.credit_count;
if majordrg = 0 then bad = 0;
else bad = 1;
run;

proc logistic data = tmp1 desc;
model bad = AGE ACADMOS ADEPCNT MINORDRG OWNRENT EXP_INC;
score data = tmp1 out = logit_out1(rename = (p_1 = logit_prob1));
run;

proc genmod data = tmp1;
model majordrg = AGE ACADMOS ADEPCNT MINORDRG OWNRENT EXP_INC / dist = nb;
output out = nb_out1 p = yhat;
run;

data nb_out1;
set nb_out1;
/* 4.0362 is the dispersion estimate from the genmod output above;   */
/* P(Y >= 1) = 1 - P(Y = 0) with size 1/4.0362 and prob k/(yhat + k) */
nb_prob1 = 1 - pdf('negbinomial', 0, (1 / 4.0362) / (yhat + (1 / 4.0362)), (1 / 4.0362));
run;

%separation(data = logit_out1, score = logit_prob1, y = bad);
/*
GOOD BAD SEPARATION REPORT FOR LOGIT_PROB1 IN DATA LOGIT_OUT1
MAXIMUM KS = 35.5049 AT SCORE POINT 0.1773
( AUC STATISTICS = 0.7373, GINI COEFFICIENT = 0.4747, DIVERGENCE = 0.6511 )

        LOWER      UPPER          #          #          #             BAD    CUM. BAD      BAD     CUM. BAD
        SCORE      SCORE       GOOD        BAD      TOTAL      ODDS   RATE        RATE  PERCENT      PERCENT
-------------------------------------------------------------------------------------------------------------------
BAD     0.3369     0.9998         557        787      1,344       0.71   58.56%      58.56%    30.73%      30.73%
|      0.2157     0.3369         944        401      1,345       2.35   29.81%      44.18%    15.66%      46.39%
|      0.1802     0.2157       1,039        305      1,344       3.41   22.69%      37.02%    11.91%      58.30%
|      0.1619     0.1802       1,099        246      1,345       4.47   18.29%      32.34%     9.61%      67.90%
|      0.1489     0.1619       1,124        220      1,344       5.11   16.37%      29.14%     8.59%      76.49%
|      0.1383     0.1489       1,171        174      1,345       6.73   12.94%      26.44%     6.79%      83.29%
|      0.1255     0.1383       1,213        131      1,344       9.26    9.75%      24.06%     5.12%      88.40%
|      0.1109     0.1255       1,254         91      1,345      13.78    6.77%      21.89%     3.55%      91.96%
V      0.0885     0.1109       1,246         98      1,344      12.71    7.29%      20.27%     3.83%      95.78%
GOOD     0.0001     0.0885       1,236        108      1,344      11.44    8.04%      19.05%     4.22%     100.00%
========== ========== ========== ========== ==========
0.0001     0.9998      10,883      2,561     13,444
*/

%separation(data = nb_out1, score = nb_prob1, y = bad);
/*
GOOD BAD SEPARATION REPORT FOR NB_PROB1 IN DATA NB_OUT1
MAXIMUM KS = 35.8127 AT SCORE POINT 0.2095
( AUC STATISTICS = 0.7344, GINI COEFFICIENT = 0.4687, DIVERGENCE = 0.7021 )

        LOWER      UPPER          #          #          #             BAD    CUM. BAD      BAD     CUM. BAD
        SCORE      SCORE       GOOD        BAD      TOTAL      ODDS   RATE        RATE  PERCENT      PERCENT
-------------------------------------------------------------------------------------------------------------------
BAD     0.2929     0.8804         561        783      1,344       0.72   58.26%      58.26%    30.57%      30.57%
|      0.2367     0.2929         944        401      1,345       2.35   29.81%      44.03%    15.66%      46.23%
|      0.2117     0.2367       1,025        319      1,344       3.21   23.74%      37.27%    12.46%      58.69%
|      0.1947     0.2117       1,106        239      1,345       4.63   17.77%      32.39%     9.33%      68.02%
|      0.1813     0.1947       1,131        213      1,344       5.31   15.85%      29.08%     8.32%      76.34%
|      0.1675     0.1813       1,191        154      1,345       7.73   11.45%      26.14%     6.01%      82.35%
|      0.1508     0.1675       1,208        136      1,344       8.88   10.12%      23.86%     5.31%      87.66%
|      0.1298     0.1508       1,247         98      1,345      12.72    7.29%      21.78%     3.83%      91.49%
V      0.0978     0.1297       1,242        102      1,344      12.18    7.59%      20.21%     3.98%      95.47%
GOOD     0.0000     0.0978       1,228        116      1,344      10.59    8.63%      19.05%     4.53%     100.00%
========== ========== ========== ========== ==========
0.0000     0.8804      10,883      2,561     13,444
*/
```

Written by statcompute

February 17, 2013 at 4:18 pm

## A Grid Search for The Optimal Setting in Feed-Forward Neural Networks

The feed-forward neural network is a very powerful classification model in the machine learning context. Since the goodness-of-fit of a neural network is largely determined by the model complexity, it is very tempting for a modeler to over-parameterize the network by using too many hidden layers and/or hidden units.

As pointed out by Brian Ripley in his famous book “Modern Applied Statistics with S”, the complexity of a neural network can be regulated by a hyper-parameter called “weight decay” that penalizes the weights of hidden units. Per Ripley, the use of weight decay can both help the optimization process and avoid overfitting.
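To make the effect of weight decay concrete, here is a minimal numpy sketch (an illustrative assumption of mine, not Ripley's nnet): a single-layer "network" with a logistic output trained on the entropy criterion, where the penalty gradient 2 * decay * w shrinks the weights at every step.

```python
import numpy as np

def fit_entropy(X, y, decay=0.05, lr=0.1, iters=2000):
    """Gradient descent on loss = cross-entropy / n + decay * sum(w ** 2)
    for a single logistic unit (function name and defaults illustrative)."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])      # small random start
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))            # logistic output
        grad = X.T @ (p - y) / len(y) + 2.0 * decay * w
        w -= lr * grad
    return w
```

On linearly separable data the undecayed weights grow without bound, while any positive decay pins them at a finite norm, which is exactly the regularization effect Ripley describes.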

By now, it should be clear that the optimal setting for a neural network comes from balancing the network complexity against the size of the weight decay. The only question remaining is how to identify such a combination. In the real world, practitioners usually use v-fold or cross-sample validation. However, given the expensive computing cost of a neural network, cross-sample validation seems more efficient than v-fold. In addition, due to the presence of local minima, the validation result from a set of averaged models, instead of a single model, is deemed more reliable.

The example below shows a grid search strategy for the optimal setting of a neural network by cross-sample validation. As suggested by Ripley, the weight decay for an entropy fit should lie approximately between 0.01 and 0.1. For simplicity, only a few choices for the number of hidden units are tried. However, when the computing power is available, a finer grid search over combinations of weight decay and the number of hidden units would be highly recommended.

```r
> # DATA PREPARATIONS
> df2 <- df1[df1$CARDHLDR == 1, 2:12]
> X <- I(as.matrix(df2[-1]))
> st.X <- scale(X)
> Y <- I(as.matrix(df2[1]))
> df3 <- data.frame(X = st.X, Y);
>
> # DIVIDE DATA INTO TESTING AND TRAINING SETS
> set.seed(2013)
> rows <- sample(1:nrow(df3), nrow(df3) - 1000)
> set1 <- df3[rows, ]
> set2 <- df3[-rows, ]
>
> result <- c(NULL, NULL, NULL, NULL, NULL)
> n_nets <- 10
> # SEARCH FOR OPTIMAL WEIGHT DECAY
> for (w in c(0.01, 0.05, 0.1))
+ {
+   # SEARCH FOR OPTIMAL NUMBER OF HIDDEN UNITS
+   for (n in c(1, 5, 10, 20))
+   {
+     # CREATE A VECTOR OF RANDOM SEEDS
+     rv <- round(runif(n_nets) * 100)
+     # FOR EACH SETTING, RUN NEURAL NET MULTIPLE TIMES
+     for (i in 1:n_nets)
+     {
+       # INITIATE THE RANDOM STATE FOR EACH NET
+       set.seed(rv[i]);
+       # TRAIN NEURAL NETS
+       net <- nnet::nnet(Y ~ X, size = n, data = set1, entropy = TRUE, maxit = 1000, decay = w, skip = TRUE, trace = FALSE)
+       # COLLECT PREDICTIONS TO DO MODEL AVERAGING
+       if (i == 1) prob <- predict(net, set2) else prob <- prob + predict(net, set2)
+     }
+     # CALCULATE AREA UNDER CURVE OF THE MODEL AVERAGING PREDICTION
+     roc <- verification::roc.area(set2$Y, prob / n_nets)[1]
+     # COLLECT RESULTS
+     result <- rbind(result, c(w, n, roc, round(mean(prob / n_nets), 4), round(mean(set2$Y), 4)))
+   }
+ }
> result2 <- data.frame(wt_decay = unlist(result[, 1]), n_units = unlist(result[, 2]), auc = unlist(result[, 3]),
+                       pred_rate = unlist(result[, 4]), obsv_rate = unlist(result[, 5]))
> result2[order(result2$auc, decreasing = T), ]
wt_decay n_units       auc pred_rate obsv_rate
1      0.01       1 0.6638209    0.0923     0.095
9      0.10       1 0.6625414    0.0923     0.095
5      0.05       1 0.6557022    0.0922     0.095
3      0.01      10 0.6530154    0.0938     0.095
8      0.05      20 0.6528293    0.0944     0.095
6      0.05       5 0.6516662    0.0917     0.095
2      0.01       5 0.6498284    0.0928     0.095
7      0.05      10 0.6456063    0.0934     0.095
4      0.01      20 0.6446176    0.0940     0.095
10     0.10       5 0.6434545    0.0927     0.095
12     0.10      20 0.6415935    0.0938     0.095
11     0.10      10 0.6348822    0.0928     0.095
```

Written by statcompute

February 3, 2013 at 10:20 pm

## A SAS Macro for Breusch-Pagan Test

In SAS, the Breusch-Pagan test for heteroscedasticity in a linear regression can be conducted with the MODEL procedure in SAS/ETS, as shown in the code snippet below.

```sas
data one;
do i = 1 to 100;
x1 = uniform(1);
x2 = uniform(2);
r  = normal(1) * 0.1;
if x2 > 0.5 then r = r * 2;
y = 10 + 8 * x1 + 6 * x2 + r;
output;
end;
run;

proc model data = one;
parms b0 b1 b2;
y = b0 + b1 * x1 + b2 * x2;
fit y / breusch = (1 x1 x2);
run;
/*
Heteroscedasticity Test
Equation        Test               Statistic     DF    Pr > ChiSq    Variables

y               Breusch-Pagan          10.44      2        0.0054    1, x1, x2
*/
```

However, in a forecasting model that I have recently been working on, I find it inconvenient to invoke “proc model” every time I want to run a Breusch-Pagan test. I would rather have a more generic solution that is not tied to a specific SAS module or procedure and that takes only a minimal set of inputs instead of a full model specification. As a result, I drafted a simple SAS macro for the Breusch-Pagan test, which gives a result identical to the one from the MODEL procedure. Hopefully, others might find this macro useful as well.
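The macro's logic, i.e. regress the squared residuals on the candidate variables and take n times the R-squared as a chi-square statistic, can be cross-checked in a few lines of Python (the function name is mine; numpy and scipy are assumptions, not part of the SAS workflow):

```python
import numpy as np
from scipy.stats import chi2

def breusch_pagan(resid, X):
    """LM form of the Breusch-Pagan test: regress squared residuals on X
    plus an intercept; statistic = n * R-squared, chi-square distributed
    with df = number of regressors."""
    r2s = np.asarray(resid, dtype=float) ** 2
    Z = np.column_stack([np.ones(len(r2s)),
                         np.asarray(X, dtype=float).reshape(len(r2s), -1)])
    beta, *_ = np.linalg.lstsq(Z, r2s, rcond=None)
    rsq = 1.0 - np.sum((r2s - Z @ beta) ** 2) / np.sum((r2s - r2s.mean()) ** 2)
    lm = len(r2s) * rsq
    df = Z.shape[1] - 1
    return lm, df, chi2.sf(lm, df)
```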

```sas
proc reg data = one;
model y = x1 x2;
output out = two r = resid;
run;

%macro hetero_bp(r = , x = , data = );
***********************************************************;
* THE SAS MACRO IS TO CALCULATE THE BREUSCH-PAGAN TEST    *
* FOR HETEROSKEDASTICITY                                  *;
* ======================================================= *;
* PARAMETERS:                                             *;
*  DATA: INPUT SAS DATA TABLE                             *;
*  R   : RESIDUAL VALUES FROM A LINEAR REGRESSION         *;
*  X   : A LIST OF NUMERIC VARIABLES TO MODEL ERROR       *;
*        VARIANCE IN THE BREUSCH-PAGAN TEST               *;
* ======================================================= *;
* CONTACT:                                                *;
*  WENSUI.LIU@53.COM                                      *;
***********************************************************;

data _data_(keep = r2 &x);
set &data;
where &r ~= .;
r2 = &r ** 2;
run;

ods output nobs = _nobs_;
ods output anova = _anova_;
ods output fitstatistics = _fits_;
ods listing close;
proc reg data = _last_;
model r2 = &x;
run;
ods listing;

proc sql noprint;
select distinct NObsUsed into :n from _nobs_;
select df into :df from _anova_ where upcase(compress(source, ' ')) = 'MODEL';
select nvalue2 into :r2 from _fits_ where upcase(compress(label2, ' ')) = 'R-SQUARE';
quit;
%put &r2;

data _result_;
chi_square = &r2 * &n;
df         = &df;
p_value    = 1 - probchi(chi_square, df);
run;

proc report data = _last_ spacing = 1 headline nowindows split = "*";
column(" * BREUSCH-PAGAN TEST FOR HETEROSKEDASTICITY * "
chi_square df p_value);
define chi_square  / "CHI-SQUARE"  width = 15;
define df          / "DF"          width = 5;
define p_value     / "P-VALUE"     width = 15 format = 12.8;
run;

proc datasets library = work;
delete _: / memtype = data;
run;

%mend hetero_bp;

%hetero_bp(r = resid, x = x1 x2, data = two);
/*
BREUSCH-PAGAN TEST FOR HETEROSKEDASTICITY

CHI-SQUARE    DF         P-VALUE
-----------------------------------------
10.4389     2      0.00541030
*/
```

Written by statcompute

February 2, 2013 at 3:37 pm

Posted in Econometrics, SAS, Statistics
