## Archive for **February 2013**

## How to Score Outcomes from Count Models

When scoring a count model, many people use the expected mean directly as the prediction. From the business standpoint, however, it is often more appealing to calculate the probability of a specific count outcome. For instance, in retail banking it is often of interest to know the probability that an account incurs one or more delinquencies and then to convert this probability to a certain score point. A widely accepted practice is to develop a logistic regression predicting the delinquent account, e.g. Y = 1 for delinquencies >= 1. However, it is also possible to develop a count model, e.g. a negative binomial model, predicting the number of delinquencies and then to estimate the probability of one or more delinquencies given the expected mean.
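The conversion itself only requires the zero mass of the fitted distribution: P(Y >= 1) = 1 - P(Y = 0). As a quick sketch of the arithmetic (in Python with scipy rather than SAS; `mu` is the expected mean from the count model and `alpha` the estimated dispersion, both assumed given):

```python
from scipy.stats import nbinom

def prob_one_or_more(mu, alpha):
    """P(Y >= 1) under a negative binomial with mean mu and dispersion alpha."""
    size = 1.0 / alpha               # the NB "size" parameter (1 / dispersion)
    p = size / (mu + size)           # probability parameter, so P(Y = 0) = p ** size
    return 1.0 - nbinom.pmf(0, size, p)
```

For a Poisson model the same idea applies with 1 - exp(-mu).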

In the demonstration below, a scoring scheme for count models is shown. From the output, it is clear that the predictiveness of the negative binomial model is comparable to that of the logistic model in terms of KS and ROC statistics.

```sas
options nocenter nonumber nodate mprint mlogic symbolgen
        orientation = landscape ls = 125 formchar = "|----|+|---+=|-/\<>*";

libname data 'C:\Users\liuwensui\projects\data';
%include 'C:\Users\liuwensui\projects\code\ks_macro.sas';

data tmp1;
  set data.credit_count;
  if majordrg = 0 then bad = 0;
  else bad = 1;
run;

proc logistic data = tmp1 desc;
  model bad = AGE ACADMOS ADEPCNT MINORDRG OWNRENT EXP_INC;
  score data = tmp1 out = logit_out1(rename = (p_1 = logit_prob1));
run;

proc genmod data = tmp1;
  model majordrg = AGE ACADMOS ADEPCNT MINORDRG OWNRENT EXP_INC / dist = nb;
  output out = nb_out1 p = yhat;
run;

data nb_out1;
  set nb_out1;
  nb_prob1 = 1 - pdf('negbinomial', 0, (1 / 4.0362) / (yhat + (1 / 4.0362)), (1 / 4.0362));
run;

%separation(data = logit_out1, score = logit_prob1, y = bad);
/*
        GOOD BAD SEPARATION REPORT FOR LOGIT_PROB1 IN DATA LOGIT_OUT1
        MAXIMUM KS = 35.5049 AT SCORE POINT 0.1773
        ( AUC STATISTICS = 0.7373, GINI COEFFICIENT = 0.4747, DIVERGENCE = 0.6511 )

          MIN     MAX    GOOD    BAD   TOTAL    BAD     BAD   CUMULATIVE   BAD    CUMU. BAD
         SCORE   SCORE     #      #      #     ODDS    RATE    BAD RATE  PERCENT   PERCENT
 ------------------------------------------------------------------------------------------
 BAD    0.3369  0.9998     557    787  1,344   0.71   58.56%    58.56%    30.73%    30.73%
  |     0.2157  0.3369     944    401  1,345   2.35   29.81%    44.18%    15.66%    46.39%
  |     0.1802  0.2157   1,039    305  1,344   3.41   22.69%    37.02%    11.91%    58.30%
  |     0.1619  0.1802   1,099    246  1,345   4.47   18.29%    32.34%     9.61%    67.90%
  |     0.1489  0.1619   1,124    220  1,344   5.11   16.37%    29.14%     8.59%    76.49%
  |     0.1383  0.1489   1,171    174  1,345   6.73   12.94%    26.44%     6.79%    83.29%
  |     0.1255  0.1383   1,213    131  1,344   9.26    9.75%    24.06%     5.12%    88.40%
  |     0.1109  0.1255   1,254     91  1,345  13.78    6.77%    21.89%     3.55%    91.96%
  V     0.0885  0.1109   1,246     98  1,344  12.71    7.29%    20.27%     3.83%    95.78%
 GOOD   0.0001  0.0885   1,236    108  1,344  11.44    8.04%    19.05%     4.22%   100.00%
        ======  ======  ======  =====  ======
        0.0001  0.9998  10,883  2,561  13,444
*/

%separation(data = nb_out1, score = nb_prob1, y = bad);
/*
        GOOD BAD SEPARATION REPORT FOR NB_PROB1 IN DATA NB_OUT1
        MAXIMUM KS = 35.8127 AT SCORE POINT 0.2095
        ( AUC STATISTICS = 0.7344, GINI COEFFICIENT = 0.4687, DIVERGENCE = 0.7021 )

          MIN     MAX    GOOD    BAD   TOTAL    BAD     BAD   CUMULATIVE   BAD    CUMU. BAD
         SCORE   SCORE     #      #      #     ODDS    RATE    BAD RATE  PERCENT   PERCENT
 ------------------------------------------------------------------------------------------
 BAD    0.2929  0.8804     561    783  1,344   0.72   58.26%    58.26%    30.57%    30.57%
  |     0.2367  0.2929     944    401  1,345   2.35   29.81%    44.03%    15.66%    46.23%
  |     0.2117  0.2367   1,025    319  1,344   3.21   23.74%    37.27%    12.46%    58.69%
  |     0.1947  0.2117   1,106    239  1,345   4.63   17.77%    32.39%     9.33%    68.02%
  |     0.1813  0.1947   1,131    213  1,344   5.31   15.85%    29.08%     8.32%    76.34%
  |     0.1675  0.1813   1,191    154  1,345   7.73   11.45%    26.14%     6.01%    82.35%
  |     0.1508  0.1675   1,208    136  1,344   8.88   10.12%    23.86%     5.31%    87.66%
  |     0.1298  0.1508   1,247     98  1,345  12.72    7.29%    21.78%     3.83%    91.49%
  V     0.0978  0.1297   1,242    102  1,344  12.18    7.59%    20.21%     3.98%    95.47%
 GOOD   0.0000  0.0978   1,228    116  1,344  10.59    8.63%    19.05%     4.53%   100.00%
        ======  ======  ======  =====  ======
        0.0000  0.8804  10,883  2,561  13,444
*/
```
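The %separation macro itself is not listed here, but its two headline statistics are straightforward to compute directly. A rough sketch of the logic in Python with numpy/scipy (my own illustration, not the macro):

```python
import numpy as np
from scipy.stats import rankdata

def ks_auc(y, score):
    """KS (in percent) and AUC for binary outcomes y and predicted scores."""
    y = np.asarray(y, dtype=float)
    score = np.asarray(score, dtype=float)
    ys = y[np.argsort(-score)]                     # outcomes sorted by descending score
    cum_bad = np.cumsum(ys) / ys.sum()             # cumulative bad capture
    cum_good = np.cumsum(1 - ys) / (1 - ys).sum()  # cumulative good capture
    ks = 100 * np.max(np.abs(cum_bad - cum_good))
    # AUC via the Mann-Whitney rank-sum identity (ties get average ranks)
    ranks = rankdata(score)
    n1 = ys.sum()
    n0 = len(y) - n1
    auc = (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)
    return ks, auc
```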

## A Grid Search for The Optimal Setting in Feed-Forward Neural Networks

The feed-forward neural network is a very powerful classification model in the machine learning context. Since the goodness-of-fit of a neural network is largely driven by the model complexity, it is very tempting for a modeler to over-parameterize the network by using too many hidden layers and/or hidden units.

As pointed out by Brian Ripley in his famous book “Modern Applied Statistics with S”, the complexity of a neural network can be regulated by a hyper-parameter called “weight decay” that penalizes the weights of hidden units. Per Ripley, the use of weight decay can both help the optimization process and avoid over-fitting.
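Mechanically, weight decay adds an L2 penalty, decay * sum(w^2), to the fitting criterion, so the optimizer trades goodness-of-fit against weight size. A one-parameter toy sketch (Python; my own illustration, not nnet's internals) makes the shrinkage visible: minimizing (w - 3)^2 + lambda * w^2 by gradient descent converges to w = 3 / (1 + lambda), a shrunken version of the unpenalized solution w = 3.

```python
def fit_with_decay(lmbd, lr=0.05, steps=2000):
    """Gradient descent on the toy penalized loss (w - 3)^2 + lmbd * w^2."""
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0) + 2.0 * lmbd * w   # data-fit gradient + decay gradient
        w -= lr * grad
    return w
```

A larger lmbd pulls w further toward zero, which is exactly how decay keeps an over-parameterized network from chasing noise in the training data.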

By now, it should be clear that the balance between the network complexity and the size of the weight decay forms the optimal setting for a neural network. The only remaining question is how to identify such a combination. In practice, modelers usually rely on v-fold cross-validation or cross-sample (holdout) validation. However, given the expensive computing cost of a neural network, cross-sample validation seems more efficient than v-fold. In addition, due to the presence of local minima, the validation result from a set of averaged models, instead of a single model, is deemed more reliable.

The example below shows a grid search strategy for the optimal setting in a neural network by cross-sample validation. As suggested by Ripley, the weight decay falls in the approximate range between 0.01 and 0.1 for the entropy fit. For simplicity, only a few numbers of hidden units are tried. However, with the availability of computing power, a finer grid search for a good combination of weight decay and the number of hidden units would be highly recommended.

```r
> # DATA PREPARATIONS
> df1 <- read.csv('credit_count.csv')
> df2 <- df1[df1$CARDHLDR == 1, 2:12]
> X <- I(as.matrix(df2[-1]))
> st.X <- scale(X)
> Y <- I(as.matrix(df2[1]))
> df3 <- data.frame(X = st.X, Y);
>
> # DIVIDE DATA INTO TESTING AND TRAINING SETS
> set.seed(2013)
> rows <- sample(1:nrow(df3), nrow(df3) - 1000)
> set1 <- df3[rows, ]
> set2 <- df3[-rows, ]
>
> result <- c(NULL, NULL, NULL, NULL, NULL)
> n_nets <- 10
> # SEARCH FOR OPTIMAL WEIGHT DECAY
> for (w in c(0.01, 0.05, 0.1))
+ {
+   # SEARCH FOR OPTIMAL NUMBER OF HIDDEN UNITS
+   for (n in c(1, 5, 10, 20))
+   {
+     # CREATE A VECTOR OF RANDOM SEEDS
+     rv <- round(runif(n_nets) * 100)
+     # FOR EACH SETTING, RUN NEURAL NET MULTIPLE TIMES
+     for (i in 1:n_nets)
+     {
+       # INITIATE THE RANDOM STATE FOR EACH NET
+       set.seed(rv[i]);
+       # TRAIN NEURAL NETS
+       net <- nnet::nnet(Y ~ X, size = n, data = set1, entropy = TRUE,
+                         maxit = 1000, decay = w, skip = TRUE, trace = FALSE)
+       # COLLECT PREDICTIONS TO DO MODEL AVERAGING
+       if (i == 1) prob <- predict(net, set2) else prob <- prob + predict(net, set2)
+     }
+     # CALCULATE AREA UNDER CURVE OF THE MODEL AVERAGING PREDICTION
+     roc <- verification::roc.area(set2$Y, prob / n_nets)[1]
+     # COLLECT RESULTS
+     result <- rbind(result, c(w, n, roc, round(mean(prob / n_nets), 4), round(mean(set2$Y), 4)))
+   }
+ }
> result2 <- data.frame(wt_decay = unlist(result[, 1]), n_units = unlist(result[, 2]),
+                       auc = unlist(result[, 3]), pred_rate = unlist(result[, 4]),
+                       obsv_rate = unlist(result[, 5]))
> result2[order(result2$auc, decreasing = T), ]
   wt_decay n_units       auc pred_rate obsv_rate
1      0.01       1 0.6638209    0.0923     0.095
9      0.10       1 0.6625414    0.0923     0.095
5      0.05       1 0.6557022    0.0922     0.095
3      0.01      10 0.6530154    0.0938     0.095
8      0.05      20 0.6528293    0.0944     0.095
6      0.05       5 0.6516662    0.0917     0.095
2      0.01       5 0.6498284    0.0928     0.095
7      0.05      10 0.6456063    0.0934     0.095
4      0.01      20 0.6446176    0.0940     0.095
10     0.10       5 0.6434545    0.0927     0.095
12     0.10      20 0.6415935    0.0938     0.095
11     0.10      10 0.6348822    0.0928     0.095
```

## A SAS Macro for Breusch-Pagan Test

In SAS, the Breusch-Pagan test for heteroscedasticity in a linear regression can be conducted with the MODEL procedure in SAS/ETS, as shown in the code snippet below.

```sas
data one;
  do i = 1 to 100;
    x1 = uniform(1);
    x2 = uniform(2);
    r = normal(1) * 0.1;
    if x2 > 0.5 then r = r * 2;
    y = 10 + 8 * x1 + 6 * x2 + r;
    output;
  end;
run;

proc model data = one;
  parms b0 b1 b2;
  y = b0 + b1 * x1 + b2 * x2;
  fit y / breusch = (1 x1 x2);
run;
/*
                 Heteroscedasticity Test
 Equation    Test           Statistic    DF    Pr > ChiSq    Variables
 y           Breusch-Pagan      10.44     2        0.0054    1, x1, x2
*/
```

However, in a forecasting model that I have been working on recently, it is not convenient to call “proc model” every time I want to run a Breusch-Pagan test. I would rather have a more generic solution that is not tied to a specific SAS module or procedure and that only takes a minimal set of inputs instead of a fully specified model. As a result, I drafted a simple SAS macro for the Breusch-Pagan test, which gives results identical to the ones from the MODEL procedure. Hopefully, others might find this macro useful as well.

```sas
proc reg data = one;
  model y = x1 x2;
  output out = two r = resid;
run;

%macro hetero_bp(r = , x = , data = );
***********************************************************;
* THE SAS MACRO IS TO CALCULATE THE BREUSCH-PAGAN TEST    *;
* FOR HETEROSKEDASTICITY                                  *;
* ======================================================= *;
* PARAMETERS:                                             *;
*  DATA: INPUT SAS DATA TABLE                             *;
*  R   : RESIDUAL VALUES FROM A LINEAR REGRESSION         *;
*  X   : A LIST OF NUMERIC VARIABLES TO MODEL THE ERROR   *;
*        VARIANCE IN THE BREUSCH-PAGAN TEST               *;
* ======================================================= *;
* CONTACT:                                                *;
*  WENSUI.LIU@53.COM                                      *;
***********************************************************;

data _data_(keep = r2 &x);
  set &data;
  where &r ~= .;
  r2 = &r ** 2;
run;

ods output nobs = _nobs_;
ods output anova = _anova_;
ods output fitstatistics = _fits_;
ods listing close;
proc reg data = _last_;
  model r2 = &x;
run;
ods listing;

proc sql noprint;
  select distinct NObsUsed into :n from _nobs_;
  select df into :df from _anova_
    where upcase(compress(source, ' ')) = 'MODEL';
  select nvalue2 into :r2 from _fits_
    where upcase(compress(label2, ' ')) = 'R-SQUARE';
quit;

data _result_;
  chi_square = &r2 * &n;
  df = &df;
  p_value = 1 - probchi(chi_square, df);
run;

proc report data = _last_ spacing = 1 headline nowindows split = "*";
  column(" * BREUSCH-PAGAN TEST FOR HETEROSKEDASTICITY * " chi_square df p_value);
  define chi_square / "CHI-SQUARE" width = 15;
  define df         / "DF"         width = 5;
  define p_value    / "P-VALUE"    width = 15 format = 12.8;
run;

proc datasets library = work;
  delete _: / memtype = data;
run;

%mend hetero_bp;

%hetero_bp(r = resid, x = x1 x2, data = two);
/*
 BREUSCH-PAGAN TEST FOR HETEROSKEDASTICITY
    CHI-SQUARE    DF         P-VALUE
 -----------------------------------------
       10.4389     2      0.00541030
*/
```
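The macro boils down to one identity: the LM statistic is n times the R-squared from regressing the squared residuals on the explanatory variables, with degrees of freedom equal to the number of regressors. A compact sketch of that logic in Python with numpy/scipy (my own translation for illustration, not a SAS tool):

```python
import numpy as np
from scipy.stats import chi2

def breusch_pagan(y, X):
    """LM statistic, df, and p-value: n * R-squared from regressing
    the squared OLS residuals on X (plus an intercept)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    Z = np.column_stack([np.ones(n), np.asarray(X, dtype=float)])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    u2 = (y - Z @ beta) ** 2                        # squared residuals
    gamma, *_ = np.linalg.lstsq(Z, u2, rcond=None)  # auxiliary regression
    ss_res = np.sum((u2 - Z @ gamma) ** 2)
    ss_tot = np.sum((u2 - u2.mean()) ** 2)
    r_square = 1.0 - ss_res / ss_tot
    df = Z.shape[1] - 1
    chi_square = n * r_square
    return chi_square, df, 1.0 - chi2.cdf(chi_square, df)
```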