Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

How to Score Outcomes from Count Models

When scoring a count model, many people use the predicted mean directly. From a business standpoint, however, it can be more appealing to calculate the probability of a specific count outcome. In retail banking, for instance, it is often of interest to know the probability that an account has one or more delinquencies and then convert this probability to a score point. A widely accepted practice is to develop a logistic regression predicting whether an account is delinquent, e.g. Y = 1 for delinquencies >= 1. However, it is also possible to develop a count model, e.g. a negative binomial model, that predicts the number of delinquencies, and then estimate the probability of one or more delinquencies from the expected mean.
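For the negative binomial model, the step from the expected mean to the probability of one or more delinquencies has a closed form. The derivation below assumes the NB2 parameterization used by PROC GENMOD with dist = nb, where k is the dispersion parameter and Var(Y) = mu + k * mu^2. The zero-count probability then depends only on mu and k. The last line maps it onto the (p, n) arguments of SAS's PDF('NEGBINOMIAL', m, p, n), which returns the probability of m failures before the n-th success.

```latex
\begin{aligned}
\Pr(Y = y \mid \mu, k) &= \frac{\Gamma(y + 1/k)}{\Gamma(1/k)\, y!}
  \left(\frac{1}{1 + k\mu}\right)^{1/k}
  \left(\frac{k\mu}{1 + k\mu}\right)^{y},
  \qquad \operatorname{E}(Y) = \mu,\quad \operatorname{Var}(Y) = \mu + k\mu^{2} \\
\Pr(Y \ge 1 \mid \mu, k) &= 1 - \Pr(Y = 0 \mid \mu, k) = 1 - (1 + k\mu)^{-1/k} \\
n = \frac{1}{k},\quad p = \frac{n}{\mu + n} = \frac{1}{1 + k\mu}
  &\;\Longrightarrow\; \Pr(Y = 0 \mid \mu, k) = p^{n}
\end{aligned}
```

Reading 4.0362 in the SAS data step as the estimated dispersion k and yhat as mu, the nb_prob1 line follows exactly this pattern: p = (1/k) / (yhat + 1/k) and n = 1/k.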

The demonstration below shows such a scoring scheme for count models. In the output, the predictiveness of the negative binomial model is comparable to that of the logistic model in terms of KS and ROC statistics: a maximum KS of 35.81 versus 35.50, and an AUC of 0.7344 versus 0.7373.
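The %separation macro's source is not shown in the post. The sketch below uses common textbook definitions to illustrate what its headline statistics measure. It is written in Python with a hypothetical function name, separation_stats. KS is taken as the largest gap between the cumulative score distributions of goods and bads. AUC is taken as the probability that a random bad outscores a random good, computed with the Mann-Whitney rank sum and ties counted as 1/2. Gini is 2 * AUC - 1. The macro may handle ties or binning differently.

```python
from typing import Sequence, Tuple


def separation_stats(scores: Sequence[float], bad: Sequence[int]) -> Tuple[float, float, float]:
    """Return (KS in percent, AUC, Gini) for a score where higher means riskier.

    bad holds 1 for a bad account and 0 for a good one.
    """
    data = sorted(zip(scores, bad))  # ascending by score, so tied scores are adjacent
    n = len(data)
    n_bad = sum(y for _, y in data)
    n_good = n - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one good and one bad account")

    cum_good = cum_bad = 0
    ks = 0.0
    rank_sum_bad = 0.0
    i = 0
    while i < n:
        # group all observations that share the same score
        j = i
        while j < n and data[j][0] == data[i][0]:
            j += 1
        bads_here = sum(y for _, y in data[i:j])
        cum_bad += bads_here
        cum_good += (j - i) - bads_here
        # KS: largest gap between the cumulative score distributions of goods and bads
        ks = max(ks, abs(cum_good / n_good - cum_bad / n_bad))
        # Mann-Whitney: tied scores share the average of ranks i+1..j
        rank_sum_bad += bads_here * (i + 1 + j) / 2
        i = j

    # AUC = probability that a random bad outscores a random good (ties count 1/2)
    auc = (rank_sum_bad - n_bad * (n_bad + 1) / 2) / (n_bad * n_good)
    return 100.0 * ks, auc, 2.0 * auc - 1.0
```

The Gini relation can be checked against the reports below. For the logit score, 2 * 0.7373 - 1 = 0.4746, which matches the reported 0.4747 up to rounding of the AUC. For the negative binomial score, 2 * 0.7344 - 1 = 0.4688, against a reported 0.4687.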

options nocenter nonumber nodate mprint mlogic symbolgen
        orientation = landscape ls = 125 formchar = "|----|+|---+=|-/\<>*";

libname data 'C:\Users\liuwensui\projects\data';

/* ks_macro.sas provides the %separation macro used below */
%include 'C:\Users\liuwensui\projects\code\ks_macro.sas';

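/* binary target for the logistic benchmark: bad = 1 when the count MAJORDRG is nonzero */
data tmp1;
  set data.credit_count;
  if majordrg = 0 then bad = 0;
  else bad = 1;
run;

/* desc models P(bad = 1); the scored probability P_1 is renamed to logit_prob1 */
proc logistic data = tmp1 desc;
  model bad = AGE ACADMOS ADEPCNT MINORDRG OWNRENT EXP_INC;
  score data = tmp1 out = logit_out1(rename = (p_1 = logit_prob1));
run;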

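/* negative binomial model on the count itself; yhat = predicted mean count */
proc genmod data = tmp1;
  model majordrg = AGE ACADMOS ADEPCNT MINORDRG OWNRENT EXP_INC / dist = nb;
  output out = nb_out1 p = yhat;
run;

/* dispersion k from the GENMOD fit above (Var = mu + k * mu ** 2), hard-coded */
%let k = 4.0362;

data nb_out1;
  set nb_out1;
  /* P(majordrg >= 1) = 1 - P(majordrg = 0), with NEGBINOMIAL p = (1/k) / (yhat + 1/k) and n = 1/k */
  nb_prob1 = 1 - pdf('negbinomial', 0, (1 / &k) / (yhat + (1 / &k)), (1 / &k));
run;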

%separation(data = logit_out1, score = logit_prob1, y = bad);
/*
                            GOOD BAD SEPARATION REPORT FOR LOGIT_PROB1 IN DATA LOGIT_OUT1                           
                                     MAXIMUM KS = 35.5049 AT SCORE POINT 0.1773                                     
                     ( AUC STATISTICS = 0.7373, GINI COEFFICIENT = 0.4747, DIVERGENCE = 0.6511 )                    
                                                                                                                    
          MIN        MAX           GOOD        BAD      TOTAL               BAD     CUMULATIVE    BAD      CUMU. BAD
         SCORE      SCORE             #          #          #       ODDS    RATE      BAD RATE  PERCENT      PERCENT
 -------------------------------------------------------------------------------------------------------------------
  BAD     0.3369     0.9998         557        787      1,344       0.71   58.56%      58.56%    30.73%      30.73% 
   |      0.2157     0.3369         944        401      1,345       2.35   29.81%      44.18%    15.66%      46.39% 
   |      0.1802     0.2157       1,039        305      1,344       3.41   22.69%      37.02%    11.91%      58.30% 
   |      0.1619     0.1802       1,099        246      1,345       4.47   18.29%      32.34%     9.61%      67.90% 
   |      0.1489     0.1619       1,124        220      1,344       5.11   16.37%      29.14%     8.59%      76.49% 
   |      0.1383     0.1489       1,171        174      1,345       6.73   12.94%      26.44%     6.79%      83.29% 
   |      0.1255     0.1383       1,213        131      1,344       9.26    9.75%      24.06%     5.12%      88.40% 
   |      0.1109     0.1255       1,254         91      1,345      13.78    6.77%      21.89%     3.55%      91.96% 
   V      0.0885     0.1109       1,246         98      1,344      12.71    7.29%      20.27%     3.83%      95.78% 
 GOOD     0.0001     0.0885       1,236        108      1,344      11.44    8.04%      19.05%     4.22%     100.00% 
       ========== ========== ========== ========== ==========                                                       
          0.0001     0.9998      10,883      2,561     13,444                                                       
*/
    
%separation(data = nb_out1, score = nb_prob1, y = bad);
/*
                               GOOD BAD SEPARATION REPORT FOR NB_PROB1 IN DATA NB_OUT1                              
                                     MAXIMUM KS = 35.8127 AT SCORE POINT 0.2095                                     
                     ( AUC STATISTICS = 0.7344, GINI COEFFICIENT = 0.4687, DIVERGENCE = 0.7021 )                    
                                                                                                                    
          MIN        MAX           GOOD        BAD      TOTAL               BAD     CUMULATIVE    BAD      CUMU. BAD
         SCORE      SCORE             #          #          #       ODDS    RATE      BAD RATE  PERCENT      PERCENT
 -------------------------------------------------------------------------------------------------------------------
  BAD     0.2929     0.8804         561        783      1,344       0.72   58.26%      58.26%    30.57%      30.57% 
   |      0.2367     0.2929         944        401      1,345       2.35   29.81%      44.03%    15.66%      46.23% 
   |      0.2117     0.2367       1,025        319      1,344       3.21   23.74%      37.27%    12.46%      58.69% 
   |      0.1947     0.2117       1,106        239      1,345       4.63   17.77%      32.39%     9.33%      68.02% 
   |      0.1813     0.1947       1,131        213      1,344       5.31   15.85%      29.08%     8.32%      76.34% 
   |      0.1675     0.1813       1,191        154      1,345       7.73   11.45%      26.14%     6.01%      82.35% 
   |      0.1508     0.1675       1,208        136      1,344       8.88   10.12%      23.86%     5.31%      87.66% 
   |      0.1298     0.1508       1,247         98      1,345      12.72    7.29%      21.78%     3.83%      91.49% 
   V      0.0978     0.1297       1,242        102      1,344      12.18    7.59%      20.21%     3.98%      95.47% 
 GOOD     0.0000     0.0978       1,228        116      1,344      10.59    8.63%      19.05%     4.53%     100.00% 
       ========== ========== ========== ========== ==========                                                       
          0.0000     0.8804      10,883      2,561     13,444                                                       
*/
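As a cross-check of the nb_prob1 data step, here is a minimal Python sketch of the same conversion from predicted mean to P(Y >= 1). It assumes the NB2 parameterization used by PROC GENMOD (Var = mu + k * mu^2). It treats 4.0362 as the dispersion estimate k that the post hard-codes. The names nb_prob_at_least_one and K_FROM_POST are illustrative and do not come from the original code.

```python
def nb_prob_at_least_one(mu: float, k: float) -> float:
    """P(Y >= 1) for a negative binomial (NB2) count with mean mu and dispersion k.

    Assumes Var(Y) = mu + k * mu**2, so P(Y = 0) = (1 + k * mu) ** (-1 / k).
    """
    if mu < 0 or k <= 0:
        raise ValueError("need mu >= 0 and k > 0")
    n = 1.0 / k          # 'n' argument of SAS pdf('negbinomial', 0, p, n)
    p = n / (mu + n)     # 'p' argument, equal to 1 / (1 + k * mu)
    return 1.0 - p ** n  # 1 - P(Y = 0)


K_FROM_POST = 4.0362  # dispersion value hard-coded in the SAS data step

if __name__ == "__main__":
    for mu in (0.05, 0.25, 1.0, 3.0):
        print(f"yhat = {mu:4.2f} -> P(Y >= 1) = {nb_prob_at_least_one(mu, K_FROM_POST):.4f}")
```

For a fixed k, P(Y >= 1) increases strictly with yhat, so the conversion does not change the rank order of accounts. Assuming %separation depends only on that ordering, KS and AUC for nb_prob1 would be identical to those for yhat itself. What the conversion adds is a score on the probability scale, directly comparable to logit_prob1.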

Written by statcompute

February 17, 2013 at 4:18 pm
