Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

A Prototype of Monotonic Binning Algorithm with R

I’ve been asked many time if I have a piece of R code implementing the monotonic binning algorithm, similar to the one that I developed with SAS (https://statcompute.wordpress.com/2012/06/10/a-sas-macro-implementing-monotonic-woe-transformation-in-scorecard-development) and with Python (https://statcompute.wordpress.com/2012/12/08/monotonic-binning-with-python). Today, I finally had time to draft a quick prototype with 20 lines of R code, which is however barely useable without the further polish. But it is still a little surprising to me how efficient it can be to use R in algorithm prototyping, much sleeker than SAS macro.

library(sas7bdat)
library(Hmisc)

bin <- function(x, y){
  n <- min(50, length(unique(x)))
  repeat {
    n   <- n - 1
    d1  <- data.frame(x, y, bin = cut2(x, g = n)) 
    d2  <- aggregate(d1[-3], d1[3], mean)
    cor <- cor(d2[-1], method = "spearman")
    if(abs(cor[1, 2]) == 1) break
  }
  d2[2] <- NULL
  colnames(d2) <- c('LEVEL', 'RATE')
  head <- paste(toupper(substitute(y)), " RATE by ", toupper(substitute(x)), sep = '')
  cat("+-", rep("-", nchar(head)), "-+\n", sep = '')
  cat("| ", head, ' |\n', sep = '')
  cat("+-", rep("-", nchar(head)), "-+\n", sep = '')
  print(d2)
  cat("\n")
}

data <- read.sas7bdat("C:\\Users\\liuwensui\\Downloads\\accepts.sas7bdat")
attach(data)

bin(bureau_score, bad)
bin(age_oldest_tr, bad)
bin(tot_income, bad)
bin(tot_tr, bad)

R output:

+--------------------------+
| BAD RATE by BUREAU_SCORE |
+--------------------------+
       LEVEL       RATE
1  [443,618) 0.44639376
2  [618,643) 0.38446602
3  [643,658) 0.31835938
4  [658,673) 0.23819302
5  [673,686) 0.19838057
6  [686,702) 0.17850288
7  [702,715) 0.14168378
8  [715,731) 0.09815951
9  [731,752) 0.07212476
10 [752,776) 0.05487805
11 [776,848] 0.02605210

+---------------------------+
| BAD RATE by AGE_OLDEST_TR |
+---------------------------+
       LEVEL       RATE
1  [  1, 34) 0.33333333
2  [ 34, 62) 0.30560928
3  [ 62, 87) 0.25145068
4  [ 87,113) 0.23346304
5  [113,130) 0.21616162
6  [130,149) 0.20036101
7  [149,168) 0.19361702
8  [168,198) 0.15530303
9  [198,245) 0.11111111
10 [245,308) 0.10700389
11 [308,588] 0.08730159

+------------------------+
| BAD RATE by TOT_INCOME |
+------------------------+
           LEVEL      RATE
1 [   0,   2570) 0.2498715
2 [2570,   4510) 0.2034068
3 [4510,8147167] 0.1602327

+--------------------+
| BAD RATE by TOT_TR |
+--------------------+
    LEVEL      RATE
1 [ 0,12) 0.2672370
2 [12,22) 0.1827676
3 [22,77] 0.1422764
Advertisements

Written by statcompute

May 4, 2013 at 4:33 pm

%d bloggers like this: