Modeling LGD with Proportional Odds Model

The LGD model is an important component in the expected loss calculation. In https://statcompute.wordpress.com/2015/11/01/quasi-binomial-model-in-sas, I discussed how to model LGD with the quasi-binomial regression that is simple and makes no distributional assumption.

In the real-world LGD data, we usually would observe 3 ordered categories of values, including 0, 1, and in-betweens. In cases with a nontrivial number of 0 and 1 values, the ordered logit model, which is also known as Proportional Odds model, can be applicable. In the demonstration below, I will show how we can potentially use the proportional odds model in the LGD model development.

First of all, we need to categorize all numeric LGD values into three ordinal categories. As shown below, there are more than 30% of 0 and 1 values.

df <- read.csv("lgd.csv")
df$lgd <- round(1 - df$Recovery_rate, 4)
df$lgd_cat <- cut(df$lgd, breaks = c(-Inf, 0, 0.9999, Inf), labels = c("L", "M", "H"), ordered_result = T)
summary(df$lgd_cat)

#   L    M    H 
# 730 1672  143 

The estimation of a proportional odds model is straightforward with clm() in the ordinal package or polr() in the MASS package. As demonstrated below, in addition to the coefficient for LTV, there are 2 intercepts to differentiate 3 categories.

m1 <- ordinal::clm(lgd_cat ~ LTV, data = df)
summary(m1)

#Coefficients:
#    Estimate Std. Error z value Pr(>|z|)    
#LTV   2.0777     0.1267    16.4   <2e-16 ***
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
#Threshold coefficients:
#    Estimate Std. Error z value
#L|M  0.38134    0.08676   4.396
#M|H  4.50145    0.14427  31.201

It is important to point out that, in a proportional odds model, it is the cumulative probability that is derived from the linear combination of model variables. For instance, the cumulative probability of LGD belonging to L or M is formulated as

Prob(LGD <= M) = Exp(4.50 – 2.08 * LTV) / (1 + Exp(4.50 – 2.08 * LTV))

Likewise, we would have

Prob(LGD <= L) = Exp(0.38 – 2.08 * LTV) / (1 + Exp(0.38 – 2.08 * LTV))

With above cumulative probabilities, then we can calculate the probability of each category as below.

Prob(LGD = L) = Prob(LGD <= L)
Prob(LGD = M) = Prob(LGD <= M) – Prob(LGD <= L)
Prob(LGD = H) = 1 – Prob(LGD <= M)

The R code is showing the detailed calculation how to convert cumulative probabilities to probabilities of interest.

cumprob_L <- exp(df$LTV * (-m1$beta) + m1$Theta[1]) / (1 + exp(df$LTV * (-m1$beta) + m1$Theta[1])) 
cumprob_M <- exp(df$LTV * (-m1$beta) + m1$Theta[2]) / (1 + exp(df$LTV * (-m1$beta) + m1$Theta[2])) 
prob_L <- cumprob_L
prob_M <- cumprob_M - cumprob_L
prob_H <- 1 - cumprob_M
pred <- data.frame(prob_L, prob_M, prob_H)
apply(pred, 2, mean)

#    prob_L     prob_M     prob_H 
#0.28751210 0.65679888 0.05568903 

After predicting the probability of each category, we would need another sub-model to estimate the conditional LGD for lgd_cat = “M” with either Beta or Simplex regression. (See https://statcompute.wordpress.com/2014/10/27/flexible-beta-modeling and https://statcompute.wordpress.com/2014/02/02/simplex-model-in-r) The final LGD prediction can be formulated as

E(LGD|X)
= Prob(Y = 0|X) * E(Y|X, Y = 0) + Prob(Y = 1|X) * E(Y|X, Y = 1) + Prob(0 < Y < 1|X) * E(Y|X, 0 < Y < 1)
= Prob(Y = 1|X) + Prob(0 < Y < 1|X) * E(Y|X, 0 < Y < 1)

where E(Y|X, 0 < Y < 1) can be calculated from the sub-model.