Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

Archive for March 2012

Modeling Rates and Proportions in SAS – 5


The Tobit model can be considered an extension of naïve OLS regression in which the dependent variable is censored and therefore observable only within a certain interval, which is [0, 1] for the rates and proportions in my study. Specifically, this class of models assumes that there is a latent variable Y* such that

Y* = X`B + e, where e ~ N(0, sigma ^ 2) & Y = Min(1, Max(0, Y*))
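Under this censoring rule, each observation contributes to the likelihood in one of three ways, depending on whether it sits at a boundary or in the interior. The block below is a worked sketch of the standard two-limit Tobit log likelihood. Φ and φ denote the standard normal CDF and density, and X_i'B is X`B for observation i.

```latex
\ell_i(B, \sigma) =
\begin{cases}
\ln \Phi\!\left( \dfrac{-X_i'B}{\sigma} \right), & Y_i = 0 \quad (\text{i.e. } Y_i^* \le 0) \\[8pt]
\ln \phi\!\left( \dfrac{Y_i - X_i'B}{\sigma} \right) - \ln \sigma, & 0 < Y_i < 1 \\[8pt]
\ln \Phi\!\left( \dfrac{X_i'B - 1}{\sigma} \right), & Y_i = 1 \quad (\text{i.e. } Y_i^* \ge 1)
\end{cases}
\qquad
\ell(B, \sigma) = \sum_i \ell_i(B, \sigma)
```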

As a result, the rate or proportion Y, bounded by [0, 1], can be considered the observable part of a normally distributed variable Y* ~ N(X`B, sigma ^ 2) on the real line. However, there is a fundamental argument against this censoring assumption. Values outside [0, 1] are unobservable not because they are censored but because they are undefined. Hence, the censored normal distribution might not be an appropriate assumption for percentages and proportions.

In SAS, the most convenient way to estimate a Tobit model is the QLIM procedure in the SAS/ETS module. However, to illustrate the log likelihood function of the Tobit model clearly, we'd like to stick to the NLMIXED procedure and estimate the Tobit model by maximum likelihood. The maximum likelihood estimator for a Tobit model assumes that the errors are normal and homoscedastic, and it is inconsistent otherwise. The previous section showed that heteroscedasticity is present because of the nature of rate and proportion outcomes. As a result, a variance model must also be estimated simultaneously to account for heteroscedasticity, through the function VAR(e) = sigma ^ 2 * (1 + EXP(Z`G)). Our Tobit model specification therefore has 2 components, a mean sub-model and a variance sub-model. As illustrated in the output below, a couple of independent variables, e.g. X1 and X2, are significant in both sub-models. This shows that the conditional variance is not independent of the mean in proportion outcomes.
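The sketch below is a minimal Python rendering of the kind of log likelihood an NLMIXED specification like this would maximize. It combines the mean sub-model X`B with the variance sub-model VAR(e) = sigma ^ 2 * (1 + EXP(Z`G)). It is not the post's SAS code, which is not reproduced here. All function and argument names are hypothetical. X and Z are lists of covariate rows, so add a constant column if you want intercepts. In PROC NLMIXED the same per-observation quantity would be supplied through `MODEL y ~ GENERAL(ll)`. An optimizer is still needed to maximize it, and none is included.

```python
import math

def norm_cdf(z):
    # Standard normal CDF; erfc keeps precision in the left tail
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def norm_logpdf(z):
    # Log density of the standard normal distribution
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def tobit_loglik(y, X, Z, B, G, sigma):
    """Log likelihood of a Tobit model censored at 0 and 1, with mean X'B and
    heteroscedastic variance VAR(e) = sigma^2 * (1 + exp(Z'G)).
    Names are hypothetical; this is an illustrative sketch only."""
    total = 0.0
    for yi, xi, zi in zip(y, X, Z):
        mu = dot(xi, B)                                     # mean sub-model
        s = sigma * math.sqrt(1.0 + math.exp(dot(zi, G)))   # std. dev. from variance sub-model
        if yi <= 0.0:
            total += math.log(norm_cdf(-mu / s))            # P(Y* <= 0), censored at 0
        elif yi >= 1.0:
            total += math.log(norm_cdf((mu - 1.0) / s))     # P(Y* >= 1), censored at 1
        else:
            total += norm_logpdf((yi - mu) / s) - math.log(s)  # density of Y* at observed y
    return total
```

Because the variance function is 1 + EXP(Z`G) rather than EXP(Z`G), setting G to zero still leaves a homoscedastic model with variance 2 * sigma ^ 2. The scale is absorbed by sigma.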


Written by statcompute

March 24, 2012 at 5:45 pm

Posted in SAS, Statistical Models

Integration of Economic Information in Credit Scorecard

The incorporation of economic information in the credit scorecard has been researched extensively in academia. It has received increasing attention in the banking industry since the most recent global economic recession.

To our knowledge, economic information is conventionally used in the credit scorecard mostly for segmentation. For example, the business footprint may be divided into a couple of geographic regions based on economic heterogeneity. However, the scorecard models themselves are still developed solely on credit profile and payment history information. In our view, this is a static approach that ignores economic dynamics.

During the 2008 recession, most lending organizations observed performance decay in their scorecards as a result of the economic downturn. Three major remedies for scorecard deterioration emerged, incorporating economic information in various ways.

- First, a quick patch considered by most banks was to tighten credit policies. They raised scorecard cutoffs in stressed areas based on local economic conditions, e.g. unemployment rates and housing price changes.
- Second, many risk modeling teams developed hybrid risk models by combining existing scorecards with economic data. The goal was to restore rank-ordering capability and the expected odds ratio.
- Third, many scorecards were redeveloped with the most recent credit profiles and payment behaviors of customers during the downturn. This guarded against rapid increases in non-performing loans and credit losses.

While these approaches address scorecard performance issues in a timely manner, they are still short-term treatments applied after a recession has begun. They often over-correct the underlying problems at the cost of missed revenue opportunities. Consequently, they are inevitably subject to future adjustment or replacement during the economic recovery.

In light of the above discussion, the question is how to use economic information effectively to improve the predictability and stability of scorecards through the economic cycle. In our experience, a sensible solution is to overlay economic data on the traditional scorecard development procedure by using economic indicators directly as model predictors. Most of a scorecard's predictive power would still come from individual-level credit characteristics. Economic indicators, however, can complement credit attributes and provide enough additional predictive lift to justify their inclusion. For instance, consider two individuals with identical risk profiles who live in two MSAs with different economic outlooks. It is reasonable to expect the one in the stressed local economy to be more likely to become delinquent.

Beyond predictability, a scorecard developed with a built-in economic trend is more robust. It can move automatically with the cycle, reducing the need for scorecard adjustments in a dynamic economy. In addition, including leading indicators or economic projections in the scorecard gives it a forward-looking prediction capability. This creates an opportunity to employ early interventions proactively in loss recovery and mitigation initiatives.

Written by statcompute

March 24, 2012 at 10:55 am

Posted in Loss Forecasting

A Supplement to “Modeling Rates and Proportions in SAS – 4”

Below is a demonstration of how to perform the White test and the Breusch-Pagan test for heteroscedasticity, and how to correct for heteroscedasticity with GMM estimation, using the MODEL procedure in SAS/ETS.
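The sketch below is a minimal pure-Python version of both tests for a simple regression with one regressor. It uses the n * R^2 (Lagrange multiplier) form. Koenker's studentized variant of the Breusch-Pagan test regresses the squared OLS residuals on the original regressors. White's test adds their squares and cross-products, and with one regressor that means only x^2. This is not the PROC MODEL code from the post, the function names are hypothetical, and the GMM correction is not sketched.

```python
import math

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, solved by
    Gaussian elimination with partial pivoting."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for p in range(k):
        m = max(range(p, k), key=lambda i: abs(A[i][p]))   # pivot row
        A[p], A[m] = A[m], A[p]
        c[p], c[m] = c[m], c[p]
        for i in range(p + 1, k):
            f = A[i][p] / A[p][p]
            for j in range(p, k):
                A[i][j] -= f * A[p][j]
            c[i] -= f * c[p]
    b = [0.0] * k
    for i in reversed(range(k)):                             # back substitution
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

def r_squared(X, y):
    # Centered R^2 of an OLS regression of y on the columns of X
    b = ols(X, y)
    fitted = [sum(bi * xi for bi, xi in zip(b, r)) for r in X]
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst

def het_tests(x, y):
    """Koenker-style Breusch-Pagan and White statistics (n * R^2) with
    chi-square p-values, for y regressed on a single regressor x."""
    n = len(y)
    X = [[1.0, xi] for xi in x]
    b = ols(X, y)
    e2 = [(yi - (b[0] + b[1] * xi)) ** 2 for xi, yi in zip(x, y)]  # squared residuals
    bp = n * r_squared(X, e2)                                      # df = 1
    white = n * r_squared([[1.0, xi, xi * xi] for xi in x], e2)    # df = 2
    # Closed-form chi-square upper tails: df = 1 -> erfc(sqrt(q/2)), df = 2 -> exp(-q/2)
    return {"BP": (bp, math.erfc(math.sqrt(bp / 2.0))),
            "White": (white, math.exp(-white / 2.0))}
```

The auxiliary regression for White's test nests the one for Breusch-Pagan. Its statistic is therefore never smaller, but it spends an extra degree of freedom.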

Written by statcompute

March 10, 2012 at 11:33 pm

Posted in SAS, Statistical Models

Modeling Practices of Loss Forecasting for Consumer Banking Portfolio

Roll Rate Models

The roll rate model is the most commonly used loss forecasting practice in consumer banking. It is built at the portfolio level instead of the individual account level.

In this modeling practice, the whole portfolio is segmented into delinquency buckets, e.g. Current, 30-DPD, 60-DPD, 90-DPD, 120-DPD, 150-DPD, and charge-off. The purpose is to estimate the probability that an account in a specific delinquency bucket rolls into the next delinquency stage within 1 month. The table below demonstrates the scheme of a basic roll rate model, with projected roll rates shown in the bottom two rows highlighted in red. The projected rate for each delinquency bucket is simply the moving average of the previous 3 months.
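The sketch below is a minimal Python version of these mechanics. It assumes a hypothetical layout in which balances[t][k] is the balance in bucket k at month t. The roll rate into bucket k+1 is this month's bucket k+1 balance divided by last month's bucket k balance. This is a common convention, though practice varies, e.g. account counts versus dollar balances. The function names are illustrative.

```python
def roll_rates(balances):
    """Month-over-month roll rates.

    balances[t][k] is the balance in delinquency bucket k at month t, with
    buckets ordered Current, 30-DPD, 60-DPD, ... . The rate from bucket k at
    month t-1 into bucket k+1 at month t is balances[t][k+1] / balances[t-1][k].
    """
    rates = []
    for t in range(1, len(balances)):
        prev, cur = balances[t - 1], balances[t]
        rates.append([cur[k + 1] / prev[k] for k in range(len(prev) - 1)])
    return rates

def project_rates(rates, window=3):
    """Projected roll rate per bucket: simple moving average of the most
    recent `window` observed months."""
    recent = rates[-window:]
    return [sum(col) / len(col) for col in zip(*recent)]

def project_next(last, proj):
    """One-month-ahead balances for the buckets after Current. The Current
    bucket is left out because it also depends on new volume and cures."""
    return [last[k] * proj[k] for k in range(len(proj))]
```

For example, a three-bucket history of [[1000, 20, 10], [1000, 30, 10], [1000, 20, 15], [1000, 40, 10]] gives 30-DPD roll rates of 0.03, 0.02, and 0.04, so the projected rate is their average, 0.03.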

Because of its simplicity, the roll rate model fits various business scenarios, e.g. delinquency tracking and loss forecasting, across different consumer products, e.g. installment loans and revolving credit. However, the roll rate for each delinquency bucket is estimated as a moving average and cannot incorporate exogenous risk factors or economic drivers. A roll rate model is therefore most applicable to short-term loss forecasting, usually within 3 months, and cannot estimate portfolio loss under a stressed economic scenario.

Vintage Loss Models

A vintage loss model is another widely used loss forecasting technique. It is similar to the roll rate model in that both are portfolio-based rather than account-based. However, in the vintage loss model the portfolio is segmented by origination vintage instead of delinquency bucket.

In this modeling practice, once the vintage criterion is determined, the delinquency performance of every segment, e.g. from 30-DPD through charge-off, can be tracked over its full life cycle. For instance, in the case of loss estimation, the loss rate of a specific vintage can be formulated as

Ln(Loss_Rate / (1 - Loss_Rate)) = A * Vintage_Quality + B * Economy_Driver + S(Maturation)

In the model specification above, Vintage_Quality (e.g. origination credit score or loan-to-value) and Economy_Driver (e.g. unemployment rate or housing price index) enter as linear components. S(Maturation), where Maturation is e.g. months on book, is a non-parametric term that reflects the nonlinearity of the maturation curve.
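The sketch below evaluates the specification above in Python. It assumes that S(Maturation) has already been estimated and is represented by a hypothetical piecewise-linear curve over months on book. In practice S would come from a non-parametric fit such as a smoothing spline. The coefficients, knots, and function names are all illustrative.

```python
import math

def inv_logit(z):
    # Inverse of Ln(p / (1 - p))
    return 1.0 / (1.0 + math.exp(-z))

def maturation_effect(mob, knots, values):
    """Hypothetical stand-in for S(Maturation): linear interpolation of
    `values` at months-on-book `knots`, held flat outside the knot range."""
    if mob <= knots[0]:
        return values[0]
    if mob >= knots[-1]:
        return values[-1]
    for (k0, v0), (k1, v1) in zip(zip(knots, values), zip(knots[1:], values[1:])):
        if k0 <= mob <= k1:
            return v0 + (v1 - v0) * (mob - k0) / (k1 - k0)

def vintage_loss_rate(a, b, quality, economy, mob, knots, values):
    """Loss_Rate = inv_logit(A * Vintage_Quality + B * Economy_Driver + S(Maturation))."""
    z = a * quality + b * economy + maturation_effect(mob, knots, values)
    return inv_logit(z)
```

Stress testing then amounts to re-evaluating the same function with Economy_Driver set to each scenario's value, e.g. a higher unemployment rate, while the vintage quality and maturation inputs are held fixed.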

Compared with the roll rate model, the vintage loss model offers a twofold benefit. First, by including maturation information, it captures the loss trend and improves longer-term forecasting. Second, by incorporating economic information, the model can also be used for stress testing under various economic scenarios. One caveat is that the vintage loss model is more suitable for installment loans than for revolving credit. Business practices such as balance transfers and teaser rates distort the vintage loss patterns of revolving products.

Expected Loss Models

Although easy to understand and simple to implement, both methods introduced above are criticized for being unable to incorporate loan-specific characteristics and finer-grained economic information. Expected loss (EL) estimation is significantly different from roll rate models and vintage loss curves. It is a modern modeling practice built on 3 risk parameters, namely probability of default (PD), exposure at default (EAD), and loss given default (LGD).

In this modeling practice, each risk parameter is modeled separately with account-level risk factors and economic indicators. For each individual account, the expected loss over the next 12 months can be formulated as

EL = PD * EAD * LGD

This methodology is not only in line with the Basel framework but also has the following advantages over traditional loss forecasting methods, including roll rate models and vintage loss models.

1. The account-level modeling practice provides a more granular risk profiling for each individual borrower.
2. Each risk parameter is driven independently by its own set of economic factors, allowing a more dynamic view of economic impacts.
3. The methodology is consistent with the statistical techniques prevailing in consumer lending and is intuitive for most model developers to adopt.

On the other hand, since this methodology relies heavily on statistical modeling, the loss estimates are subject to specific statistical assumptions about distributions and functional forms.
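To make the aggregation concrete, the sketch below takes the account-level identity EL = PD * EAD * LGD as an assumption. The identity treats the three parameters as independent, so that the expectation of their product equals the product of their expectations. Portfolio expected loss is then the sum over accounts. The field names 'pd', 'ead', and 'lgd' and the function name are hypothetical. In practice each value would come from its own PD, EAD, or LGD model.

```python
def expected_loss(accounts):
    """Portfolio expected loss as the sum over accounts of PD * EAD * LGD.

    Each account is a dict (hypothetical layout) with a 12-month probability
    of default 'pd', exposure at default 'ead' in currency units, and loss
    given default 'lgd' as the fraction of EAD that is lost."""
    total = 0.0
    for acct in accounts:
        if not (0.0 <= acct["pd"] <= 1.0 and 0.0 <= acct["lgd"] <= 1.0):
            raise ValueError("pd and lgd must lie in [0, 1]")
        total += acct["pd"] * acct["ead"] * acct["lgd"]
    return total
```

For example, an account with PD 0.02, EAD 10,000 and LGD 0.45 contributes 90. An account with PD 0.10, EAD 2,000 and LGD 0.8 contributes 160. The portfolio expected loss is therefore 250.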

Written by statcompute

March 6, 2012 at 11:11 pm

Modeling Rates and Proportions in SAS – 4


NLS (Nonlinear Least Squares) regression is a modeling technique similar to the OLS regression described in the previous section, aiming to model rates and proportions in the (0, 1) interval. The NLS specification assumes that

Y = 1 / (1 + EXP(-X`B)) + e, where e ~ N(0, sigma ^ 2)

Therefore, Y is assumed to be normally distributed as N(1 / (1 + EXP(-X`B)), sigma ^ 2). NLS regression models the conditional mean of Y directly, instead of the conditional mean of Ln(Y / (1 - Y)). An obvious advantage is that it does not require restrictive distributional assumptions, e.g. that Y follows an additive logistic normal distribution. Such assumptions are vital for recovering E(Y | X) from the OLS model on the logit scale. However, because of the assumption Y ~ N(1 / (1 + EXP(-X`B)), sigma ^ 2), NLS regression is inevitably criticized for failing to address unequal variance. This heteroscedasticity is often present in rate and proportion outcomes.

In SAS, NLS regression can be estimated generically with the NLIN procedure. However, because of its nonlinear nature, NLS estimation might suffer from convergence problems, e.g. long computing time or failure to converge. In this case, a good strategy is to use the coefficients estimated by the OLS model in the prior section as starting values for the NLS coefficients.
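The sketch below implements that strategy in plain Python for one predictor plus an intercept. It first runs OLS of Ln(Y / (1 - Y)) on x to get starting values, then refines them by NLS. The NLS step uses a basic Gauss-Newton iteration with step halving. This is one standard NLS algorithm, but it is not necessarily what PROC NLIN does internally. Function names are hypothetical, and every Y must lie strictly inside (0, 1) for the logit to exist.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def fit_simple_ols(x, z):
    """Intercept and slope of an OLS regression of z on x."""
    n = len(x)
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    b1 = sxz / sxx
    return zbar - b1 * xbar, b1

def sse(b0, b1, x, y):
    # Sum of squared errors for the mean function 1 / (1 + exp(-(b0 + b1 * x)))
    return sum((yi - 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))) ** 2
               for xi, yi in zip(x, y))

def fit_nls(x, y, max_iter=100, tol=1e-12):
    """NLS fit of Y = 1 / (1 + exp(-(b0 + b1 * x))) + e by Gauss-Newton,
    started from OLS estimates of Ln(Y / (1 - Y)) on x."""
    b0, b1 = fit_simple_ols(x, [logit(yi) for yi in y])   # OLS starting values
    current = sse(b0, b1, x, y)
    for _ in range(max_iter):
        # Accumulate J'J and J'r, where J holds derivatives of the mean in (b0, b1)
        a00 = a01 = a11 = g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            f = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            d = f * (1.0 - f)              # derivative of the logistic mean
            r = yi - f                     # residual
            a00 += d * d
            a01 += d * d * xi
            a11 += d * d * xi * xi
            g0 += d * r
            g1 += d * r * xi
        det = a00 * a11 - a01 * a01
        s0 = (a11 * g0 - a01 * g1) / det   # Gauss-Newton step from the 2x2 system
        s1 = (a00 * g1 - a01 * g0) / det
        step, improved = 1.0, False
        while step > 1e-8:                 # halve the step until SSE does not increase
            trial = sse(b0 + step * s0, b1 + step * s1, x, y)
            if trial <= current:
                improved = True
                break
            step /= 2.0
        if not improved:
            break                          # no descent step found; stop
        b0, b1 = b0 + step * s0, b1 + step * s1
        done = current - trial < tol
        current = trial
        if done:
            break
    return b0, b1
```

Because each accepted step never increases the SSE, the NLS fit is at least as good as the OLS starting point on the least-squares criterion. This mirrors the heuristic of handing OLS coefficients to the nonlinear solver.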

Again, for demonstration purposes, both the NLIN and NLMIXED procedures in SAS are used to estimate NLS in our study. We found that NLS estimation in PROC NLIN converges quickly and reliably. An interesting observation emerges from comparing the OLS and NLS estimates. For significant variables, the estimated coefficients and t-statistics from the two models are close in both direction and magnitude. This might be viewed as heuristic evidence supporting our strategy of using OLS coefficients as starting values for NLS.

Written by statcompute

March 3, 2012 at 5:00 pm

Posted in SAS, Statistical Models