Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

Modified Park Test in SAS

The severity measure in operational loss models has an empirical distribution with positive values and a long tail to the far right. To estimate regression models for severity measures with such data characteristics, we can consider several candidate distributions, such as Lognormal, Gamma, inverse Gaussian, and so on. A statistical approach is called for to choose the appropriate estimator with a correct distributional assumption. The modified Park test is designed to fill the gap.

For any GLM model, a general relationship between the variance and the mean can be described as below:

var(y | x) = alpha * [E(y | x)] ^ lambda

  • With lambda = 0, it is suggested that the relationship between the variance and the mean is orthogonal. In this case, a Gaussian distributional assumption should be considered.
  • With lambda = 1, it is suggestion that the variance is proportional to the mean. In this case, a Poisson-like distribution assumption should be considered.
  • With lambda = 2, it is suggested that the variance is quadratic to the mean. In this case, a Gamma distributional assumption should be considered.
  • With lambda = 3, it is suggested that the variance is cubic to the mean. In this case, an Inverse Gaussian distributional assumption should be considered.

Without the loss of generality, the aforementioned logic can be further formulated as below given E(y | x) = yhat for an arbitrary estimator. As mentioned by Manning and Mullahy (2001), a Gamma estimator can be considered a natural baseline estimator.

var(y | x) = alpha * [E(y | x)] ^ lambda
–> (y – yhat) ^ 2 = alpha * [yhat] ^ lambda
–> log(y – yhat) ^ 2 = log(alpha) + lambda * log(yhat)

With the above formulation, there are two ways to construct the statistical test for lambda, which is the so-called “modified Park test”.

In the OLS regression setting, the log of squared residuals from the baseline estimator can be regression on a constant and the log of predicted values from the baseline estimator, e.g. a Gamma regression.

proc reg data = data;
  model ln_r2 = ln_yhat;
  park_test: test ln_yhat = 2;
run;

In the demonstrated example, we want to test the null hypothesis if the coefficient of ln_yhat is statistically different from 2, which suggests a Gamma distributional assumption.

Alternatively, in the GLM setting, the squared residuals from the baseline estimator can be regressed on a constant and the log of predicted values from the baseline estimator. In this specific GLM, the Gamma distribution and the log() link function should be employed.

proc nlmixed data = data;
  parms b0 = 1 b2 = 2 scale = 10;
  mu = exp(b0 + b1 * x);
  b = mu / scale;
  model r2 ~ gamma(scale, b);
  contrast 'park test' b1 - 2;
run;

Similarly, if the null hypothesis that the coefficient of ln_yhat minus 2 is not statistically different from 0 cannot be rejected, then the Gamma distributional assumption is valid based on the modified Park test.

Advertisements

Written by statcompute

November 20, 2016 at 7:01 pm

%d bloggers like this: