Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

Estimate Regression with (Type-I) Pareto Response

The Type-I Pareto distribution has a probability function shown as below

f(y; a, k) = k * (a ^ k) / (y ^ (k + 1))

In the formulation, the scale parameter 0 < a < y and the shape parameter k > 1 .

The positive lower bound of Type-I Pareto distribution is particularly appealing in modeling the severity measure in that there is usually a reporting threshold for operational loss events. For instance, the reporting threshold of ABA operational risk consortium data is $10,000 and any loss event below the threshold value would be not reported, which might add the complexity in the severity model estimation.

In practice, instead of modeling the severity measure directly, we might model the shifted response y` = severity – threshold to accommodate the threshold value such that the supporting domain of y` could start from 0 and that the Gamma, Inverse Gaussian, or Lognormal regression can still be applicable. However, under the distributional assumption of Type-I Pareto with a known lower end, we do not need to shift the severity measure anymore but model it directly based on the probability function.

Below is the R code snippet showing how to estimate a regression model for the Pareto response with the lower bound a = 2 by using the VGAM package.

library(VGAM)
set.seed(2017)
n <- 200
a <- 2
x <- runif(n)
k <- exp(1 + 5 * x)
pdata <- data.frame(y = rpareto(n = n, scale = a, shape = k), x = x)
fit <- vglm(y ~ x, paretoff(scale = a), data = pdata, trace = TRUE)
summary(fit)
# Coefficients:
#             Estimate Std. Error z value Pr(>|z|)
# (Intercept)   1.0322     0.1363   7.574 3.61e-14 ***
# x             4.9815     0.2463  20.229  < 2e-16 ***
AIC(fit)
#  -644.458
BIC(fit)
#  -637.8614

The SAS code below estimating the Type-I Pareto regression provides almost identical model estimation.

proc nlmixed data = pdata;
  parms b0 = 0.1 b1 = 0.1;
  k = exp(b0 + b1 * x);
  a = 2;
  lh = k * (a ** k) / (y ** (k + 1));
  ll = log(lh);
  model y ~ general(ll);
run;
/*
Fit Statistics
-2 Log Likelihood               -648.5
AIC (smaller is better)         -644.5
AICC (smaller is better)        -644.4
BIC (smaller is better)         -637.9

Parameter Estimate   Standard   DF    t Value   Pr > |t|
                     Error 
b0        1.0322     0.1385     200    7.45     <.0001 	
b1        4.9815     0.2518     200   19.78     <.0001 	
*/

At last, it is worth pointing out that the conditional mean of Type-I Pareto response is not equal to exp(x * beta) but a * k / (k – 1) with k = exp(x * beta) . Therefore, the conditional mean only exists when k > 1 , which might cause numerical issues in the model estimation.

Advertisements

Written by statcompute

December 11, 2016 at 5:12 pm

%d bloggers like this: