Dispersion Models

Last week I read an interesting article, “Dispersion Models in Regression Analysis” by Peter Song (http://www.pakjs.com/journals/25%284%29/25%284%299.pdf), which describes a class of models more general than classic generalized linear models, which are built on an assumed error distribution.

A dispersion model is defined by two parameters, a location parameter mu and a dispersion parameter sigma ^ 2, and has a very general form of probability function:
p(y, mu, sigma ^ 2) = {2 * pi * sigma ^ 2 * V(.)} ^ -0.5 * exp{-1 / (2 * sigma ^ 2) * D(.)}
where the variance function V(.) and the deviance function D(.) vary by distribution. For instance, in a Poisson model,
D(.) = 2 * (y * log(y / mu) - y + mu)
V(.) = mu
with V(.) evaluated at the observed y when it enters the density, i.e. V(y) = y.
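
To see why this form is reasonable for count data, it may help to compare it with the exact Poisson probability. The sketch below assumes sigma ^ 2 = 1, reads V(.) as the variance function evaluated at the observed y, and uses Stirling's approximation for y!, so it is an approximation for y > 0 and not an exact identity.

```latex
\begin{align*}
p(y;\mu) &= \frac{\mu^{y} e^{-\mu}}{y!},
\qquad y! \approx \sqrt{2\pi y}\; y^{y} e^{-y} \\
p(y;\mu) &\approx (2\pi y)^{-1/2} \exp\{\, y\log\mu - \mu - y\log y + y \,\} \\
&= (2\pi y)^{-1/2} \exp\{ -( y\log(y/\mu) - y + \mu ) \} \\
&= \{2\pi V(y)\}^{-1/2} \exp\{ -\tfrac{1}{2} D(y;\mu) \},
\qquad V(y) = y,\quad D(y;\mu) = 2\{ y\log(y/\mu) - y + \mu \}.
\end{align*}
```

Letting sigma ^ 2 move away from 1 is then what allows the model to absorb over- or under-dispersion.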

Below is a piece of SAS code that estimates a Poisson model under both the error distribution assumption and the dispersion model assumption.

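data one;
  do i = 1 to 1000;
    x = ranuni(i);
    /* the rannor() term adds normal noise to the linear predictor, */
    /* so the counts are slightly overdispersed relative to Poisson */
    y = ranpoi(i, exp(2 + x * 2 + rannor(1) * 0.1));
    output;
  end;
run;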

*** fit a poisson model with classic GLM ***;
proc nlmixed data = one tech = trureg;
  parms b0 = 0 b1 = 0;
  mu = exp(b0 + b1 * x);
  /* Poisson log likelihood; lgamma(y + 1) = log(y!) without overflow for large y */
  ll = -mu + y * log(mu) - lgamma(y + 1);
  model y ~ general(ll);
run;
/*
             Fit Statistics
-2 Log Likelihood                 6118.0
AIC (smaller is better)           6122.0
AICC (smaller is better)          6122.0
BIC (smaller is better)           6131.8

                                             Parameter Estimates
                         Standard
Parameter    Estimate       Error      DF    t Value    Pr > |t|     Alpha       Lower       Upper    Gradient
b0             2.0024     0.01757    1000     113.95      <.0001      0.05      1.9679      2.0369    5.746E-9
b1             1.9883     0.02518    1000      78.96      <.0001      0.05      1.9388      2.0377    1.773E-9
*/

*** fit a poisson model with dispersion probability ***;
*** proposed by Jorgensen in 1987                   ***;
proc nlmixed data = one tech = trureg;
  parms b0 = 0 b1 = 0 s2 = 1;
  mu = exp(b0 + b1 * x);
  /* unit deviance of the Poisson; requires y > 0 */
  d  = 2 * (y * log(y / mu) - y + mu);
  /* variance function V(.) evaluated at the observed y */
  v  = y;
  lh = (2 * constant('pi') * s2 * v) ** (-0.5) * exp(-(2 * s2) ** (-1) * d);
  ll = log(lh);
  model y ~ general(ll);
run;
/*
             Fit Statistics
-2 Log Likelihood                 6066.2
AIC (smaller is better)           6072.2
AICC (smaller is better)          6072.2
BIC (smaller is better)           6086.9

                                             Parameter Estimates
                         Standard
Parameter    Estimate       Error      DF    t Value    Pr > |t|     Alpha       Lower       Upper    Gradient
b0             2.0024     0.02015    1000      99.37      <.0001      0.05      1.9629      2.0420    2.675E-6
b1             1.9883     0.02888    1000      68.86      <.0001      0.05      1.9316      2.0449    1.903E-6
s2             1.3150     0.05881    1000      22.36      <.0001      0.05      1.1996      1.4304    -0.00002
*/
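
The likelihood above is built on the original scale and then logged. A minimal sketch of an equivalent version that works on the log scale throughout is shown below. It is my own variant and not taken from the article. It assumes the same data set one, assumes every y is positive (log(y / mu) and log(y) are undefined at y = 0), and adds a BOUNDS statement to keep s2 positive during optimization.

```sas
proc nlmixed data = one tech = trureg;
  parms b0 = 0 b1 = 0 s2 = 1;
  /* assumption: constrain the dispersion parameter to be positive */
  bounds s2 > 0;
  mu = exp(b0 + b1 * x);
  /* Poisson unit deviance; assumes y > 0 */
  d  = 2 * (y * log(y / mu) - y + mu);
  /* log of {2 * pi * s2 * V(y)} ** (-0.5) * exp(-d / (2 * s2)) with V(y) = y */
  ll = -0.5 * log(2 * constant('pi') * s2 * y) - d / (2 * s2);
  model y ~ general(ll);
run;
```

In principle this should reproduce the estimates above, since the objective function is algebraically the same. The log form mainly avoids underflow in exp() when the deviance is large.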

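Please note that both methods yield the same parameter estimates, but the standard errors from the dispersion model are about 15% larger (for example, 0.02015 vs. 0.01757 for b0), so the t-values are correspondingly smaller. The estimated s2 of 1.3150 is above 1, which points to overdispersion relative to a pure Poisson and is consistent with the normal noise added to the linear predictor in the simulated data. In addition, despite one more estimated parameter, AIC and BIC are lower in the dispersion model (6072.2 vs. 6122.0 and 6086.9 vs. 6131.8).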
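
The difference in standard errors noted above looks systematic. As a quick check, the data step below takes the numbers from the two outputs and compares the ratio of standard errors with the square root of the estimated s2. The variable names are mine and purely illustrative.

```sas
data _null_;
  /* standard errors copied from the two PROC NLMIXED outputs above: */
  /* dispersion model divided by classic Poisson model               */
  ratio_b0 = 0.02015 / 0.01757;
  ratio_b1 = 0.02888 / 0.02518;
  /* square root of the estimated dispersion parameter s2 */
  root_s2  = sqrt(1.3150);
  /* write the three values to the log for comparison */
  put ratio_b0= ratio_b1= root_s2=;
run;
```

All three values come out at roughly 1.147, which is what one would expect if the dispersion parameter simply rescales the information for b0 and b1 by 1 / s2.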