Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

Posts Tagged ‘Generalized Poisson

Modeling Generalized Poisson Regression with NLMIXED Procedure

The Generalized Poisson (GP) regression is a very flexible statistical model for count outcomes in that it can accommodate both over-dispersion and under-dispersion, which makes it a very practical modeling approach in real-world problems and is considered a serious contender for the Quasi-Poisson regression.

Prob(Y) = Alpha / Y! * (Alpha + Xi * Y) ^ (Y – 1) * EXP(-Alpha – Xi * Y)
E(Y) = Mu = Alpha / (1 – Xi)
Var(Y) = Mu / (1 – Xi) ^ 2

While the GP regression can be conveniently estimated with HMM procedure in SAS, I’d always like to dive a little deeper into its model specification and likelihood function to have a better understanding. For instance, there is a slight difference in GP model outcomes between HMM procedure in SAS and VGAM package in R. After looking into the detail, I then realized that the difference is solely due to the different parameterization.

Basically, there are three steps for estimating a GP regression with NLMIXED procedure. Due to the complexity of GP likelihood function and its convergence process, it is always a good practice to estimate a baseline Standard Poisson regression as a starting point and then to output its parameter estimates into a table, e.g. _EST as shown below.

ods output ParameterEstimates = _est;
proc genmod data = mylib.credit_count;
  model majordrg = age acadmos minordrg ownrent / dist = poisson link = log;
run;

After acquiring parameter estimates from a Standard Poisson regression, we can use them to construct initiate values of parameter estimates for the Generalized Poisson regression. In the code snippet below, we used SQL procedure to create 2 macro variables that we are going to use in the final model estimation of GP regression.

proc sql noprint;
select
  "_"||compress(upcase(parameter), ' ')||" = "||compress(put(estimate, 10.2), ' ')
into
  :_parm separated by ' '
from  
  _est;
  
select
  case 
    when upcase(parameter) = 'INTERCEPT' then "_"||compress(upcase(parameter), ' ')
    else "_"||compress(upcase(parameter), ' ')||" * "||compress(upcase(parameter), ' ')
  end
into
  :_xb separated by ' + '    
from  
  _est
where
  upcase(parameter) ~= 'SCALE';  
quit;

/*
%put &_parm;
_INTERCEPT = -1.38 _AGE = 0.01 _ACADMOS = 0.00 _MINORDRG = 0.46 _OWNRENT = -0.20 _SCALE = 1.00

%put &_xb;
 _INTERCEPT + _AGE * AGE + _ACADMOS * ACADMOS + _MINORDRG * MINORDRG + _OWNRENT * OWNRENT
*/

In the last step, we used the NLMIXED procedure to estimate the GP regression by specifying its log likelihood function that would generate identical model results as with HMM procedure. It is worth mentioning that the expected mean _mu = exp(x * beta) in SAS and the term exp(x * beta) refers to the _alpha parameter in R. Therefore, the intercept would be different between SAS and R, primarily due to different ways of parameterization with the identical statistical logic.

proc nlmixed data = mylib.credit_count;
  parms &_parm.;
  _xb = &_xb.;
  _xi = 1 - exp(-_scale);
  _mu = exp(_xb);  
  _alpha = _mu * (1 - _xi);
  _prob = _alpha / fact(majordrg) * (_alpha + _xi * majordrg) ** (majordrg - 1) * exp(- _alpha - _xi * majordrg);
  ll = log(_prob);
  model majordrg ~ general(ll);
run;

In addition to HMM and NLMIXED procedures, GLIMMIX can also be employed to estimate the GP regression, as shown below. In this case, we need to specify both the log likelihood function and the variance function in terms of the expected mean.

proc glimmix data = mylib.credit_count;
  model majordrg = age acadmos minordrg ownrent / link = log solution;
  _xi = 1 - 1 / exp(_phi_);
  _variance_ = _mu_ / (1 - _xi) ** 2;
  _alpha = _mu_ * (1 - _xi);
  _prob = _alpha / fact(majordrg) * (_alpha + _xi * majordrg) ** (majordrg - 1) * exp(- _alpha - _xi * majordrg);  
  _logl_ = log(_prob);
run;

Written by statcompute

March 11, 2017 at 3:01 pm