Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

Double Poisson Regression in SAS

In the previous post (https://statcompute.wordpress.com/2016/11/27/more-about-flexible-frequency-models), I’ve shown how to estimate the double Poisson (DP) regression in R with the gamlss package. The hurdle of estimating DP regression is the calculation of a normalizing constant in the DP density function, which can be calculated either by the sum of an infinite series or by a closed form approximation. In the example below, I will show how to estimate DP regression in SAS with the GLIMMIX procedure.

First of all, I will show how to estimate DP regression by using the exact DP density function. In this case, we will approximate the normalizing constant by computing a partial sum of the infinite series, as highlighted below.

data poi;
  do n = 1 to 5000;
    x1 = ranuni(1);
    x2 = ranuni(2);
    x3 = ranuni(3);
    y = ranpoi(4, exp(1 * x1 - 2 * x2 + 3 * x3));
    output;
  end;
run;

proc glimmix data = poi;
  nloptions tech = quanew update = bfgs maxiter = 1000;
  model y = x1 x2 x3 / link = log solution;
  theta = exp(_phi_);
  _variance_ = _mu_ / theta;
  p_u = (exp(-_mu_) * (_mu_ ** y) / fact(y)) ** theta;
  p_y = (exp(-y) * (y ** y) / fact(y)) ** (1 - theta);
  f = (theta ** 0.5) * ((exp(-_mu_)) ** theta);  
  do i = 1 to 100;
    f = f + (theta ** 0.5) * ((exp(-i) * (i ** i) / fact(i)) ** (1 - theta)) * ((exp(-_mu_) * (_mu_ ** i) / fact(i)) ** theta);
  end;
  k = 1 / f;
  prob = k * (theta ** 0.5) * p_y * p_u;
  if log(prob) ~= . then _logl_ = log(prob);
run;

Next, I will show the same estimation routine by using the closed form approximation.

proc glimmix data = poi;
  nloptions tech = quanew update = bfgs maxiter = 1000;
  model y = x1 x2 x3 / link = log solution;
  theta = exp(_phi_);
  _variance_ = _mu_ / theta;
  p_u = (exp(-_mu_) * (_mu_ ** y) / fact(y)) ** theta;
  p_y = (exp(-y) * (y ** y) / fact(y)) ** (1 - theta);
  k = 1 / (1 + (1 - theta) / (12 * theta * _mu_) * (1 + 1 / (theta * _mu_)));
  prob = k * (theta ** 0.5) * p_y * p_u;
  if log(prob) ~= . then _logl_ = log(prob);
run;

While the first approach is more accurate by closely following the DP density function, the second approach is more efficient with a significantly lower computing cost. However, both are much faster than the corresponding R function gamlss().

Advertisements

Written by statcompute

April 20, 2017 at 12:48 am

%d bloggers like this: