While Poisson regression is often used as a baseline model for count data, its assumption of equi-dispersion (variance equal to the mean) is too restrictive for many empirical applications. In practice, the variance of observed count data usually exceeds the mean, a condition known as over-dispersion, which typically arises from unobserved heterogeneity and/or excess zeros. Much like heteroskedasticity in linear regression, over-dispersion in a Poisson regression leads to deflated standard errors of the parameter estimates and therefore inflated t-statistics. After fitting a Poisson regression, it is always sound practice to test for over-dispersion.
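For intuition, the standard mean-variance relations below (textbook results, not taken from the original post) show how each source pushes the variance above the mean. They cover a plain Poisson outcome with mean μ, a gamma-mixed Poisson (negative binomial with dispersion α > 0) representing unobserved heterogeneity, and a zero-inflated Poisson in which an extra zero occurs with probability π:

$$
\begin{aligned}
\text{Poisson:}\quad & \operatorname{E}(Y)=\mu, && \operatorname{Var}(Y)=\mu \\
\text{Gamma-mixed Poisson:}\quad & \operatorname{E}(Y)=\mu, && \operatorname{Var}(Y)=\mu+\alpha\mu^{2} \\
\text{Zero-inflated Poisson:}\quad & \operatorname{E}(Y)=(1-\pi)\lambda, && \operatorname{Var}(Y)=(1-\pi)\lambda\,(1+\pi\lambda)
\end{aligned}
$$

In the zero-inflated case, Var(Y) − E(Y) = π(1 − π)λ², which is positive whenever 0 < π < 1 and λ > 0, so even a modest share of excess zeros produces over-dispersion.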

Below is a SAS macro that tests for over-dispersion with the Lagrange Multiplier (LM) test described by William Greene (2002) in his well-known textbook “Econometric Analysis”. Under the null hypothesis, the statistic asymptotically follows a chi-square distribution with 1 degree of freedom. The null hypothesis is equi-dispersion in the outcomes of the tested Poisson model.
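Written out, the statistic computed by the macro (with λ̂_i the predicted mean from the Poisson model, e_i the raw residual, and ȳ the sample mean of y) is:

$$
LM = \frac{\left[\sum_{i=1}^{n}\left(e_i^{2} - y_i\right)\right]^{2}}{2\sum_{i=1}^{n}\hat{\lambda}_i^{2}}
   = \frac{\left(e'e - n\bar{y}\right)^{2}}{2\,\hat{\lambda}'\hat{\lambda}},
\qquad e_i = y_i - \hat{\lambda}_i,
\qquad LM \overset{a}{\sim} \chi^{2}_{1} \text{ under } H_0 .
$$

Under a correctly specified Poisson model, E[(y_i − λ_i)²] = λ_i = E[y_i], so each term e_i² − y_i has expectation zero. At the 5% level, equi-dispersion is rejected when LM > 3.84, the 95th percentile of χ²₁. Because the numerator is squared, a large LM can also reflect under-dispersion; the sign of e'e − nȳ (positive for over-dispersion) indicates the direction.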

%macro lm(data = , y = , pred_y = );
***************************************************;
* This macro tests for over-dispersion based on   *;
* outcomes from a Poisson model                   *;
* -- wensui.liu@53.com                            *;
***************************************************;
* parameters:                                     *;
* data  : the input dataset                       *;
* y     : observed count outcome                  *;
* pred_y: predicted outcome from Poisson model    *;
***************************************************;
* reference:                                      *;
* w. greene (2002), econometric analysis          *;
***************************************************;
proc iml;
  use &data;
  read all var {&y} into y;            /* observed counts            */
  read all var {&pred_y} into lambda;  /* fitted Poisson means       */
  close &data;
  e = (y - lambda);                    /* raw residuals              */
  n = nrow(y);
  ybar = y`[, :];                      /* sample mean of y           */
  LM = (e` * e - n * ybar) ** 2 / (2 * lambda` * lambda);
  Pvalue = 1 - probchi(LM, 1);         /* upper tail, chi-square 1 df */
  title 'LM TEST FOR OVER-DISPERSION';
  print LM Pvalue;
  title;
quit;
***************************************************;
* end of macro                                    *;
***************************************************;
%mend lm;
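For readers without a SAS/IML license, the same statistic can be sketched with PROC SQL and a DATA step, using the identity e'e − nȳ = Σ[(y_i − λ̂_i)² − y_i]. This is an illustrative alternative, not part of the original macro: the macro name %lm_sql and the output dataset _lm are hypothetical, and the sketch assumes there are no missing values in the outcome or the predictions.

```sas
*** HYPOTHETICAL NON-IML SKETCH OF THE SAME LM STATISTIC ***;
%macro lm_sql(data = , y = , pred_y = );
  /* NUMERATOR: SUM OF (SQUARED RESIDUAL MINUS Y), SQUARED        */
  /* DENOMINATOR: TWICE THE SUM OF SQUARED PREDICTED MEANS         */
  /* ASSUMES NO MISSING VALUES IN &Y OR &PRED_Y                    */
  proc sql;
    create table _lm as
    select sum((&y - &pred_y) ** 2 - &y) ** 2 / (2 * sum(&pred_y ** 2)) as LM
    from &data;
  quit;

  data _lm;
    set _lm;
    Pvalue = 1 - probchi(LM, 1);   /* UPPER-TAIL P-VALUE, CHI-SQUARE WITH 1 DF */
  run;

  proc print data = _lm noobs;
  run;
%mend lm_sql;

/* USAGE, AFTER RUNNING THE GENMOD STEP THAT CREATES OUT1:
   %lm_sql(data = out1, y = y, pred_y = predicted);            */
```

Applied to the same dataset, it should reproduce the LM value printed by the IML macro, since both compute the identical formula.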

Next, the LM test is demonstrated on simulated data. First, a vector of 1,000 Poisson outcomes is simulated in which the last 10% of the observations are set to zero, so the data contain excess zeros and are therefore over-dispersed.

*** SIMULATE A POISSON VECTOR WITH EXCESSIVE ZEROS ***;
data one;
  do i = 1 to 1000;
    x = ranuni(i);
    if i <= 900 then y = ranpoi(i, exp(x * 2));
    else y = 0;
    output;
  end;
run;

A Poisson regression is first estimated on the simulated count outcomes, including the excess zeros. After the predicted values are computed, the LM test is applied to check for over-dispersion. As shown below, the null hypothesis of equi-dispersion is rejected, with LM = 31.18 and a p-value of about 2.3E-8.

*** TEST DISPERSION WITH EXCESSIVE ZEROS ***;
ods listing close;
proc genmod data = one;
  model y = x / dist = poisson;
  output out = out1 p = predicted;
run;
ods listing;

%lm(data = out1, y = y, pred_y = predicted);

/*
LM TEST FOR OVER-DISPERSION

       LM    PVALUE
31.182978 2.3482E-8
*/

A second Poisson regression is then estimated on the simulated count outcomes with the 10% excess zeros excluded. As expected, the LM test on the outcomes from this model does not reject the null hypothesis of equi-dispersion (LM = 0.052, p = 0.82).

*** TEST DISPERSION WITHOUT EXCESSIVE ZEROS ***;
ods listing close;
proc genmod data = one;
  where i <= 900;
  model y = x / dist = poisson;
  output out = out2 p = predicted;
run;
ods listing;

%lm(data = out2, y = y, pred_y = predicted);

/*
LM TEST FOR OVER-DISPERSION

      LM    PVALUE
0.052131 0.8193959
*/
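As an informal complement to the LM test (not part of the original post), the Pearson chi-square divided by its residual degrees of freedom gives a rough dispersion ratio: values near 1 are consistent with equi-dispersion, and values well above 1 suggest over-dispersion. The sketch below computes it from the same output datasets; the macro name %pearson_ratio, its parameter k (the number of estimated coefficients, 2 here for the intercept and x), and the output dataset _pr are all hypothetical. For a Poisson fit with the scale fixed at 1, the ratio should correspond to the Pearson Chi-Square Value/DF that PROC GENMOD reports in its goodness-of-fit table, which the examples above suppress with ODS LISTING CLOSE.

```sas
*** HYPOTHETICAL HELPER: PEARSON CHI-SQUARE / DF AS A ROUGH DISPERSION RATIO ***;
%macro pearson_ratio(data = , y = , pred_y = , k = 2);
  /* K = NUMBER OF ESTIMATED COEFFICIENTS (INTERCEPT + X = 2 HERE) */
  /* ASSUMES NO MISSING VALUES AND STRICTLY POSITIVE PREDICTIONS   */
  proc sql;
    create table _pr as
    select sum((&y - &pred_y) ** 2 / &pred_y)                  as pearson_chisq,
           count(*) - &k                                       as df,
           sum((&y - &pred_y) ** 2 / &pred_y) / (count(*) - &k) as ratio
    from &data;
  quit;

  proc print data = _pr noobs;
  run;
%mend pearson_ratio;

/* USAGE, AFTER RUNNING BOTH GENMOD STEPS ABOVE:
   %pearson_ratio(data = out1, y = y, pred_y = predicted);
   %pearson_ratio(data = out2, y = y, pred_y = predicted);     */
```

Unlike the LM statistic, this ratio carries no formal p-value here; it is best read alongside the LM test rather than in place of it.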