## Archive for **March 2017**

## Estimating Conway-Maxwell-Poisson Regression in SAS

Conway-Maxwell-Poisson (CMP) regression is a flexible way to model frequency outcomes with both under-dispersion and over-dispersion. In SAS, CMP regression can be estimated with COUNTREG procedure directly or with NLMIXED procedure by specifying the likelihood function. However, the use of NLMIXED procedure is extremely cumbersome in that we need to estimate a standard Poisson regression and then use estimated parameters as initial values parameter estimates for the CMP regression.

In the example below, we will show how to employ GLIMMIX procedure to estimate a CMP regression by providing both the log likelihood function and the variance function in terms of the expected mean.

proc glimmix data = mylib.credit_count; model majordrg = age acadmos minordrg ownrent / link = log solution; _nu = 1 / exp(_phi_); _variance_ = (1 / _nu) / ((_mu_) ** (1 / _nu)); _z = 0; do i = 0 to 100; _z = _z + (_mu_ ** i) / fact(i) ** _nu; end; _prob = (_mu_ ** majordrg) / (fact(majordrg) ** _nu) * (_z ** (-1)); _logl_ = log(_prob); run;

Since the scale parameter **_phi_** is strictly above 0, the function **1 / exp(_phi_)** in the line #3 is to ensure the **Nu** parameter bounded between 0 and 1.

In addition, the DO loop is to calculate the normalization constant **Z** such that the PMF would sum up to 1. As there is no closed form for the calculation of **Z**, we need to calculate it numerically at the cost of a longer computing time.

Other implicit advantages of GLIMMIX procedure over NLMIXED procedure include the unnecessity to provide initiate values of parameter estimates and a shorter computing time.

## Modeling Generalized Poisson Regression in SAS

The Generalized Poisson (GP) regression is a very flexible statistical model for count outcomes in that it can accommodate both over-dispersion and under-dispersion, which makes it a very practical modeling approach in real-world problems and is considered a serious contender for the Quasi-Poisson regression.

Prob(Y) = Alpha / Y! * (Alpha + Xi * Y) ^ (Y – 1) * EXP(-Alpha – Xi * Y)

E(Y) = Mu = Alpha / (1 – Xi)

Var(Y) = Mu / (1 – Xi) ^ 2

While the GP regression can be conveniently estimated with HMM procedure in SAS, I’d always like to dive a little deeper into its model specification and likelihood function to have a better understanding. For instance, there is a slight difference in GP model outcomes between HMM procedure in SAS and VGAM package in R. After looking into the detail, I then realized that the difference is solely due to the different parameterization.

Basically, there are three steps for estimating a GP regression with NLMIXED procedure. Due to the complexity of GP likelihood function and its convergence process, it is always a good practice to estimate a baseline Standard Poisson regression as a starting point and then to output its parameter estimates into a table, e.g. _EST as shown below.

ods output ParameterEstimates = _est; proc genmod data = mylib.credit_count; model majordrg = age acadmos minordrg ownrent / dist = poisson link = log; run;

After acquiring parameter estimates from a Standard Poisson regression, we can use them to construct initiate values of parameter estimates for the Generalized Poisson regression. In the code snippet below, we used SQL procedure to create 2 macro variables that we are going to use in the final model estimation of GP regression.

proc sql noprint; select "_"||compress(upcase(parameter), ' ')||" = "||compress(put(estimate, 10.2), ' ') into :_parm separated by ' ' from _est; select case when upcase(parameter) = 'INTERCEPT' then "_"||compress(upcase(parameter), ' ') else "_"||compress(upcase(parameter), ' ')||" * "||compress(upcase(parameter), ' ') end into :_xb separated by ' + ' from _est where upcase(parameter) ~= 'SCALE'; quit; /* %put &_parm; _INTERCEPT = -1.38 _AGE = 0.01 _ACADMOS = 0.00 _MINORDRG = 0.46 _OWNRENT = -0.20 _SCALE = 1.00 %put &_xb; _INTERCEPT + _AGE * AGE + _ACADMOS * ACADMOS + _MINORDRG * MINORDRG + _OWNRENT * OWNRENT */

In the last step, we used the NLMIXED procedure to estimate the GP regression by specifying its log likelihood function that would generate identical model results as with HMM procedure. It is worth mentioning that the expected mean _mu = exp(x * beta) in SAS and the term exp(x * beta) refers to the _alpha parameter in R. Therefore, the intercept would be different between SAS and R, primarily due to different ways of parameterization with the identical statistical logic.

proc nlmixed data = mylib.credit_count; parms &_parm.; _xb = &_xb.; _xi = 1 - exp(-_scale); _mu = exp(_xb); _alpha = _mu * (1 - _xi); _prob = _alpha / fact(majordrg) * (_alpha + _xi * majordrg) ** (majordrg - 1) * exp(- _alpha - _xi * majordrg); ll = log(_prob); model majordrg ~ general(ll); run;

In addition to HMM and NLMIXED procedures, GLIMMIX can also be employed to estimate the GP regression, as shown below. In this case, we need to specify both the log likelihood function and the variance function in terms of the expected mean.

proc glimmix data = mylib.credit_count; model majordrg = age acadmos minordrg ownrent / link = log solution; _xi = 1 - 1 / exp(_phi_); _variance_ = _mu_ / (1 - _xi) ** 2; _alpha = _mu_ * (1 - _xi); _prob = _alpha / fact(majordrg) * (_alpha + _xi * majordrg) ** (majordrg - 1) * exp(- _alpha - _xi * majordrg); _logl_ = log(_prob); run;