Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

Archive for March 2013

Passing Many Parameter Values into A Macro Iteratively

In a production environment, we often need to call a SAS macro many times, passing a different parameter value on each call. For instance, we might have a dummy macro with only one parameter, as below.

%macro silly(parm = );
  %put ==> &parm;
%mend silly;

There might be a situation where we need to call the above macro hundreds of times, each time passing a different value to the macro parameter. For example, we might need to pass each of the 1,000 values generated below into the dummy macro.

data list;
  do i = 0 to 999;
    parm = "var"||put(i, z3.);
    output;
  end;
run;

The most standard way to accomplish this task is to collect all values into one long string, parse the elements one by one, and loop through the macro 1,000 times, as shown below. However, this approach is not only cumbersome to code but also computationally expensive.

*** method 1 ***;
%macro loop;

proc sql noprint;
  select parm into: parm separated by ' ' from list;
quit;

%let i = 1;
%do %while (%scan(&parm, &i) ne %str());
  %let var = %scan(&parm, &i);
  *** loop through each value on the list ***;
  %silly(parm = &var);
  %let i = %eval(&i + 1);
%end;

%mend loop;

%loop;

A sleeker way is to take advantage of the implicit loop of the SAS data step and to call the macro once per observation with the CALL EXECUTE routine, as demonstrated below. The code snippet is short and simple. More importantly, the run time of this approach is approximately 4 to 5 times shorter than that of the standard method.

*** method 2 ***;
data _null_;
  set list;
  call execute('%silly(parm = '||parm||')');
run;
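
One caveat is worth keeping in mind. When the string passed to CALL EXECUTE resolves to a macro call, the macro-language statements inside that macro run immediately, while the data step is still executing, whereas any DATA or PROC steps the macro generates are queued until the data step finishes. This makes no difference for the %PUT-only dummy macro used here. For a macro that builds macro variables from its own steps, a common safeguard is to wrap the call in %NRSTR so that the whole invocation is deferred. The variant below is a minimal sketch of that idea, not part of the original comparison, and reuses the list data set and the %silly macro defined above.

```sas
*** method 2 with deferred macro execution ***;
data _null_;
  set list;
  /* NRSTR masks the macro call, so it is only queued here */
  /* and is executed after this data step has finished     */
  call execute('%nrstr(%silly(parm = '||parm||'))');
run;
```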

Written by statcompute

March 17, 2013 at 11:07 am

Posted in SAS

Lagrange Multiplier (LM) Test for Over-Dispersion

While Poisson regression is often used as a baseline model for count data, its assumption of equi-dispersion is too restrictive for many empirical applications. In practice, the variance of observed count data usually exceeds the mean, a condition known as over-dispersion, due to unobserved heterogeneity and/or excess zeros. Similar to heteroskedasticity in linear regression, over-dispersion in a Poisson regression leads to deflated standard errors of the parameter estimates and therefore inflated t-statistics. After fitting a Poisson regression, it is always sound practice to run an additional test for over-dispersion.

Below is a SAS macro to test for over-dispersion based upon the Lagrange Multiplier (LM) test described by William Greene (2002) in his famous “Econometric Analysis”. Under the null hypothesis of equi-dispersion in outcomes from the tested Poisson model, the statistic follows the chi-square distribution with 1 degree of freedom.

%macro lm(data = , y = , pred_y = );
***************************************************;
* This macro is to test the over-dispersion based *;
* on outcomes from a poisson model                *;
*                            -- wensui.liu@53.com *;
***************************************************;
* parameters:                                     *;
*  data  : the input dataset                      *;
*  y     : observed count outcome                 *;
*  pred_y: predicted outcome from poisson model   *;
***************************************************;
* reference:                                      *;
*  w. greene (2002), econometric analysis         *;
***************************************************;

proc iml;
  use &data;
  read all var {&y} into y;
  read all var {&pred_y} into lambda;
  close &data;

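  e = (y - lambda);              /* residuals: observed minus predicted */
  n = nrow(y);
  ybar = y`[, :];                /* mean of y, so n * ybar = sum of y */
  LM = (e` * e - n * ybar) ** 2 / (2 * lambda` * lambda);
  Pvalue = 1 - probchi(LM, 1);   /* upper-tail chi-square(1) p-value */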
  title 'LM TEST FOR OVER-DISPERSION';
  print LM Pvalue;
  title;
quit;

***************************************************;
*                 end of macro                    *;
***************************************************;
%mend lm;
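
Written out, the quantity computed in the PROC IML step above is the following. The notation is chosen here for illustration and does not appear in the macro: y_i is the observed count, \hat{\lambda}_i is the Poisson prediction passed in as pred_y, and n is the number of observations.

```latex
LM = \frac{\left[ \sum_{i=1}^{n} \left( (y_i - \hat{\lambda}_i)^2 - y_i \right) \right]^2}
          {2 \sum_{i=1}^{n} \hat{\lambda}_i^2}
\;\sim\; \chi^2_1 \quad \text{under the null of equi-dispersion}
```

In the code, e` * e is the sum of squared residuals, n * ybar is the sum of the observed counts, and lambda` * lambda is the sum of squared predictions.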

Next, a use case of the LM test is demonstrated. First of all, a vector of Poisson outcomes is simulated with 10% excess zeros and therefore over-dispersion.

*** SIMULATE A POISSON VECTOR WITH EXCESSIVE ZEROS ***;
data one;
  do i = 1 to 1000;
    x = ranuni(i);
    if i <= 900 then y = ranpoi(i, exp(x * 2));
    else y = 0;
    output;
  end;
run;

A Poisson regression is estimated with the simulated count outcomes, including the excess zeros. After the predicted values are calculated, the LM test is used to test for over-dispersion. As shown below, the null hypothesis of equi-dispersion is rejected with an LM statistic of 31.18.

*** TEST DISPERSION WITH EXCESSIVE ZEROS ***;
ods listing close;
proc genmod data = one;
  model y =  x / dist = poisson;
  output out = out1 p = predicted;
run;
ods listing;

%lm(data = out1, y = y, pred_y = predicted);
/*
LM TEST FOR OVER-DISPERSION

       LM    PVALUE

31.182978 2.3482E-8
*/
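
Since PROC IML requires a SAS/IML license, it can be handy to cross-check the statistic with base SAS alone. The sketch below is not part of the original macro. It assumes the out1 data set and the predicted variable created above, and relies on the fact that the sum of squared residuals minus the sum of y equals the sum of (y - predicted)**2 - y. It should agree with the LM value printed by the macro.

```sas
*** cross-check of the LM statistic without PROC IML ***;
proc sql;
  select (sum((y - predicted) ** 2 - y)) ** 2 / (2 * sum(predicted ** 2)) as LM,
         1 - probchi(calculated LM, 1) as Pvalue
  from out1;
quit;
```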

Another Poisson regression is estimated with the simulated count outcomes excluding the 10% excess zeros. As expected, with outcomes from this newly estimated Poisson model, the null hypothesis of equi-dispersion is not rejected (LM statistic = 0.05, p-value = 0.82).

*** TEST DISPERSION WITHOUT EXCESSIVE ZEROS ***;
ods listing close;
proc genmod data = one;
  where i <= 900;
  model y =  x / dist = poisson;
  output out = out2 p = predicted;
run;
ods listing;

%lm(data = out2, y = y, pred_y = predicted);
/*
LM TEST FOR OVER-DISPERSION

       LM    PVALUE

 0.052131 0.8193959
*/

Written by statcompute

March 16, 2013 at 7:02 pm