Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

Archive for May 2012

Information Criteria and Vuong Test

When it comes to model selection between two non-nested models, information criteria such as AIC or BIC are often used, and the model with the lower information criterion is preferred.

However, even with AIC or BIC, we are still unable to answer probabilistically whether model A is significantly better than model B. The Vuong test, proposed by Quang Vuong (1989), considers a model better when its individual log likelihoods are significantly higher than those of its rival. A demonstration of the Vuong test is given below.

First of all, two models for proportional outcomes, namely TOBIT regression and NLS (Nonlinear Least Squares) regression, are estimated below, with information criteria, e.g. AIC and BIC, calculated and the likelihood of each individual record computed. As shown in the following output, the NLS regression has a lower BIC and therefore might be considered the “better” model.
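As a minimal sketch, one way to fit both models and capture the record-level log likelihoods is with the NLMIXED procedure, which reports AIC and BIC in its default fit statistics and can save the log likelihood of each record through the PREDICT statement. The dataset ONE, the response Y, and the single covariate X below are assumed names for illustration only.

/* TOBIT regression for a proportional outcome censored at 0 and 1 */
proc nlmixed data = one tech = quanew;
  parms b0 = 0 b1 = 0 s = 1;
  xb = b0 + b1 * x;
  if y = 0 then lh = probnorm(-xb / s);
  else if y = 1 then lh = 1 - probnorm((1 - xb) / s);
  else lh = pdf('normal', y, xb, s);
  ll = log(lh);
  model y ~ general(ll);
  /* save the log likelihood of each record */
  predict ll out = ll1(keep = pred rename = (pred = ll1));
run;

/* NLS regression with a logistic mean function and gaussian errors */
proc nlmixed data = one tech = quanew;
  parms c0 = 0 c1 = 0 s2 = 1;
  mu = logistic(c0 + c1 * x);
  ll = -0.5 * (log(2 * constant('pi') * s2) + (y - mu) ** 2 / s2);
  model y ~ general(ll);
  predict ll out = ll2(keep = pred rename = (pred = ll2));
run;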

Next, with the likelihood of each individual record from both models, the Vuong test is calculated with the formulation given below.

Vuong statistic = [LR(model1, model2) – C] / sqrt(N * V) ~ N(0, 1)

LR(…) is the sum of the individual log likelihood ratios between the 2 models. “C” is a correction term for the difference in DF (Degrees of Freedom) between the 2 models. “N” is the number of records. “V” is the variance of the individual log likelihood ratios between the 2 models. Vuong demonstrated that this statistic is asymptotically distributed as standard normal N(0, 1). As a result, model 1 is preferred when the Vuong statistic is > 1.96, and model 2 is preferred when it is < -1.96.
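With the record-level log likelihoods saved by the two PREDICT statements in the sketch above, the Vuong statistic can be computed in a few lines, as below. Since both illustrative models carry 3 parameters, a Schwarz-type correction term, e.g. C = (DF1 – DF2) / 2 * LOG(N), vanishes in this sketch.

/* take the difference of record-level log likelihoods */
data lr;
  merge ll1 ll2;
  lri = ll1 - ll2;
run;

/* vuong statistic = (sum(lri) - c) / sqrt(n * var(lri)), with c = 0 */
/* here because both illustrative models have the same df            */
proc sql;
  select (sum(lri) - 0) / sqrt(count(*) * var(lri)) as vuong
  from lr;
quit;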

As shown in the output, although model 2, i.e. the NLS regression, is preferred by a lower BIC, the Vuong statistic doesn’t show evidence that the NLS regression is significantly better than the TOBIT regression but indicates instead that both models are equally close to the true model.


Written by statcompute

May 26, 2012 at 11:48 pm

Decision Stump with the Implementation in SAS

A decision stump is a naively simple but effective rule-based supervised learning algorithm similar to CART (Classification & Regression Tree). Specifically, a stump is a 1-level decision tree consisting of only 2 terminal nodes.

Albeit simple, the decision stump has seen successful use in many settings. For instance, as a weak classifier, it has proven an excellent base learner in ensemble learning algorithms such as bagging and boosting. Moreover, a single decision stump can also be employed for feature screening in predictive modeling and for cut-point searching in continuous features. The following is an example showing a SAS implementation as well as the predictive power of a decision stump.

First of all, a testing dataset is simulated with 1 binary response variable Y and 3 continuous features X1 – X3, in which X1 is the feature most related to Y, with a single cut-point at 5; X2 is also related to Y but with 2 different cut-points, at 1.5 and 7.5; and X3 is pure noise.
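A minimal sketch of such a simulation is given below, where the sample size, coefficients, and seeds are arbitrary assumptions chosen only to match the description above.

/* a hypothetical simulation: y jumps at x1 = 5 and inside the x2 */
/* interval (1.5, 7.5), while x3 carries no signal                */
data one;
  do i = 1 to 1000;
    x1 = ranuni(1) * 10;
    x2 = ranuni(2) * 10;
    x3 = ranuni(3) * 10;
    p  = logistic(-2 + 2 * (x1 > 5) + 1 * (1.5 < x2 < 7.5));
    y  = (ranuni(4) < p);
    w  = 1;
    output;
  end;
  drop i p;
run;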

The SAS macro below shows how to program a single decision stump. This macro will then be used to search for the simulated cut-point in each continuous feature.

%macro stump(data = , w = , y = , xlist = );

/* declare the loop index as local before initializing it */
%local i;
%let i = 1;

/* create an empty table to collect the best cut-point and gini per feature */
proc sql;
create table _out
  (
  variable   char(32),
  gt_value   num,
  gini       num
  );
quit;

/* loop through each feature in &xlist */
%do %while (%scan(&xlist, &i) ne %str());
  %let x = %scan(&xlist, &i);
  
  /* keep only records with a valid binary response */
  data _tmp1(keep = &w &y &x);
    set &data;
    where &y in (0, 1);
  run;

  /* evaluate each distinct value of &x as a candidate cut-point and  */
  /* compute the weighted gini impurity of the resulting binary split */
  proc sql;
    create table
      _tmp2 as
    select
      b.&x                                                          as gt_value,
      sum(case when a.&x <= b.&x then &w * &y else 0 end) / 
      sum(case when a.&x <= b.&x then &w else 0 end)                as p1_1,
      sum(case when a.&x >  b.&x then &w * &y else 0 end) / 
      sum(case when a.&x >  b.&x then &w else 0 end)                as p1_2,
      sum(case when a.&x <= b.&x then 1 else 0 end) / count(*)      as ppn1,
      sum(case when a.&x >  b.&x then 1 else 0 end) / count(*)      as ppn2,
      2 * calculated p1_1 * (1 - calculated p1_1) * calculated ppn1 + 
      2 * calculated p1_2 * (1 - calculated p1_2) * calculated ppn2 as gini
    from
      _tmp1 as a,
      (select distinct &x from _tmp1) as b
    group by
      b.&x;

    /* keep the cut-point with the minimum gini for this feature */
    insert into _out
    select
      "&x",
      gt_value,
      gini
    from
      _tmp2
    having
      gini = min(gini);

    drop table _tmp1, _tmp2;
  quit;

  %let i = %eval(&i + 1); 
%end;

/* rank the features by gini impurity, best split first */
proc sort data = _out;
  by gini;
run;

proc report data = _out box spacing = 1 split = "*" nowd;
  column("DECISION STUMP SUMMARY"
         variable gt_value gini);
  define variable / "VARIABLE"                     width = 30 center;
  define gt_value / "CUTOFF VALUE*(GREATER THAN)"  width = 15 center;
  define gini     / "GINI"                         width = 10 center format = 9.4;
run;

%mend stump;
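With the simulated data above, the macro might be called as below, where the weight W is simply 1 for every record.

%stump(data = one, w = w, y = y, xlist = x1 x2 x3);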

As shown in the table below, the decision stump did a fairly good job both in identifying the predictive features and in locating the cut-points. Both related features, X1 and X2, have been identified and correctly ranked in terms of their association with Y. For X1, the cut-point located is 4.97, extremely close to 5. For X2, the cut-point located is 7.46, close to one of the 2 simulated cut-points.

Written by statcompute

May 19, 2012 at 12:36 am

Modeling Rates and Proportions in SAS – 8

7. FRACTIONAL LOGIT MODEL

Different from all models introduced previously, which assume specific distributional families for the proportional outcomes of interest, the fractional logit model proposed by Papke and Wooldridge (1996) is a quasi-likelihood method that does not specify the full distribution but only requires the conditional mean to be correctly specified for consistent parameter estimates. Under the assumption E(Y|X) = G(X`B) = 1 / (1 + EXP(-X`B)), the fractional logit has a likelihood function identical to the one for a Bernoulli distribution such that

F(Y) = (G(X`B) ** Y) * (1 – G(X`B)) ** (1 – Y) with 0 <= Y <= 1

Based upon the above formulation, parameter estimates are calculated in the same manner as in binary logistic regression by maximizing the log likelihood.

In SAS, the most convenient way to implement the fractional logit model is with the GLIMMIX procedure. In addition, we can also use the NLMIXED procedure by explicitly specifying the likelihood function shown above. Both routes are sketched below.
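The sketches below again assume a dataset ONE with the proportional response Y and a single covariate X. In GLIMMIX, the Bernoulli variance function is supplied through a programming statement with the automatic variable _VARIANCE_; in NLMIXED, the quasi log likelihood is coded directly.

/* fractional logit with glimmix: logit link plus a user-defined */
/* bernoulli variance function                                   */
proc glimmix data = one;
  _variance_ = _mu_ * (1 - _mu_);
  model y = x / link = logit solution;
run;

/* fractional logit with nlmixed: maximize the bernoulli-type    */
/* quasi log likelihood given above                              */
proc nlmixed data = one tech = quanew;
  parms b0 = 0 b1 = 0;
  mu = logistic(b0 + b1 * x);
  ll = y * log(mu) + (1 - y) * log(1 - mu);
  model y ~ general(ll);
run;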

Written by statcompute

May 15, 2012 at 10:12 pm

Posted in SAS, Statistical Models

Two Empirical Methods for Indirect Estimation of Proportional Outcomes through Logistic Regression

Written by statcompute

May 13, 2012 at 12:13 am

Posted in SAS, Statistical Models

Modeling Rates and Proportions in SAS – 7

6. SIMPLEX MODEL

Dispersion models, proposed by Jorgensen (1997), can be considered a more general case of the Generalized Linear Models of McCullagh and Nelder (1989) and include a dispersion parameter describing the distributional shape. The simplex model developed by Barndorff-Nielsen and Jorgensen (1991) is a special dispersion model and is useful for modeling proportional outcomes. A simplex model has the density function given by
    F(Y) = (2 * pi * sigma ^ 2 * (Y * (1 – Y)) ^ 3) ^ (-0.5) * EXP((-1 / (2 * sigma ^ 2)) * d(Y; Mu))
where d(Y; Mu) = (Y – Mu) ^ 2 / (Y * (1 – Y) * Mu ^ 2 * (1 – Mu) ^ 2) is a unit deviance function.

Similar to the Beta model, a simplex model also consists of 2 components. The first is a sub-model describing the expected mean Mu. Since 0 < Mu < 1, the logit link function can be used to specify the relationship between the expected mean and covariates X such that LOG(Mu / (1 – Mu)) = X`B. The second is a sub-model describing the pattern of the dispersion parameter sigma ^ 2, also by a set of covariates Z, such that LOG(sigma ^ 2) = Z`G. Due to the similar nature of parameterization between the Beta model and the simplex model, the performance of these 2 models has often been compared. However, it remains an open question which model outperforms the other.

Similar to the case of the Beta model, there is no out-of-the-box procedure in SAS for estimating the simplex model. However, following its density function, we are able to fit the simplex model with the NLMIXED procedure, as sketched below.
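The sketch assumes a dataset ONE with the proportional response Y, strictly between 0 and 1, and a single covariate X; the dispersion sub-model here carries only an intercept G0.

/* simplex model by maximum likelihood with nlmixed */
proc nlmixed data = one tech = quanew;
  parms b0 = 0 b1 = 0 g0 = 0;
  /* sub-model for the mean through the logit link */
  mu = logistic(b0 + b1 * x);
  /* sub-model for the dispersion through the log link */
  s2 = exp(g0);
  /* unit deviance and log of the simplex density above */
  d  = (y - mu) ** 2 / (y * (1 - y) * mu ** 2 * (1 - mu) ** 2);
  ll = -0.5 * log(2 * constant('pi') * s2 * (y * (1 - y)) ** 3) - d / (2 * s2);
  model y ~ general(ll);
run;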

Written by statcompute

May 12, 2012 at 6:11 pm

Posted in SAS, Statistical Models