Modeling Rates and Proportions in SAS – 4


NLS (Nonlinear Least Squares) regression is a modeling technique similar to the OLS regression described in the previous section, and it likewise aims to model rates and proportions in the (0, 1) interval. The NLS specification assumes

Y = 1 / (1 + EXP(-X`B)) + e, where e ~ N(0, sigma ^ 2)

Therefore, conditional on X, Y is assumed to be normally distributed as N(1 / (1 + exp(-X`B)), sigma ^ 2). Because NLS regression models the conditional mean of Y directly instead of the conditional mean of Ln(Y / (1 - Y)), an obvious advantage is that it does not need the restrictive distributional assumptions, e.g. that Y follows an additive logistic normal distribution, which the OLS approach requires in order to recover E(Y | X). However, because of the same assumption of Y ~ N(1 / (1 + exp(-X`B)), sigma ^ 2), NLS regression is inevitably open to the criticism that it fails to address unequal variance, i.e. heteroscedasticity, which is often present in rate and proportion outcomes.
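Written out, the least-squares criterion implied by this specification is shown below. The observation index i and the sample size n are notation added here for illustration and do not appear in the original text.

```latex
\hat{B}_{NLS} = \arg\min_{B} \sum_{i=1}^{n}
  \left( y_i - \frac{1}{1 + \exp(-x_i' B)} \right)^2
```

Under the stated assumption of normal errors with a constant sigma ^ 2, minimizing this sum of squares over B is equivalent to maximizing the normal likelihood over B, so the two criteria lead to the same coefficient estimates.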

In SAS, a generic implementation of NLS regression can be carried out with the NLIN procedure. However, because the model is nonlinear, the estimation might run into convergence problems, e.g. a long computing time or a failure to converge. In this case, a good strategy is to use the coefficients estimated by the OLS regression described in the prior section as the starting values for the coefficients to be estimated in NLS.
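A minimal sketch of this strategy follows. The data set name mydata, the response y (assumed to lie strictly between 0 and 1), the predictors x1 and x2, and the numeric starting values are all hypothetical placeholders, not taken from the study.

```sas
/* Step 1: OLS on the logit-transformed response to obtain starting values. */
/* mydata, y, x1 and x2 are hypothetical names used for illustration only.  */
data logit;
  set mydata;
  ly = log(y / (1 - y));   /* requires 0 < y < 1 */
run;

proc reg data = logit;
  model ly = x1 x2;
run;

/* Step 2: NLS with a logistic mean function.                               */
/* The values in PARMS are placeholders; replace them with the intercept    */
/* and slope estimates reported by PROC REG above.                          */
proc nlin data = mydata;
  parms b0 = -1.0 b1 = 0.5 b2 = -0.2;
  xb = b0 + b1 * x1 + b2 * x2;
  model y = 1 / (1 + exp(-xb));
run;
```

The programming statement for xb keeps the linear predictor separate from the logistic transformation, which makes it easier to add or remove predictors without touching the MODEL statement.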

Again, for demonstration purposes, both the NLIN and NLMIXED procedures in SAS are used to estimate NLS in our study. We find that PROC NLIN converges quickly and without difficulty. An interesting observation from comparing the OLS and NLS estimates is that the estimated coefficients and t-statistics for significant variables are close to each other across the two models in both direction and magnitude. This might be viewed as heuristic evidence supporting our strategy of using OLS coefficients as the starting values for the coefficients to be estimated in NLS.
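For reference, the same mean function might be specified in PROC NLMIXED along the following lines. This is a sketch under assumptions: the data set mydata, the variables y, x1 and x2, and all starting values (including the one for the error variance s2) are hypothetical placeholders.

```sas
/* Normal model with a logistic mean, fitted by maximum likelihood.       */
/* mydata, y, x1, x2 and every value in PARMS are illustrative only.      */
proc nlmixed data = mydata;
  parms b0 = -1.0 b1 = 0.5 b2 = -0.2 s2 = 0.05;
  xb = b0 + b1 * x1 + b2 * x2;
  mu = 1 / (1 + exp(-xb));       /* conditional mean of y                 */
  model y ~ normal(mu, s2);      /* second argument is the variance       */
run;
```

Because PROC NLMIXED maximizes the normal likelihood rather than minimizing the sum of squared errors, it estimates the variance s2 as an explicit parameter. The coefficient estimates should agree with those from a least-squares fit of the same model, while the variance estimate and standard errors can differ slightly because of the different degrees-of-freedom treatment.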