Modeling Rates and Proportions in SAS – 5


The Tobit model can be considered an extension of naïve OLS regression to a dependent variable that is censored, and therefore observable only within a certain interval. For the rates and proportions in my study, that interval is [0, 1]. Specifically, this class of models assumes that there is a latent variable Y* such that

Y* = X`B + e, where e ~ N(0, sigma ^ 2), and Y = Min(1, Max(0, Y*))
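The short Python sketch below makes the censoring mechanism concrete. Python is used only for illustration and is not part of the SAS workflow, and the function names are hypothetical. It applies the observation rule Y = Min(1, Max(0, Y*)). It also computes the probability mass that the latent normal model places exactly on the two boundaries: P(Y = 0) = Phi(-X`B / sigma) and P(Y = 1) = 1 - Phi((1 - X`B) / sigma), where Phi is the standard normal CDF.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def observe(y_star):
    """Observation rule of the Tobit model: Y = Min(1, Max(0, Y*))."""
    return min(1.0, max(0.0, y_star))


def boundary_probs(xb, sigma):
    """Probability that Y is censored at 0 or at 1 when Y* ~ N(xb, sigma^2).

    P(Y = 0) = P(Y* <= 0) = Phi(-xb / sigma)
    P(Y = 1) = P(Y* >= 1) = 1 - Phi((1 - xb) / sigma)
    """
    p0 = STD_NORMAL.cdf(-xb / sigma)
    p1 = 1.0 - STD_NORMAL.cdf((1.0 - xb) / sigma)
    return p0, p1


print([observe(v) for v in (-0.3, 0.4, 1.7)])  # [0.0, 0.4, 1.0]
p0, p1 = boundary_probs(xb=0.5, sigma=0.5)
print(round(p0, 4), round(p1, 4))             # 0.1587 0.1587
```

With X`B = 0.5 and sigma = 0.5, roughly 16% of the observations would be expected to sit exactly at each boundary. These point masses at 0 and 1 are what distinguish the censored normal from an ordinary normal model.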

As a result, a rate or proportion Y bounded by [0, 1] can be viewed as the observable part of a normally distributed variable Y* ~ N(X`B, sigma ^ 2) defined on the whole real line. However, there is a fundamental argument against this censoring assumption. Values outside [0, 1] are unobservable not because they are censored but because they are not defined: a proportion of -0.2 or 1.3 is meaningless rather than merely unrecorded. Hence, the censored normal distribution might not be an appropriate assumption for rates and proportions.

In SAS, the most convenient way to estimate a Tobit model is the QLIM procedure in the SAS/ETS module. However, to clearly illustrate the log likelihood function of the Tobit model, we'd like to stick to the NLMIXED procedure and estimate the Tobit model by maximum likelihood.

The maximum likelihood estimator of a Tobit model assumes that the errors are normal and homoscedastic, and it is inconsistent otherwise. The previous section showed that heteroscedasticity is present due to the nature of rate and proportion outcomes. As a result, a variance model has to be estimated simultaneously to account for heteroscedasticity, using the function VAR(e) = sigma ^ 2 * (1 + EXP(Z`G)). Therefore, there are two components in our Tobit model specification: a mean sub-model and a variance sub-model. As illustrated in the output below, a couple of independent variables, e.g. X1 and X2, are significant in both sub-models, showing that the conditional variance is not independent of the mean in proportion outcomes.
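The Python sketch below is a rough, language-neutral illustration of what the NLMIXED estimation maximizes. It writes out the log likelihood of the Tobit model censored at 0 and 1, with the variance sub-model VAR(e) = sigma ^ 2 * (1 + EXP(Z`G)). Each observation gets its own standard deviation s = sigma * SQRT(1 + EXP(Z`G)). Its contribution depends on where Y falls:

- log Phi(-X`B / s) if Y = 0
- log Phi((X`B - 1) / s) if Y = 1
- log(phi((Y - X`B) / s) / s) if 0 < Y < 1

Here Phi and phi are the standard normal CDF and density. The function names, argument layout and toy data are hypothetical. In NLMIXED, the same per-observation expression would typically be supplied to the MODEL statement through the GENERAL() log-likelihood option and maximized over B, G and sigma.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def dot(coefs, row):
    """Linear predictor, e.g. X`B or Z`G, for one observation."""
    return sum(c * v for c, v in zip(coefs, row))


def tobit_loglik(y, x, z, beta, gamma, sigma):
    """Log likelihood of a Tobit model censored at 0 and 1 with
    VAR(e) = sigma^2 * (1 + EXP(Z`G)).

    y     : outcomes in [0, 1]
    x, z  : covariate rows for the mean and variance sub-models
            (put a leading 1.0 in each x row for an intercept)
    beta  : mean sub-model coefficients B
    gamma : variance sub-model coefficients G
    sigma : scale parameter, must be > 0
    """
    ll = 0.0
    for yi, xi, zi in zip(y, x, z):
        mu = dot(beta, xi)                                     # mean sub-model: X`B
        s = sigma * math.sqrt(1.0 + math.exp(dot(gamma, zi)))  # variance sub-model
        if yi <= 0.0:
            ll += math.log(STD_NORMAL.cdf(-mu / s))            # left-censored: P(Y* <= 0)
        elif yi >= 1.0:
            ll += math.log(STD_NORMAL.cdf((mu - 1.0) / s))     # right-censored: P(Y* >= 1)
        else:
            ll += math.log(STD_NORMAL.pdf((yi - mu) / s) / s)  # interior: density of Y* at y
    return ll


# Toy data, for illustration only
y = [0.0, 0.35, 0.8, 1.0]
x = [[1.0, 0.2], [1.0, 0.5], [1.0, 0.9], [1.0, 1.4]]
z = [[0.1], [0.3], [0.6], [0.9]]
print(tobit_loglik(y, x, z, beta=[0.0, 0.7], gamma=[0.5], sigma=0.3))
```

The right-censored term is written as Phi((X`B - 1) / s), which equals 1 - Phi((1 - X`B) / s) and mirrors the left-censored term. For observations far out in the tails, a dedicated log-CDF routine would be numerically safer than taking the log of a CDF value that may underflow to zero.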