Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

Prototyping Multinomial Logit with R

Recently, I am working on a new modeling proposal based on the competing risk and need to prototype multinomial logit models with R. There are 2 R packages implementing multinomial logit models that I’ve tested, namely nnet and vgam. Model outputs with iris data are shown below.

data(iris)

### method 1: nnet package ###
library(nnet)
mdl1 <- multinom(Species ~ Sepal.Length, data = iris, model = TRUE)
summary(mdl1)

# Coefficients:
#            (Intercept) Sepal.Length
# versicolor   -26.08339     4.816072
# virginica    -38.76786     6.847957
#
# Std. Errors:
#            (Intercept) Sepal.Length
# versicolor    4.889635    0.9069211
# virginica     5.691596    1.0223867

### method 2: vgam package ### 
library(VGAM)
mdl2 <- vglm(Species ~ Sepal.Length, data = iris, multinomial(refLevel = 1)) 
summary(mdl2)

# Coefficients:
#                Estimate Std. Error z value
# (Intercept):1  -26.0819    4.88924 -5.3346
# (Intercept):2  -38.7590    5.69064 -6.8110
# Sepal.Length:1   4.8157    0.90683  5.3105
# Sepal.Length:2   6.8464    1.02222  6.6976

However, in my view, above methods are not flexible for real-world problems. For instance, there is no off-shelf solution for the variable selection for above multinomial logit models. Instead of building one multinomial logit model, we might develop two separate binomial logit models to accomplish the same task.

### method 3: two binary logit models ### 
iris$y <- ifelse(iris$Species == 'setosa', 0, 1)
mdl31 <- glm(y ~ Sepal.Length, data = iris, subset = (Species != 'virginica'), family = binomial)
summary(mdl31)

#  Coefficients:
#              Estimate Std. Error z value Pr(>|z|)    
# (Intercept)   -27.831      5.434  -5.122 3.02e-07 ***
# Sepal.Length    5.140      1.007   5.107 3.28e-07 ***

mdl32 <- glm(y ~ Sepal.Length, data = iris, subset = (Species != 'versicolor'), family = binomial)
summary(mdl32)

# Coefficients:
#              Estimate Std. Error z value Pr(>|z|)    
# (Intercept)   -38.547      9.557  -4.033 5.50e-05 ***
# Sepal.Length    6.805      1.694   4.016 5.91e-05 ***

As shown above, we can get a set of similar estimated parameters by the third approach with much simpler models.

About these ads

Written by statcompute

August 21, 2013 at 11:22 pm

Follow

Get every new post delivered to your Inbox.

Join 80 other followers

%d bloggers like this: