LASSO – Yet Another Blog in Statistical Computing

LASSO has been a popular algorithm for the variable selection and extremely effective with high-dimension data. However, it often tends to “over-regularize” a model that might be overly compact and therefore under-predictive.

The Elastic Net addresses the aforementioned “over-regularization” by balancing between LASSO and ridge penalties. In particular, a hyper-parameter, namely Alpha, would be used to regularize the model such that the model would become a LASSO in case of Alpha = 1 and a ridge in case of Alpha = 0. In practice, Alpha can be tuned easily by the cross-validation. Below is a demonstration of Elastic Net with R glmnet package and its comparison with LASSO and ridge models.

pkgs <- list("glmnet", "doParallel", "foreach", "pROC")
lapply(pkgs, require, character.only = T)
registerDoParallel(cores = 4)
 
df1 <- read.csv("Downloads/credit_count.txt")
df2 <- df1[df1$CARDHLDR == 1, ]
set.seed(2017)
n <- nrow(df2)
sample <- sample(seq(n), size = n * 0.5, replace = FALSE)
train <- df2[sample, -1]
test <- df2[-sample, -1]
mdlY <- as.factor(as.matrix(train["DEFAULT"]))
mdlX <- as.matrix(train[setdiff(colnames(df1), c("CARDHLDR", "DEFAULT"))])
newY <- as.factor(as.matrix(test["DEFAULT"]))
newX <- as.matrix(test[setdiff(colnames(df1), c("CARDHLDR", "DEFAULT"))])

First of all, we estimates a LASSO model with Alpha = 1. The function cv.glmnet() is used to search for a regularization parameter, namely Lambda, that controls the penalty strength. As shown below, the model only identifies 2 attributes out of total 12.

# LASSO WITH ALPHA = 1
cv1 <- cv.glmnet(mdlX, mdlY, family = "binomial", nfold = 10, type.measure = "deviance", paralle = TRUE, alpha = 1)
md1 <- glmnet(mdlX, mdlY, family = "binomial", lambda = cv1$lambda.1se, alpha = 1)
coef(md1)
#(Intercept) -1.963030e+00
#AGE          .           
#ACADMOS      .           
#ADEPCNT      .           
#MAJORDRG     .           
#MINORDRG     .           
#OWNRENT      .           
#INCOME      -5.845981e-05
#SELFEMPL     .           
#INCPER       .           
#EXP_INC      .           
#SPENDING     .           
#LOGSPEND    -4.015902e-02
roc(newY, as.numeric(predict(md1, newX, type = "response")))
#Area under the curve: 0.636

We next estimates a ridge model as below by setting Alpha = 0. Similarly, Lambda is searched by the cross-validation. Since the ridge penalty would only regularize the magnitude of each coefficient, we end up with a “full” model with all model attributes. The model performance is slightly better with 10 more variables, which is a debatable outcome.

# RIDGE WITH ALPHA = 0
cv2 <- cv.glmnet(mdlX, mdlY, family = "binomial", nfold = 10, type.measure = "deviance", paralle = TRUE, alpha = 0)
md2 <- glmnet(mdlX, mdlY, family = "binomial", lambda = cv2$lambda.1se, alpha = 0)
coef(md2)
#(Intercept) -2.221016e+00
#AGE         -4.184422e-04
#ACADMOS     -3.085096e-05
#ADEPCNT      1.485114e-04
#MAJORDRG     6.684849e-03
#MINORDRG     1.006660e-03
#OWNRENT     -9.082750e-03
#INCOME      -6.960253e-06
#SELFEMPL     3.610381e-03
#INCPER      -3.881890e-07
#EXP_INC     -1.416971e-02
#SPENDING    -1.638184e-05
#LOGSPEND    -6.213884e-03
roc(newY, as.numeric(predict(md2, newX, type = "response")))
#Area under the curve: 0.6435

At last, we use the Elastic Net by tuning the value of Alpha through a line search with the parallelism. In this particular case, Alpha = 0.3 is chosen through the cross-validation. As shown below, 6 variables are used in the model that even performs better than the ridge model with all 12 attributes.

# ELASTIC NET WITH 0 < ALPHA < 1
a <- seq(0.1, 0.9, 0.05)
search <- foreach(i = a, .combine = rbind) %dopar% {
  cv <- cv.glmnet(mdlX, mdlY, family = "binomial", nfold = 10, type.measure = "deviance", paralle = TRUE, alpha = i)
  data.frame(cvm = cv$cvm[cv$lambda == cv$lambda.1se], lambda.1se = cv$lambda.1se, alpha = i)
}
cv3 <- search[search$cvm == min(search$cvm), ]
md3 <- glmnet(mdlX, mdlY, family = "binomial", lambda = cv3$lambda.1se, alpha = cv3$alpha)
coef(md3)
#(Intercept) -1.434700e+00
#AGE         -8.426525e-04
#ACADMOS      .           
#ADEPCNT      .           
#MAJORDRG     6.276924e-02
#MINORDRG     .           
#OWNRENT     -2.780958e-02
#INCOME      -1.305118e-04
#SELFEMPL     .           
#INCPER      -2.085349e-06
#EXP_INC      .           
#SPENDING     .           
#LOGSPEND    -9.992808e-02
roc(newY, as.numeric(predict(md3, newX, type = "response")))
#Area under the curve: 0.6449