## Archive for the ‘Parallelism’ Category

## Random Search for Optimal Parameters

Manual search, grid search, or a combination of both have been successfully employed in machine learning to optimize hyper-parameters. However, in the arena of deep learning, both approaches can become impractical. For instance, the computing cost of a grid search over the hyper-parameters of a multi-layer deep neural network (DNN) can be prohibitively high.

In light of the aforementioned hurdles, Bergstra and Bengio proposed the idea of random search in the paper http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf. In their study, random search was found to be more efficient than grid search for hyper-parameter optimization in terms of computing cost.

In the example below, grid search and random search reach similar results for SVM parameter optimization based on cross-validation.

```python
import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC as svc
from sklearn.metrics import make_scorer, roc_auc_score
from scipy import stats

# DATA PREPARATION
df = pd.read_csv("credit_count.txt")
y = df[df.CARDHLDR == 1].DEFAULT.values
x = preprocessing.scale(df[df.CARDHLDR == 1].iloc[:, 2:12], axis = 0)

# DEFINE MODEL AND PERFORMANCE MEASURE
mdl = svc(probability = True, random_state = 1)
auc = make_scorer(roc_auc_score)

# GRID SEARCH FOR 20 COMBINATIONS OF PARAMETERS
grid_list = {"C": np.arange(2, 10, 2),
             "gamma": np.arange(0.1, 1, 0.2)}
grid_search = GridSearchCV(mdl, param_grid = grid_list, n_jobs = 4, cv = 3, scoring = auc)
grid_search.fit(x, y)
grid_search.cv_results_

# RANDOM SEARCH FOR 20 COMBINATIONS OF PARAMETERS
rand_list = {"C": stats.uniform(2, 10),
             "gamma": stats.uniform(0.1, 1)}
rand_search = RandomizedSearchCV(mdl, param_distributions = rand_list, n_iter = 20,
                                 n_jobs = 4, cv = 3, random_state = 2017, scoring = auc)
rand_search.fit(x, y)
rand_search.cv_results_
```
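For a quick self-contained comparison that does not require the credit data, the same grid-versus-random setup can be sketched on synthetic data. The `make_classification` data and the parameter ranges below are illustrative assumptions rather than part of the original example.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

# ILLUSTRATIVE SYNTHETIC DATA STANDING IN FOR credit_count.txt
x, y = make_classification(n_samples = 500, n_features = 10, random_state = 1)
mdl = SVC(probability = True, random_state = 1)

# 20 GRID POINTS OVER THE (C, gamma) REGION
grid_list = {"C": np.arange(2, 10, 2), "gamma": np.arange(0.1, 1, 0.2)}
grid_search = GridSearchCV(mdl, param_grid = grid_list, cv = 3, scoring = "roc_auc").fit(x, y)

# 20 RANDOM DRAWS OVER THE SAME REGION: C ~ U(2, 10), gamma ~ U(0.1, 1)
rand_list = {"C": stats.uniform(2, 8), "gamma": stats.uniform(0.1, 0.9)}
rand_search = RandomizedSearchCV(mdl, param_distributions = rand_list, n_iter = 20,
                                 cv = 3, random_state = 2017, scoring = "roc_auc").fit(x, y)

# THE TWO BEST CROSS-VALIDATED AUCS ARE TYPICALLY CLOSE
print(grid_search.best_score_, rand_search.best_score_)
```

With the same budget of 20 model fits, the random search explores 20 distinct values of each parameter while the grid visits only 4 or 5, which is the intuition behind Bergstra and Bengio's result.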

## Dropout Regularization in Deep Neural Networks

The deep neural network (DNN) is a very powerful neural network with multiple hidden layers, able to capture highly complex relationships between the response and the predictors. However, it is prone to over-fitting due to its large number of parameters, which makes regularization crucial for DNNs. In the paper (https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf), an interesting regularization approach, namely dropout, was proposed with a simple and elegant idea. Basically, it suppresses the complexity of a DNN by randomly dropping units in both the input and hidden layers.
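The mechanics of dropping units can be sketched in a few lines of numpy. The "inverted dropout" rescaling below, which keeps the expected activation unchanged at training time, is a common implementation convention (used by Keras among others) rather than a detail taken from the paper.

```python
import numpy as np

def dropout(x, rate, rng):
    # ZERO OUT EACH UNIT INDEPENDENTLY WITH PROBABILITY `rate`, THEN
    # RESCALE THE SURVIVORS BY 1 / (1 - rate) SO THAT THE EXPECTED
    # ACTIVATION MATCHES THE NO-DROPOUT NETWORK ("INVERTED DROPOUT")
    mask = rng.binomial(1, 1.0 - rate, size = x.shape)
    return x * mask / (1.0 - rate)

rng = np.random.RandomState(2017)
a = np.ones((4, 5))
out = dropout(a, 0.2, rng)
print(out)  # roughly 80% of entries are 1.25, the rest are 0
```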

Below is an example showing how to tune the dropout rates as hyper-parameters with the Keras library in Python. Because of the long computing time required to tune dropout, parallelism is used to speed up the process.

```python
from pandas import read_csv, DataFrame
from numpy.random import seed
from sklearn.preprocessing import scale
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from keras.models import Sequential
from keras.constraints import maxnorm
from keras.optimizers import SGD
from keras.layers import Dense, Dropout
from multiprocessing import Pool, cpu_count
from itertools import product
from parmap import starmap

df = read_csv("credit_count.txt")
Y = df[df.CARDHLDR == 1].DEFAULT
X = df[df.CARDHLDR == 1][['AGE', 'ADEPCNT', 'MAJORDRG', 'MINORDRG', 'INCOME', 'OWNRENT', 'SELFEMPL']]
sX = scale(X)
ncol = sX.shape[1]
x_train, x_test, y_train, y_test = train_test_split(sX, Y, train_size = 0.5, random_state = 2017)

def tune_dropout(rate1, rate2):
  net = Sequential()
  ## DROPOUT AT THE INPUT LAYER
  net.add(Dropout(rate1, input_shape = (ncol,)))
  ## DROPOUT AT THE 1ST HIDDEN LAYER
  net.add(Dense(ncol, init = 'normal', activation = 'relu', W_constraint = maxnorm(4)))
  net.add(Dropout(rate2))
  ## DROPOUT AT THE 2ND HIDDEN LAYER
  net.add(Dense(ncol, init = 'normal', activation = 'relu', W_constraint = maxnorm(4)))
  net.add(Dropout(rate2))
  net.add(Dense(1, init = 'normal', activation = 'sigmoid'))
  sgd = SGD(lr = 0.1, momentum = 0.9, decay = 0, nesterov = False)
  net.compile(loss = 'binary_crossentropy', optimizer = sgd, metrics = ['accuracy'])
  net.fit(x_train, y_train, batch_size = 200, nb_epoch = 50, verbose = 0)
  print rate1, rate2, "{:6.4f}".format(roc_auc_score(y_test, net.predict(x_test)))

input_dp = [0.1, 0.2, 0.3]
hidden_dp = [0.2, 0.3, 0.4, 0.5]
parms = [i for i in product(input_dp, hidden_dp)]

seed(2017)
## EVALUATE EACH (INPUT RATE, HIDDEN RATE) PAIR IN ITS OWN WORKER PROCESS
starmap(tune_dropout, parms, pool = Pool(processes = cpu_count()))
```

As shown in the output below, the optimal dropout rate happens to be 0.2 for both the input and hidden layers.

```
0.1 0.2 0.6354
0.1 0.4 0.6336
0.1 0.3 0.6389
0.1 0.5 0.6378
0.2 0.2 0.6419
0.2 0.4 0.6385
0.2 0.3 0.6366
0.2 0.5 0.6359
0.3 0.4 0.6313
0.3 0.2 0.6350
0.3 0.3 0.6346
0.3 0.5 0.6343
```

## Improve SVM Tuning through Parallelism

As pointed out in Chapter 10 of “The Elements of Statistical Learning”, ANNs and SVMs (support vector machines) share similar pros and cons, e.g. good predictive power but a lack of interpretability. However, in contrast to ANNs, which often suffer from local minima, SVM training is a convex problem and always converges to a global optimum. In addition, an SVM is less prone to over-fitting given a good choice of free parameters, which can usually be identified through cross-validation.

In the R package “e1071”, the tune() function can be used to search for SVM parameters but is extremely inefficient due to its sequential rather than parallel execution. In the code snippet below, a parallelism-based algorithm performs a grid search for SVM parameters through K-fold cross-validation.

```r
pkgs <- c('foreach', 'doParallel')
lapply(pkgs, require, character.only = T)
registerDoParallel(cores = 4)

### PREPARE FOR THE DATA ###
df1 <- read.csv("credit_count.txt")
df2 <- df1[df1$CARDHLDR == 1, ]
x <- paste("AGE + ACADMOS + ADEPCNT + MAJORDRG + MINORDRG + OWNRENT + INCOME + SELFEMPL + INCPER + EXP_INC")
fml <- as.formula(paste("as.factor(DEFAULT) ~ ", x))

### SPLIT DATA INTO K FOLDS ###
set.seed(2016)
df2$fold <- caret::createFolds(1:nrow(df2), k = 4, list = FALSE)

### PARAMETER LIST ###
cost <- c(10, 100)
gamma <- c(1, 2)
parms <- expand.grid(cost = cost, gamma = gamma)

### LOOP THROUGH PARAMETER VALUES ###
result <- foreach(i = 1:nrow(parms), .combine = rbind) %do% {
  c <- parms[i, ]$cost
  g <- parms[i, ]$gamma
  ### K-FOLD VALIDATION ###
  out <- foreach(j = 1:max(df2$fold), .combine = rbind, .inorder = FALSE) %dopar% {
    deve <- df2[df2$fold != j, ]
    test <- df2[df2$fold == j, ]
    mdl <- e1071::svm(fml, data = deve, type = "C-classification", kernel = "radial",
                      cost = c, gamma = g, probability = TRUE)
    pred <- predict(mdl, test, decision.values = TRUE, probability = TRUE)
    data.frame(y = test$DEFAULT, prob = attributes(pred)$probabilities[, 2])
  }
  ### CALCULATE SVM PERFORMANCE ###
  roc <- pROC::roc(as.factor(out$y), out$prob)
  data.frame(parms[i, ], roc = roc$auc[1])
}
```
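For readers working in Python rather than R, the same idea can be sketched with the multiprocessing package, here parallelizing over parameter pairs instead of folds. The synthetic data and parameter grid below are illustrative assumptions, not a port of the credit-data example.

```python
import numpy as np
from itertools import product
from multiprocessing import Pool, cpu_count
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# ILLUSTRATIVE DATA STANDING IN FOR THE CREDIT FILE
x, y = make_classification(n_samples = 400, n_features = 10, random_state = 2016)

def cv_auc(parm):
    # K-FOLD CROSS-VALIDATED AUC FOR ONE (cost, gamma) PAIR
    c, g = parm
    folds = StratifiedKFold(n_splits = 4, shuffle = True, random_state = 2016)
    prob = np.zeros(len(y))
    for deve, test in folds.split(x, y):
        mdl = SVC(C = c, gamma = g, probability = True, random_state = 1).fit(x[deve], y[deve])
        prob[test] = mdl.predict_proba(x[test])[:, 1]
    return (c, g, roc_auc_score(y, prob))

if __name__ == "__main__":
    parms = list(product([10, 100], [1, 2]))
    # EACH PARAMETER PAIR IS EVALUATED IN ITS OWN WORKER PROCESS
    with Pool(processes = cpu_count()) as pool:
        for c, g, auc in pool.map(cv_auc, parms):
            print(c, g, "{:6.4f}".format(auc))
```

Parallelizing over parameter pairs keeps each worker's task coarse-grained, which minimizes the inter-process communication overhead relative to the SVM fitting time.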

## Parallelize Map()

map() is a convenient routine in Python that applies a function to all items from one or more lists, as shown below. This nature also makes map() a perfect candidate for parallelism.

```python
In [1]: a = (1, 2, 3)

In [2]: b = (10, 20, 30)

In [3]: def func(a, b):
   ...:     print "a -->", a, "b -->", b
   ...:

In [4]: ### SERIAL CALL ###

In [5]: map(func, a, b)
a --> 1 b --> 10
a --> 2 b --> 20
a --> 3 b --> 30
```

The Pool.map() function in the multiprocessing package is the parallel implementation of map(). However, a drawback is that Pool.map() doesn't support more than one argument in the function call. Therefore, for a function call with multiple arguments, a wrapper function is necessary to make it work. In addition, the wrapper has to be defined at the top level of the module so that it can be pickled and shipped to the worker processes.

```python
In [6]: ### PARALLEL CALL ###

In [7]: ### SINCE POOL.MAP() DOESN'T TAKE MULTIPLE ARGUMENTS, A WRAPPER IS NEEDED

In [8]: def f2(ab):
   ...:     a, b = ab
   ...:     return func(a, b)
   ...:

In [9]: from multiprocessing import Pool, cpu_count

In [10]: pool = Pool(processes = cpu_count())

In [11]: ### PARALLEL MAP() ON ALL CPUS

In [12]: pool.map(f2, zip(a, b))
a --> 1 b --> 10
a --> 2 b --> 20
a --> 3 b --> 30
```

In addition, the Pool.apply() function, with some tweaks, can also be employed to mimic map(). Different from Pool.map(), Pool.apply() is able to handle multiple arguments through a list comprehension. Note, however, that Pool.apply() blocks until each call completes, so the calls in the list comprehension actually run one after another; Pool.apply_async() should be used when true concurrency is desired.

```python
In [13]: ### POOL.APPLY() CAN ALSO MIMIC MAP() ###

In [14]: [pool.apply(func, args = (i, j)) for i, j in zip(a, b)]
a --> 1 b --> 10
a --> 2 b --> 20
a --> 3 b --> 30
```
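A non-blocking variant can be sketched with Pool.apply_async(), which submits all calls first and collects the results afterwards so that the calls overlap. The `add` function below is a hypothetical stand-in for `func` that returns a value instead of printing.

```python
from multiprocessing import Pool, cpu_count

# HYPOTHETICAL TWO-ARGUMENT FUNCTION STANDING IN FOR func()
def add(a, b):
    return a + b

if __name__ == "__main__":
    a, b = (1, 2, 3), (10, 20, 30)
    with Pool(processes = cpu_count()) as pool:
        # SUBMIT ALL CALLS WITHOUT BLOCKING, THEN COLLECT THE RESULTS
        jobs = [pool.apply_async(add, args = (i, j)) for i, j in zip(a, b)]
        print([job.get() for job in jobs])  # [11, 22, 33]
```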

Alternatively, the starmap() function in the parmap package (https://github.com/zeehio/parmap), which is specifically designed to overcome this limitation of Pool.map(), provides a friendlier and more elegant interface to implement a parallelized map() with multiple arguments, at the cost of a slight computing overhead.

```python
In [15]: ### ALTERNATIVELY, PARMAP PACKAGE IS USED ###

In [16]: from parmap import starmap

In [17]: starmap(func, zip(a, b), pool = Pool(processes = cpu_count()))
a --> 1 b --> 10
a --> 2 b --> 20
a --> 3 b --> 30
```
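It is worth noting that since Python 3.3 the multiprocessing package ships its own Pool.starmap(), so neither the wrapper nor the external parmap package is needed there. A minimal sketch, with `func` rewritten to return a string for Python 3:

```python
from multiprocessing import Pool, cpu_count

def func(a, b):
    # RETURNS THE FORMATTED LINE INSTEAD OF PRINTING (PYTHON 3 STYLE)
    return "a --> {} b --> {}".format(a, b)

if __name__ == "__main__":
    a, b = (1, 2, 3), (10, 20, 30)
    with Pool(processes = cpu_count()) as pool:
        # EACH TUPLE FROM zip(a, b) IS UNPACKED INTO func(a, b)
        for line in pool.starmap(func, zip(a, b)):
            print(line)
```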