I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

## Dropout Regularization in Deep Neural Networks

The deep neural network (DNN) is a very powerful neural work with multiple hidden layers and is able to capture the highly complex relationship between the response and predictors. However, it is prone to the over-fitting due to a large number of parameters that makes the regularization crucial for DNNs. In the paper (https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf), an interesting regularization approach, e.g. dropout, was proposed with a simple and elegant idea. Basically, it suppresses the complexity of DNNs by randomly dropping units in both input and hidden layers.

Below is an example showing how to tune the hyper-parameter of dropout rates with Keras library in Python. Because of the long computing time required by the dropout, the parallelism is used to speed up the process.

```from pandas import read_csv, DataFrame
from numpy.random import seed
from sklearn.preprocessing import scale
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from keras.models import Sequential
from keras.constraints import maxnorm
from keras.optimizers import SGD
from keras.layers import Dense, Dropout
from multiprocessing import Pool, cpu_count
from itertools import product
from parmap import starmap

Y = df[df.CARDHLDR == 1].DEFAULT
X = df[df.CARDHLDR == 1][['AGE', 'ADEPCNT', 'MAJORDRG', 'MINORDRG', 'INCOME', 'OWNRENT', 'SELFEMPL']]
sX = scale(X)
ncol = sX.shape[1]
x_train, x_test, y_train, y_test = train_test_split(sX, Y, train_size = 0.5, random_state = seed(2017))

def tune_dropout(rate1, rate2):
net = Sequential()
## DROPOUT AT THE INPUT LAYER
## DROPOUT AT THE 1ST HIDDEN LAYER
net.add(Dense(ncol, init = 'normal', activation = 'relu', W_constraint = maxnorm(4)))
## DROPOUT AT THE 2ND HIDDER LAYER
net.add(Dense(ncol, init = 'normal', activation = 'relu', W_constraint = maxnorm(4)))
net.add(Dense(1, init = 'normal', activation = 'sigmoid'))
sgd = SGD(lr = 0.1, momentum = 0.9, decay = 0, nesterov = False)
net.compile(loss='binary_crossentropy', optimizer = sgd, metrics = ['accuracy'])
net.fit(x_train, y_train, batch_size = 200, nb_epoch = 50, verbose = 0)
print rate1, rate2, "{:6.4f}".format(roc_auc_score(y_test, net.predict(x_test)))

input_dp = [0.1, 0.2, 0.3]
hidden_dp = [0.2, 0.3, 0.4, 0.5]
parms = [i for i in product(input_dp, hidden_dp)]

seed(2017)
starmap(tune_dropout, parms, pool = Pool(processes = cpu_count()))

```

As shown in the output below, the optimal dropout rate appears to be 0.2 incidentally for both input and hidden layers.

```0.1 0.2 0.6354
0.1 0.4 0.6336
0.1 0.3 0.6389
0.1 0.5 0.6378
0.2 0.2 0.6419
0.2 0.4 0.6385
0.2 0.3 0.6366
0.2 0.5 0.6359
0.3 0.4 0.6313
0.3 0.2 0.6350
0.3 0.3 0.6346
0.3 0.5 0.6343
```