## Posts Tagged ‘**Keras**’

## An Example of Merge Layer in Keras

The power of a DNN does not only come from its depth but also come from its flexibility of accommodating complex network structures. For instance, the DNN shown below consists of two branches, the left with 4 inputs and the right with 6 inputs. In addition, the right branch shows a more complicated structure than the left.

InputLayer (None, 6) Dense (None, 6) BatchNormalization (None, 6) Dense (None, 6) InputLayer (None, 4) BatchNormalization (None, 6) Dense (None, 4) Dense (None, 6) BatchNormalization (None, 4) BatchNormalization (None, 6) \____________________________________/ | Merge (None, 10) Dense (None, 1)

To create a DNN as the above, both left and right branches are defined separately with corresponding inputs and layers. In the line 29, both branches would be combined with a MERGE layer. There are multiple benefits of such merged DNNs. For instance, the DNN has the flexibility to handle various inputs differently. In addition, new features can be added conveniently without messing around with the existing network structure.

from pandas import read_csv, DataFrame from numpy.random import seed from sklearn.preprocessing import scale from keras.models import Sequential from keras.constraints import maxnorm from keras.optimizers import SGD from keras.layers import Dense, Merge from keras.layers.normalization import BatchNormalization from keras_diagram import ascii df = read_csv("credit_count.txt") Y = df[df.CARDHLDR == 1].DEFAULTS X1 = scale(df[df.CARDHLDR == 1][["MAJORDRG", "MINORDRG", "OWNRENT", "SELFEMPL"]]) X2 = scale(df[df.CARDHLDR == 1][["AGE", "ACADMOS", "ADEPCNT", "INCPER", "EXP_INC", "INCOME"]]) branch1 = Sequential() branch1.add(Dense(X1.shape[1], input_shape = (X1.shape[1],), init = 'normal', activation = 'relu')) branch1.add(BatchNormalization()) branch2 = Sequential() branch2.add(Dense(X2.shape[1], input_shape = (X2.shape[1],), init = 'normal', activation = 'relu')) branch2.add(BatchNormalization()) branch2.add(Dense(X2.shape[1], init = 'normal', activation = 'relu', W_constraint = maxnorm(5))) branch2.add(BatchNormalization()) branch2.add(Dense(X2.shape[1], init = 'normal', activation = 'relu', W_constraint = maxnorm(5))) branch2.add(BatchNormalization()) model = Sequential() model.add(Merge([branch1, branch2], mode = 'concat')) model.add(Dense(1, init = 'normal', activation = 'sigmoid')) sgd = SGD(lr = 0.1, momentum = 0.9, decay = 0, nesterov = False) model.compile(loss = 'binary_crossentropy', optimizer = sgd, metrics = ['accuracy']) seed(2017) model.fit([X1, X2], Y.values, batch_size = 2000, nb_epoch = 100, verbose = 1)

## Dropout Regularization in Deep Neural Networks

The deep neural network (DNN) is a very powerful neural work with multiple hidden layers and is able to capture the highly complex relationship between the response and predictors. However, it is prone to the over-fitting due to a large number of parameters that makes the regularization crucial for DNNs. In the paper (https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf), an interesting regularization approach, e.g. dropout, was proposed with a simple and elegant idea. Basically, it suppresses the complexity of DNNs by randomly dropping units in both input and hidden layers.

Below is an example showing how to tune the hyper-parameter of dropout rates with Keras library in Python. Because of the long computing time required by the dropout, the parallelism is used to speed up the process.

from pandas import read_csv, DataFrame from numpy.random import seed from sklearn.preprocessing import scale from sklearn.model_selection import train_test_split from sklearn.metrics import roc_auc_score from keras.models import Sequential from keras.constraints import maxnorm from keras.optimizers import SGD from keras.layers import Dense, Dropout from multiprocessing import Pool, cpu_count from itertools import product from parmap import starmap df = read_csv("credit_count.txt") Y = df[df.CARDHLDR == 1].DEFAULT X = df[df.CARDHLDR == 1][['AGE', 'ADEPCNT', 'MAJORDRG', 'MINORDRG', 'INCOME', 'OWNRENT', 'SELFEMPL']] sX = scale(X) ncol = sX.shape[1] x_train, x_test, y_train, y_test = train_test_split(sX, Y, train_size = 0.5, random_state = seed(2017)) def tune_dropout(rate1, rate2): net = Sequential() ## DROPOUT AT THE INPUT LAYER net.add(Dropout(rate1, input_shape = (ncol,))) ## DROPOUT AT THE 1ST HIDDEN LAYER net.add(Dense(ncol, init = 'normal', activation = 'relu', W_constraint = maxnorm(4))) net.add(Dropout(rate2)) ## DROPOUT AT THE 2ND HIDDER LAYER net.add(Dense(ncol, init = 'normal', activation = 'relu', W_constraint = maxnorm(4))) net.add(Dropout(rate2)) net.add(Dense(1, init = 'normal', activation = 'sigmoid')) sgd = SGD(lr = 0.1, momentum = 0.9, decay = 0, nesterov = False) net.compile(loss='binary_crossentropy', optimizer = sgd, metrics = ['accuracy']) net.fit(x_train, y_train, batch_size = 200, nb_epoch = 50, verbose = 0) print rate1, rate2, "{:6.4f}".format(roc_auc_score(y_test, net.predict(x_test))) input_dp = [0.1, 0.2, 0.3] hidden_dp = [0.2, 0.3, 0.4, 0.5] parms = [i for i in product(input_dp, hidden_dp)] seed(2017) starmap(tune_dropout, parms, pool = Pool(processes = cpu_count()))

As shown in the output below, the optimal dropout rate appears to be 0.2 incidentally for both input and hidden layers.

0.1 0.2 0.6354 0.1 0.4 0.6336 0.1 0.3 0.6389 0.1 0.5 0.6378 0.2 0.2 0.6419 0.2 0.4 0.6385 0.2 0.3 0.6366 0.2 0.5 0.6359 0.3 0.4 0.6313 0.3 0.2 0.6350 0.3 0.3 0.6346 0.3 0.5 0.6343