Direct Optimization of Hyper-Parameter

In the previous post (https://statcompute.wordpress.com/2019/02/03/sobol-sequence-vs-uniform-random-in-hyper-parameter-optimization), it was shown how to identify the optimal hyper-parameter of a General Regression Neural Network (GRNN) with either the Sobol sequence or the uniform random generator through N-fold cross-validation. While the Sobol sequence yields a slightly better performance, outcomes from both approaches are very similar, as shown below based upon five trials with 20 samples in each. Both approaches generalize from the one-dimensional domain to multi-dimensional domains, e.g. in boosting or deep learning.
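Throughout the post, grnn.fit(), grnn.cv(), and grnn.predict() refer to the helper functions developed in earlier posts. For reference, a minimal sketch of the two sigma generators used below is given here, assuming the randtoolbox package for the scrambled Sobol sequence; the exact helpers were defined in the post linked above.

# A minimal sketch of the two generators, assuming the randtoolbox package;
# the original definitions are in the previous post.
gen_sobol <- function(min, max, n, seed) {
  round(min + (max - min) * randtoolbox::sobol(n, dim = 1, scrambling = 3, seed = seed), 4)
}

gen_unifm <- function(min, max, n, seed) {
  set.seed(seed)
  round(min + (max - min) * runif(n), 4)
}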

# Fit a GRNN on the Boston data with an initial sigma
net <- grnn.fit(scale(Boston[, -14]), Boston[, 14], sigma = 1)

# Five trials, each cross-validating 20 sigma candidates drawn from the Sobol sequence
sb_out <- Reduce(rbind, Map(function(x) grnn.cv(net, gen_sobol(0.1, 1.0, 20, x), 4, 2019), seq(1, 5)))

# Five trials, each cross-validating 20 sigma candidates drawn from the uniform random generator
uf_out <- Reduce(rbind, Map(function(x) grnn.cv(net, gen_unifm(0.1, 1.0, 20, x), 4, 2019), seq(1, 5)))

Map(function(x) x[x$R2 == max(x$R2), ], list(sobol = sb_out, uniform = uf_out))
# $sobol
#  sigma        R2
# 0.5568 0.8019342
# $uniform
#  sigma        R2
# 0.5608 0.8019327

Other than the random search, another way to locate the optimal hyper-parameter is to apply general-purpose optimization routines. As shown in the demonstration below, we first need to define an objective function, e.g. grnn.optim(), that returns the cross-validation R^2 to be maximized. In addition, depending on the optimization algorithm, lower and upper bounds of the parameter being optimized may also need to be provided. Three optimization algorithms are employed in the example, namely general-purpose unconstrained non-linear optimization, particle swarm optimization, and Nelder–Mead simplex optimization, all showing outcomes comparable to those achieved by the random search.


net <- grnn.fit(scale(Boston[, -14]), Boston[, 14], sigma = 1)

### Objective Function: N-Fold Cross-Validation R^2 for a Given Sigma ###
grnn.optim <- function(sigma, nn, nfolds, seed) {
  dt <- nn$set
  set.seed(seed)
  folds <- caret::createFolds(1:nrow(dt), k = nfolds, list = FALSE)
  # Refit the GRNN on each training fold and predict the corresponding hold-out fold
  r <- do.call(rbind,
               lapply(1:nfolds,
                      function(i) data.frame(Ya = nn$Ya[folds == i],
                                             Yp = grnn.predict(grnn.fit(nn$Xa[folds != i, ], nn$Ya[folds != i], sigma),
                                                               data.frame(nn$Xa[folds == i, ])))))
  return(r2(r$Ya, r$Yp))
}
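The r2() helper above is also carried over from earlier posts; a minimal sketch, assuming the standard coefficient of determination, could look like the following.

# A minimal sketch of the r2() helper, assuming the usual R-squared definition
r2 <- function(act, pre) {
  rss <- sum((act - pre) ^ 2)          # residual sum of squares
  tss <- sum((act - mean(act)) ^ 2)    # total sum of squares
  return(1 - rss / tss)
}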
### General-Purpose Unconstrained Non-Linear Optimization ###
# ucminf() minimizes, so the negated cross-validation R^2 is used as the objective
op_out <- ucminf::ucminf(par = 0.5, fn = function(x) -grnn.optim(x, net, 4, 2019))
# $par
# [1] 0.5611872
# $value
# [1] -0.8019319
### Particle Swarm Optimization ###
set.seed(1)
ps_out <- pso::psoptim(par = 0.5, lower = 0.1, upper = 1.0,
                       fn = function(x) -grnn.optim(x, net, 4, 2019),
                       control = list(maxit = 20))
# $par
# [1] 0.5583358
# $value
# [1] -0.8019351
### Nelder–Mead Simplex Optimization ###
nm_out <- optim(par = 0.5, fn = function(x) -grnn.optim(x, net, 4, 2019),
                method = "Nelder-Mead", control = list(warn.1d.NelderMead = FALSE))
# $par
# [1] 0.5582031
# $value
# [1] -0.8019351
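Since the hyper-parameter here is one-dimensional, base R's optimize(), a golden-section search over a bounded interval, would be another natural choice; below is a minimal sketch under the same setup, with gs_out as a hypothetical name.

### Golden-Section Search via Base R optimize() ###
gs_out <- optimize(f = function(x) grnn.optim(x, net, 4, 2019),
                   interval = c(0.1, 1.0), maximum = TRUE)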
