In the previous post (https://statcompute.wordpress.com/2019/02/03/sobol-sequence-vs-uniform-random-in-hyper-parameter-optimization), it was shown how to identify the optimal hyper-parameter in a General Regression Neural Network by using the Sobol sequence and the uniform random generator, respectively, through N-fold cross-validation. While the Sobol sequence yields a slightly better performance, outcomes from both approaches are very similar, as shown below based upon five trials with 20 samples in each. Both approaches generalize readily from one-dimensional to multi-dimensional search spaces, e.g. for boosting or deep learning models with multiple hyper-parameters.
net <- grnn.fit(scale(Boston[, -14]), Boston[, 14], sigma = 1)

sb_out <- Reduce(rbind, Map(function(x) grnn.cv(net, gen_sobol(0.1, 1.0, 20, x), 4, 2019), seq(1, 5)))

uf_out <- Reduce(rbind, Map(function(x) grnn.cv(net, gen_unifm(0.1, 1.0, 20, x), 4, 2019), seq(1, 5)))

Map(function(x) x[x$R2 == max(x$R2), ], list(sobol = sb_out, uniform = uf_out))
# $sobol
#  sigma        R2
# 0.5568 0.8019342
#
# $uniform
#  sigma        R2
# 0.5608 0.8019327
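The gen_sobol() and gen_unifm() helpers above come from the earlier post and are not repeated here; the sketch below is one plausible way they might look, with the _sketch suffix marking these as hypothetical stand-ins. In one dimension, the (unscrambled) Sobol sequence is simply the base-2 van der Corput sequence, so a self-contained illustration needs no external package, although the real helpers may well use something like randtoolbox instead.

# Hedged sketch of gen_sobol()/gen_unifm()-style helpers; names with the
# _sketch suffix are assumptions, not the author's actual implementation.
vdc2 <- function(i) {
  # radical inverse of integer i in base 2 (1-D Sobol / van der Corput point)
  x <- 0; f <- 0.5
  while (i > 0) {
    x <- x + f * (i %% 2)
    i <- i %/% 2
    f <- f / 2
  }
  x
}

# low-discrepancy draws: evenly spread over [lo, hi] by construction
gen_sobol_sketch <- function(lo, hi, n) lo + (hi - lo) * sapply(seq_len(n), vdc2)

# uniform random draws: can clump and leave gaps for small n
gen_unifm_sketch <- function(lo, hi, n, seed) {
  set.seed(seed)
  lo + (hi - lo) * runif(n)
}

round(gen_sobol_sketch(0.1, 1.0, 5), 4)
# [1] 0.5500 0.3250 0.7750 0.2125 0.6625

The even spacing of the low-discrepancy points is what gives the Sobol search its slight edge for small sample budgets.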
Other than the random search, another way to locate the optimal hyper-parameter is to apply general optimization routines. As shown in the demonstration below, we first need to define an objective function, e.g. grnn.optim(), to maximize the cross-validation R^2. In addition, depending on the optimization algorithm, upper and lower bounds of the parameter to be optimized may also need to be provided. Three optimization algorithms are employed in the example, including unconstrained non-linear optimization, particle swarm optimization, and Nelder–Mead simplex optimization, all of which produce outcomes comparable to those achieved by the random search.
net <- grnn.fit(scale(Boston[, -14]), Boston[, 14], sigma = 1)

### Objective Function: N-Fold Cross-Validation R^2 for a Given Sigma ###
grnn.optim <- function(sigma, nn, nfolds, seed) {
  dt <- nn$set
  set.seed(seed)
  folds <- caret::createFolds(1:nrow(dt), k = nfolds, list = FALSE)
  r <- do.call(rbind,
               lapply(1:nfolds,
                      function(i) data.frame(Ya = nn$Ya[folds == i],
                                             Yp = grnn.predict(grnn.fit(nn$Xa[folds != i, ], nn$Ya[folds != i], sigma),
                                                               data.frame(nn$Xa[folds == i, ])))))
  return(r2(r$Ya, r$Yp))
}

### General-Purpose Unconstrained Non-Linear Optimization ###
op_out <- ucminf::ucminf(par = 0.5, fn = function(x) -grnn.optim(x, net, 4, 2019))
# $par
# [1] 0.5611872
# $value
# [1] -0.8019319

### Particle Swarm Optimization ###
set.seed(1)
ps_out <- pso::psoptim(par = 0.5, upper = 1.0, lower = 0.1,
                       fn = function(x) -grnn.optim(x, net, 4, 2019),
                       control = list(maxit = 20))
# $par
# [1] 0.5583358
# $value
# [1] -0.8019351

### Nelder–Mead Optimization ###
nm_out <- optim(par = 0.5, fn = function(x) -grnn.optim(x, net, 4, 2019),
                method = "Nelder-Mead", control = list(warn.1d.NelderMead = FALSE))
# $par
# [1] 0.5582031
# $value
# [1] -0.8019351
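Since sigma is a single bounded parameter, base R's optimize(), which performs golden-section search with successive parabolic interpolation, would be another natural fit. The sketch below uses a toy quadratic surrogate for the objective, because the real grnn.optim() depends on the fitted network object; the call pattern, however, carries over directly.

### Golden-Section Search (sketch with a stand-in objective) ###
# The surrogate below is a hypothetical quadratic peaking near sigma = 0.56,
# roughly where the searches above converge; for the real problem you would
# pass function(x) -grnn.optim(x, net, 4, 2019) instead.
surrogate <- function(x) -(0.80 - (x - 0.56) ^ 2)

gs_out <- optimize(surrogate, interval = c(0.1, 1.0))
round(gs_out$minimum, 2)
# [1] 0.56

Like psoptim(), optimize() works directly with the box constraint on sigma, so no penalty terms or transformations are needed.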