In the post https://statcompute.wordpress.com/2019/10/13/assess-variable-importance-in-grnn, it was shown how to assess the variable importance of a GRNN by the decrease in goodness-of-fit (GoF) statistics, e.g. AUC, after averaging or dropping the variable of interest. Permutation feature importance (PFI) evaluates the variable importance in a similar manner, but by randomly permuting the values of the variable, which breaks the relationship between the predictor and the response while leaving the distribution of the predictor itself unchanged.
Today, I added two functions to calculate PFI in the YAGeR project, namely the grnn.x_pfi() function (https://github.com/statcompute/yager/blob/master/code/grnn.x_pfi.R), which calculates the PFI of an individual variable, and the grnn.pfi() function (https://github.com/statcompute/yager/blob/master/code/grnn.pfi.R), which calculates the PFI of all variables in the GRNN.
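To make the idea concrete, here is a minimal base-R sketch of PFI. It is not the YAGeR implementation: a logistic regression stands in for the GRNN, the data are simulated, and the helpers auc(), x_pfi() and pfi() are hypothetical names made up for this illustration. The sketch also assumes that the PFI is taken as the average drop in AUC across permutations, which may differ from how grnn.x_pfi() aggregates its trials.

```r
set.seed(2019)

# Simulated data (an assumption for illustration): x1 is strongly predictive,
# x2 is weakly predictive, and x3 is pure noise.
n <- 2000
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- rbinom(n, 1, plogis(1.5 * X$x1 + 0.3 * X$x2))

# A logistic regression stands in for the GRNN here.
fit <- glm(y ~ ., data = cbind(X, y = y), family = binomial)

# AUC from the Mann-Whitney rank-sum statistic (hypothetical helper).
auc <- function(y_true, y_pred) {
  r <- rank(y_pred)
  n1 <- sum(y_true == 1)
  n0 <- sum(y_true == 0)
  (sum(r[y_true == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# PFI of one variable: average drop in AUC over ntry random permutations.
x_pfi <- function(fit, X, y, i, ntry = 100) {
  base <- auc(y, predict(fit, newdata = X, type = "response"))
  drops <- vapply(seq_len(ntry), function(k) {
    Xp <- X
    Xp[[i]] <- sample(Xp[[i]])  # shuffle column i only, keep the rest as-is
    base - auc(y, predict(fit, newdata = Xp, type = "response"))
  }, numeric(1))
  mean(drops)
}

# PFI of all variables, ranked from the most to the least important.
pfi <- function(fit, X, y, ntry = 100) {
  vals <- vapply(seq_along(X), function(i) x_pfi(fit, X, y, i, ntry), numeric(1))
  out <- data.frame(idx = seq_along(X), var = names(X), pfi = vals)
  out[order(-out$pfi), ]
}

pfi_demo <- pfi(fit, X, y, ntry = 50)
print(pfi_demo)
```

In this toy setting, shuffling x1 should hurt the AUC the most and shuffling the noise variable x3 should barely move it, which is the same logic used to rank GRNN predictors.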
Below is an example showing how to use PFI to evaluate the variable importance. It turns out that the outcome looks very similar to the one created by the grnn.imp() function previously discussed.
### INITIATE A GRNN
net1 <- grnn.fit(x = X1, y = Y1)
### FIND THE OPTIMIZED PARAMETER
best <- grnn.optmiz_auc(net1, lower = 1, upper = 3)
### FIT A GRNN WITH THE OPTIMIZED PARAMETER
net2 <- grnn.fit(x = X1, y = Y1, sigma = best$sigma)
### CALCULATE PFI BY TRYING 1000 RANDOM PERMUTATIONS
pfi_rank <- grnn.pfi(net2, ntry = 1000)
#  idx               var        pfi
#    9  woe.bureau_score 0.06821683
#    8      woe.rev_util 0.03277195
#    1     woe.tot_derog 0.02845173
#    7  woe.tot_rev_line 0.01680968
#   10           woe.ltv 0.01416647
#    2        woe.tot_tr 0.00610415
#   11    woe.tot_income 0.00595962
#    4   woe.tot_open_tr 0.00561115
#    3 woe.age_oldest_tr 0.00508052
#    5    woe.tot_rev_tr 0.00000000
#    6  woe.tot_rev_debt 0.00000000
### PLOT PFI
barplot(pfi_rank$pfi, beside = TRUE, col = heat.colors(nrow(pfi_rank)), border = NA, yaxt = "n",
        names.arg = substring(pfi_rank$var, 5), main = "Permutation Feature Importance")
### EXTRACT VARIABLES WITH 0 PFI
excol <- pfi_rank[pfi_rank$pfi == 0, ]$idx
# 5 6
### AUC FOR HOLD-OUT SAMPLE WITH ALL VARIABLES
MLmetrics::AUC(y_pred = grnn.parpred(grnn.fit(x = X1, y = Y1, sigma = best$sigma), X2), y_true = Y2)
# 0.7584476
### AUC FOR HOLD-OUT SAMPLE WITH PFI > 0 VARIABLES
MLmetrics::AUC(y_pred = grnn.parpred(grnn.fit(x = X1[, -excol], y = Y1, sigma = best$sigma), X2[, -excol]), y_true = Y2)
# 0.7622679