Permutation Feature Importance (PFI) of GRNN

In the post https://statcompute.wordpress.com/2019/10/13/assess-variable-importance-in-grnn, it was shown how to assess the variable importance of a GRNN by the decrease in goodness-of-fit (GoF) statistics, e.g. AUC, after averaging or dropping the variable of interest. Permutation feature importance (PFI) evaluates variable importance in a similar manner, but by randomly permuting the values of the variable. The permutation breaks the relationship between the predictor and the response while leaving the distribution of the predictor itself unchanged, so the larger the resulting drop in the GoF statistic, the more important the variable.
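
To make the idea concrete, here is a minimal sketch of PFI in base R. It is not the code behind the YAGeR functions: a logistic regression stands in for the GRNN, the data are simulated, and the names auc(), x_pfi(), df and base are made up for this illustration.

```r
# Hypothetical sketch of permutation feature importance with base R only.
# A logistic regression stands in for the GRNN; this is not the yager code.
set.seed(1)
n  <- 500
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# x1 drives the response strongly, x2 weakly, x3 not at all
df$y <- rbinom(n, 1, plogis(2 * df$x1 + 0.5 * df$x2))

# AUC computed from the rank-sum (Mann-Whitney) statistic
auc <- function(y_true, y_pred) {
  r  <- rank(y_pred)
  n1 <- sum(y_true == 1)
  n0 <- sum(y_true == 0)
  (sum(r[y_true == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

fit  <- glm(y ~ x1 + x2 + x3, data = df, family = binomial)
base <- auc(df$y, predict(fit, df, type = "response"))

# PFI of one variable: average drop in AUC over ntry random permutations
x_pfi <- function(var, ntry = 50) {
  drops <- vapply(seq_len(ntry), function(i) {
    perm <- df
    perm[[var]] <- sample(perm[[var]])  # shuffle only this column
    base - auc(perm$y, predict(fit, perm, type = "response"))
  }, numeric(1))
  mean(drops)
}

# PFI of every variable, ranked from most to least important
pfi <- sapply(c("x1", "x2", "x3"), x_pfi)
pfi_rank <- data.frame(var = names(pfi), pfi = unname(pfi))
pfi_rank <- pfi_rank[order(-pfi_rank$pfi), ]
print(pfi_rank)
```

Averaging over many permutations matters because a single shuffle is noisy; for a weak predictor, one random permutation can even raise the AUC by chance.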

Today, I added two functions to calculate PFI in the YAGeR project: the grnn.x_pfi() function (https://github.com/statcompute/yager/blob/master/code/grnn.x_pfi.R), which calculates the PFI of an individual variable, and the grnn.pfi() function (https://github.com/statcompute/yager/blob/master/code/grnn.pfi.R), which calculates the PFI of all variables in the GRNN.

Below is an example showing how to use PFI to evaluate variable importance. It turns out that the outcome looks very similar to the one produced by the grnn.imp() function discussed previously.


### X1, Y1: DEVELOPMENT SAMPLE; X2, Y2: HOLD-OUT SAMPLE (BOTH PREPARED BEFOREHAND)
### INITIATE A GRNN
net1 <- grnn.fit(x = X1, y = Y1)
### FIND THE OPTIMIZED PARAMETER
best <- grnn.optmiz_auc(net1, lower = 1, upper = 3)
### FIT A GRNN WITH THE OPTIMIZED PARAMETER
net2 <- grnn.fit(x = X1, y = Y1, sigma = best$sigma)
### CALCULATE PFI BY TRYING 1000 RANDOM PERMUTATIONS
pfi_rank <- grnn.pfi(net2, ntry = 1000)
#  idx                var        pfi
#    9   woe.bureau_score 0.06821683
#    8       woe.rev_util 0.03277195
#    1      woe.tot_derog 0.02845173
#    7   woe.tot_rev_line 0.01680968
#   10            woe.ltv 0.01416647
#    2         woe.tot_tr 0.00610415
#   11     woe.tot_income 0.00595962
#    4    woe.tot_open_tr 0.00561115
#    3  woe.age_oldest_tr 0.00508052
#    5     woe.tot_rev_tr 0.00000000
#    6   woe.tot_rev_debt 0.00000000
### PLOT PFI
barplot(pfi_rank$pfi, beside = TRUE, col = heat.colors(nrow(pfi_rank)), border = NA, yaxt = "n",
        names.arg = substring(pfi_rank$var, 5), main = "Permutation Feature Importance")
### EXTRACT VARIABLES WITH 0 PFI
excol <- pfi_rank[pfi_rank$pfi == 0, ]$idx
# 5 6
### AUC FOR HOLD-OUT SAMPLE WITH ALL VARIABLES
MLmetrics::AUC(y_pred = grnn.parpred(net2, X2), y_true = Y2)
# 0.7584476
### AUC FOR HOLD-OUT SAMPLE WITH PFI > 0 VARIABLES
net3 <- grnn.fit(x = X1[, -excol], y = Y1, sigma = best$sigma)
MLmetrics::AUC(y_pred = grnn.parpred(net3, X2[, -excol]), y_true = Y2)
# 0.7622679
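
One caveat about the negative indexing X1[, -excol] used above: it works here because two variables have a PFI of exactly 0. If no variable did, excol would be empty and the negative index would drop every column instead of none. The sketch below shows this on a made-up matrix; the names X, keep, n_bad and n_good are illustrative only.

```r
# Toy matrix standing in for X1; values and names are made up
X <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))

# Pretend no variable has zero PFI, so the exclusion index is empty
excol <- integer(0)

# Negative indexing with an empty vector selects no columns at all
n_bad <- ncol(X[, -excol, drop = FALSE])

# Selecting the columns to keep behaves the same whether or not excol is empty
keep   <- setdiff(seq_len(ncol(X)), excol)
n_good <- ncol(X[, keep, drop = FALSE])

print(c(n_bad = n_bad, n_good = n_good))
```

Indexing by the columns to keep, rather than by the columns to drop, is the safer choice when the exclusion list may turn out empty.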

