Title: Explanation Groves
Description: Compute surrogate explanation groves for predictive machine learning models and analyze complexity vs. explanatory power of an explanation according to Szepannek, G. and von Holt, B. (2023) <doi:10.1007/s41237-023-00205-2>.
Authors: Gero Szepannek [aut, cre]
Maintainer: Gero Szepannek <[email protected]>
License: GPL (>= 2)
Version: 0.1-13
Built: 2024-11-14 05:14:30 UTC
Source: https://github.com/g-rho/xgrove
Plot statistics of surrogate trees to analyze complexity vs. explanatory power.
## S3 method for class 'sgtree'
plot(x, abs = "rules", ord = "upsilon", ...)
x | An object of class sgtree.
abs | Name of the measure to be plotted on the x-axis; defaults to "rules".
ord | Name of the measure to be plotted on the y-axis; defaults to "upsilon".
... | Further arguments passed to plot.
No return value.
library(randomForest)
library(pdp)
data(boston)
set.seed(42)
rf <- randomForest(cmedv ~ ., data = boston)
data <- boston[, -3] # remove target variable
ntrees <- c(4, 8, 16, 32, 64, 128)
xg <- xgrove(rf, data, ntrees)
xg
plot(xg)
Plot statistics of surrogate groves to analyze complexity vs. explanatory power.
## S3 method for class 'xgrove'
plot(x, abs = "rules", ord = "upsilon", ...)
x | An object of class xgrove.
abs | Name of the measure to be plotted on the x-axis; defaults to "rules".
ord | Name of the measure to be plotted on the y-axis; defaults to "upsilon".
... | Further arguments passed to plot.
No return value.
library(randomForest)
library(pdp)
data(boston)
set.seed(42)
rf <- randomForest(cmedv ~ ., data = boston)
data <- boston[, -3] # remove target variable
ntrees <- c(4, 8, 16, 32, 64, 128)
xg <- xgrove(rf, data, ntrees)
xg
plot(xg)
Compute surrogate trees of different depths to explain a predictive machine learning model and analyze complexity vs. explanatory power.
sgtree(model, data, maxdeps = 1:8, cparam = 0, pfun = NULL, ...)
model | A model with a corresponding predict function that returns numeric values.
data | Data that must not (!) contain the target variable.
maxdeps | Sequence of integers: maximum depths of the trees.
cparam | Complexity parameter for growing the trees.
pfun | Optional predict function of the form function(model, data) returning a numeric vector of predictions.
... | Further arguments to be passed to rpart.
Surrogate trees of different maximum depths are trained via rpart on data, with the predictions of the model as target variable. Note that data must not contain the original target variable!
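The surrogate-tree idea can be sketched as follows. The toy stand-in model and variable names below are illustrative assumptions, not part of the package; only the general recipe (fit an rpart tree of limited depth on the model's predictions) is taken from the description above.

```r
# Hedged sketch of the surrogate-tree idea behind sgtree(): fit an rpart
# tree of limited depth on the predictions of a black-box model.
library(rpart)
set.seed(1)
df <- data.frame(x1 = runif(200), x2 = runif(200))
df$y <- 2 * df$x1 - df$x2 + rnorm(200, sd = 0.1)
bb <- lm(y ~ ., data = df)             # toy stand-in for an arbitrary model
data <- df[, c("x1", "x2")]            # predictors only: no target variable!
target <- predict(bb, data)            # model predictions as surrogate target
st <- rpart(target ~ ., data = cbind(data, target),
            maxdepth = 3, cp = 0)      # maxdepth/cp mirror maxdeps/cparam
```

The fitted tree `st` then approximates the black box and can be inspected or plotted like any rpart model.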
List of the results:

explanation | Matrix containing tree sizes, rules, and explainability.
rules | List of rules for each tree.
model | List of the rpart models.
Szepannek, G. and von Holt, B. (2023): Can’t see the forest for the trees – analyzing groves to explain random forests, Behaviormetrika, DOI: 10.1007/s41237-023-00205-2.
Szepannek, G. and Luebke, K. (2023): How much do we see? On the explainability of partial dependence plots for credit risk scoring, Argumenta Oeconomica 50, DOI: 10.15611/aoe.2023.1.07.
library(randomForest)
library(pdp)
data(boston)
set.seed(42)
rf <- randomForest(cmedv ~ ., data = boston)
data <- boston[, -3] # remove target variable
maxds <- 1:7
st <- sgtree(rf, data, maxds)
st
# rules for tree of depth 3
st$rules[["3"]]
# plot tree of depth 3
rpart.plot::rpart.plot(st$model[["3"]])
Compute explainability given predicted data of the model and an explainer.
upsilon(porig, pexp)
porig | Numeric vector of predictions of the original model.
pexp | Numeric vector of predictions of the explainer (surrogate).
Numeric explainability upsilon.
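As a rough illustration, explainability of this kind can be sketched in base R as an R²-style agreement between the two prediction vectors. This is an assumption about the general form only; the exact definition used by upsilon() is given in the reference below.

```r
# Illustrative sketch (an assumption, not the package's exact code):
# explainability as 1 - SSE(model vs. explainer) / SST(model predictions),
# i.e. the share of the model's prediction variance captured by the explainer.
ups_sketch <- function(porig, pexp) {
  1 - sum((porig - pexp)^2) / sum((porig - mean(porig))^2)
}
porig <- c(1, 2, 3, 4)          # predictions of the original model
pexp  <- c(1.1, 1.9, 3.2, 3.8)  # predictions of the surrogate
ups_sketch(porig, pexp)         # 0.98
```

A perfect surrogate (pexp identical to porig) yields 1; values fall as the surrogate's predictions drift from the model's.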
Szepannek, G. and Luebke, K. (2023): How much do we see? On the explainability of partial dependence plots for credit risk scoring, Argumenta Oeconomica 50, DOI: 10.15611/aoe.2023.1.07.
library(randomForest)
library(pdp)
data(boston)
set.seed(42)
# Compute original model
rf <- randomForest(cmedv ~ ., data = boston)
data <- boston[, -3] # remove target variable
# Compute predictions
porig <- predict(rf, data)
# Compute surrogate grove
xg <- xgrove(rf, data)
pexp <- predict(xg$model, data, n.trees = 16)
upsilon(porig, pexp)
Compute surrogate groves to explain a predictive machine learning model and analyze complexity vs. explanatory power.
xgrove(
  model,
  data,
  ntrees = c(4, 8, 16, 32, 64, 128),
  pfun = NULL,
  remove.target = T,
  shrink = 1,
  b.frac = 1,
  seed = 42,
  ...
)
model | A model with a corresponding predict function that returns numeric values.
data | Training data.
ntrees | Sequence of integers: number of boosting trees for rule extraction.
pfun | Optional predict function of the form function(model, data) returning a numeric vector of predictions.
remove.target | Logical. If TRUE, the target variable is automatically removed from data if present.
shrink | Sets the shrinkage argument of gbm.
b.frac | Sets the bag.fraction argument of gbm.
seed | Seed for the random number generator to ensure reproducible results (e.g. for gbm's bag.fraction sampling).
... | Further arguments to be passed to gbm.
A surrogate grove is trained via gradient boosting using gbm on data, with the predictions of the model as target variable. Note that data must not contain the original target variable! The boosting model is trained using stumps of depth 1. The resulting interpretation is extracted from pretty.gbm.tree.
The column upper_bound_left of the rules and the groves value of the output object contains the split point for numeric variables, denoting the upper bound of the left branch. Correspondingly, the levels_left column contains the levels of factor variables assigned to the left branch. The rule weights of the branches are given in the rightmost columns. The prediction of the grove is obtained as the sum of the assigned weights over all rows.
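The weight aggregation described above can be sketched as follows. The weight column names and the toy rules are illustrative assumptions based on the description (only upper_bound_left is named in the output documented above).

```r
# Hedged sketch of predicting from a grove's rule table. Column names
# weight_left/weight_right and the toy rules are illustrative assumptions.
grove <- data.frame(
  variable         = c("lstat", "rm"),
  upper_bound_left = c(10, 7),  # split point: upper bound of the left branch
  weight_left      = c(5, -2),
  weight_right     = c(-3, 4)
)
# Prediction for one observation: sum the weight of the branch
# each rule assigns the observation to.
predict_grove <- function(grove, obs) {
  left <- obs[grove$variable] <= grove$upper_bound_left
  sum(ifelse(left, grove$weight_left, grove$weight_right))
}
predict_grove(grove, c(lstat = 8, rm = 7.5))  # 5 + 4 = 9
```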
Note that the training data must not contain the target variable. It can either be removed manually, or it will be removed automatically from data if the argument remove.target == TRUE.
List of the results:

explanation | Matrix containing tree sizes, rules, and explainability.
rules | Summary of the explanation grove: rules with identical splits are aggregated. For numeric variables, splits are merged if they lead to identical partitions of the training data.
groves | Rules of the explanation grove.
model | The gbm model.
Szepannek, G. and von Holt, B.H. (2023): Can’t see the forest for the trees – analyzing groves to explain random forests, Behaviormetrika, DOI: 10.1007/s41237-023-00205-2.
Szepannek, G. and Luebke, K. (2023): How much do we see? On the explainability of partial dependence plots for credit risk scoring, Argumenta Oeconomica 50, DOI: 10.15611/aoe.2023.1.07.
library(randomForest)
library(pdp)
data(boston)
set.seed(42)
rf <- randomForest(cmedv ~ ., data = boston)
data <- boston[, -3] # remove target variable
ntrees <- c(4, 8, 16, 32, 64, 128)
xg <- xgrove(rf, data, ntrees)
xg
plot(xg)

# Example of a classification problem using the iris data.
# A predict function has to be defined, here for the posterior
# probabilities of the class virginica.
data(iris)
set.seed(42)
rf <- randomForest(Species ~ ., data = iris)
data <- iris[, -5] # remove target variable
pf <- function(model, data){
  predict(model, data, type = "prob")[, 3]
}
xgrove(rf, data, pfun = pf)