Plot - shapr #418

hanneleer · 2024-11-15T13:04:45Z

Dear all,

I attempted to plot the Shapley values using the shapr package, but I encountered an issue. Here is the plot I generated:

Has anyone else experienced a similar issue? I don’t think the plot is displaying correctly, especially with the strange vertical lines. Any advice would be greatly appreciated!

Thanks!

martinju · 2024-11-15T13:25:08Z

Hi. I agree this does not look good. I thought we had fixed things like this in #406, but maybe this is an edge case. Please provide a complete runnable example, and we will look into it.

hanneleer · 2024-11-15T15:58:57Z

Hi @martinju! Thank you so much for this! I hope this is enough information. Please let me know otherwise:

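library(readr)  # read_csv()
library(dplyr)  # filter() and %>%

# Read the attached file (assumed to be in the working directory)
Data_O <- read_csv("synthetic_data.csv")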
# Remove rows with missing values
Data_O <- Data_O[complete.cases(Data_O),]
# Handle extremes of target 
Data_O <- Data_O %>% filter(actief_in_inst_2022_SCH > 0.60)
Data_O$actief_in_inst_2022_SCH <- sqrt(Data_O$actief_in_inst_2022_SCH)

# Features 
check <- as.data.frame(model.matrix(~., data = Data_O[, c(3, 32:36, 38, 55:68)]))
check[] <- lapply(check, as.numeric)  
check <- as.matrix(check)  
check <- check[, -1]  

# Outcome variable
y <- as.numeric(Data_O$actief_in_inst_2022_SCH)

# Split dataset into training (70%) and test (30%) sets
set.seed(1)  # arbitrary seed so the split is reproducible
samp <- sample(nrow(Data_O), 0.7 * nrow(Data_O))

Train1 <- check[samp, ]
Train1 <- as.data.frame(Train1)

Test1 <- check[-samp, ]
Test1 <- as.data.frame(Test1)

Y_train <- y[samp]
Y_test <- y[-samp]

# Train Random Forest model 
rf.fit <- ranger::ranger(Y_train ~ .,
                         data = Train1,
                         mtry = 14,  
                         max.depth = 3,
                         replace = FALSE, 
                         min.node.size = 40, 
                         sample.fraction = 0.8, 
                         respect.unordered.factors = "order", 
                         importance = "permutation")

# SHAPR 
p <- mean(Y_train)
library(shapr)
explanation <- shapr::explain(
  rf.fit,
  Test1, 
  Train1,
  approach = "gaussian",
  phi0 = p
)

library(ggplot2)
library(ggbeeswarm)
# Plot 
if (requireNamespace("ggplot2", quietly = TRUE)) {
  plot(explanation, plot_type = "scatter")
  plot(explanation, plot_type = "beeswarm")
}
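
One thing that may be worth checking before reading the scatter plot, offered as a sketch rather than a diagnosis: columns created by `model.matrix()` for categorical variables take only the values 0 and 1, so in a plot of feature value against Shapley value their points stack at two x positions. Whether that explains the vertical lines reported here is an assumption, not something confirmed in this thread. The toy data frame below is hypothetical and only stands in for the real design matrix.

```r
# Toy stand-in for the design matrix built with model.matrix() above;
# column names and values are hypothetical, not taken from synthetic_data.csv.
toy <- data.frame(
  age    = c(23, 35, 41, 52, 35, 60),
  groupB = c(0, 1, 0, 1, 1, 0),  # dummy column, as model.matrix() would create
  groupC = c(1, 0, 0, 0, 0, 1)
)

# Number of distinct values per feature
n_distinct <- vapply(toy, function(col) length(unique(col)), integer(1))
print(n_distinct)

# Features with very few distinct values (the cutoff of 2 is an arbitrary choice)
low_cardinality <- names(n_distinct)[n_distinct <= 2]
print(low_cardinality)
```

Applied to the example above, `toy` would be replaced by `Test1`.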


[synthetic_data.csv](https://github.com/user-attachments/files/17777841/synthetic_data.csv)


Thanks! 
Hanneleer
