Plot - shapr #418
Comments
Hi. I agree this does not look good. I thought we had fixed things like this in #406, but maybe this is an edge case. Please provide a complete runnable example, and we will look into it.
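Hi @martinju! Thank you so much for this! I hope this is enough information; please let me know otherwise:
library(readr)  # read_csv()
library(dplyr)  # %>% and filter()
# Read the data (the synthetic_data.csv file attached below)
Data_O <- read_csv("synthetic_data.csv")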
# Remove rows with missing values
Data_O <- Data_O[complete.cases(Data_O),]
# Handle extremes of target
Data_O <- Data_O %>% filter(actief_in_inst_2022_SCH > 0.60)
Data_O$actief_in_inst_2022_SCH <- sqrt(Data_O$actief_in_inst_2022_SCH)
# Features
check <- as.data.frame(model.matrix(~., data = Data_O[, c(3, 32:36, 38, 55:68)]))
check[] <- lapply(check, as.numeric)
check <- as.matrix(check)
check <- check[, -1]
# Outcome variable
y <- as.numeric(Data_O$actief_in_inst_2022_SCH)
# Split dataset into training (70%) and test (30%) sets
samp <- sample(nrow(Data_O), 0.7 * nrow(Data_O))
Train1 <- check[samp, ]
Train1 <- as.data.frame(Train1)
Test1 <- check[-samp, ]
Test1 <- as.data.frame(Test1)
Y_train <- y[samp]
Y_test <- y[-samp]
# Train Random Forest model
rf.fit <- ranger::ranger(Y_train ~ .,
  data = Train1,
  mtry = 14,
  max.depth = 3,
  replace = FALSE,
  min.node.size = 40,
  sample.fraction = 0.8,
  respect.unordered.factors = "order",
  importance = "permutation")
# SHAPR
p <- mean(Y_train)
library(shapr)
explanation <- shapr::explain(
  rf.fit,
  Test1,
  Train1,
  approach = "gaussian",
  phi0 = p
)
library(ggplot2)
library(ggbeeswarm)
# Plot
if (requireNamespace("ggplot2", quietly = TRUE)) {
  plot(explanation, plot_type = "scatter")
  plot(explanation, plot_type = "beeswarm")
}
[synthetic_data.csv](https://github.com/user-attachments/files/17777841/synthetic_data.csv)
Thanks!
Hanneleer
Dear all,
I attempted to plot the Shapley values using the shapr package, but I encountered an issue. Here is the plot I generated:
Has anyone else experienced a similar issue? I don’t think the plot is displaying correctly, especially with the strange vertical lines. Any advice would be greatly appreciated!
Thanks!