using googleAuthR to authenticate using docker and tidymodels with vetiver #229

Open
jgarrigan opened this issue Sep 29, 2023 · 0 comments


What goes wrong

I'm using tidymodels to build a basic ML model and the vetiver package to serve that model as an API endpoint on GCP from a Docker container. I'm having trouble with authentication: when I run docker run, the error thrown is "No .httr-oauth file exists in current working directory. Do library authentication steps to provide credentials."

I'm confused about what is causing this. When I run gcs_list_buckets(projectId = Sys.getenv("GCE_DEFAULT_PROJECT_ID")) I can see my bucket info, which leads me to think I'm authenticated.

[image]

Are there recommendations when trying to authenticate using docker?
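
One check that may help narrow this down is to compare what the R process can see in the working session with what it can see inside the container. The sketch below uses base R only; env_report() is a hypothetical helper written for this illustration (it is not part of googleAuthR), and the variable names are the ones used in this issue. It reports whether each variable is set and whether its value is an existing file path, without printing the values themselves.

```r
# Hypothetical helper: summarise whether the expected environment variables
# are visible to this R process. Values are not printed, only their status.
env_report <- function(vars = c("GCE_AUTH_FILE", "GCE_DEFAULT_PROJECT_ID")) {
  values <- Sys.getenv(vars, unset = "", names = FALSE)
  data.frame(
    variable = vars,
    is_set = nzchar(values),
    # Only meaningful for variables that hold a file path (the key file).
    path_exists = nzchar(values) & file.exists(values),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

print(env_report())
```

If the same call reports the key file as set and present in the working session but not inside the container, that would suggest the credentials are not reaching the container rather than a problem with the account itself.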

Steps to reproduce the problem


if (!require("pacman")) install.packages("pacman")

pacman::p_load(
  tidyverse,
  googleCloudRunner,
  skimr,
  tidymodels,
  palmerpenguins,
  gt,
  ranger,
  brulee,
  pins,
  vetiver,
  plumber,
  conflicted,
  usethis,
  themis,
  googleCloudStorageR,
  googleAuthR,
  httr,
  gargle,
  tune,
  finetune,
  doMC
)

# AUTHENTICATE USING THE SERVICE ACCOUNT JSON FILE REFERENCED IN THE ENVIRON FILE

googleAuthR::gar_auth_service(json_file = Sys.getenv("GCE_AUTH_FILE"))

gcs_list_buckets(projectId = Sys.getenv("GCE_DEFAULT_PROJECT_ID"))

tidymodels_conflicts()

conflict_prefer("penguins", "palmerpenguins")

# PREPARE & SPLIT DATA ----------------------------------------------------

# REMOVE ROWS WITH MISSING SEX, EXCLUDE YEAR AND ISLAND

penguins_df <-
  penguins %>%
  drop_na(sex) %>%
  select(-year, -island)

set.seed(123)

# SPLIT THE DATA INTO TRAIN AND TEST SETS STRATIFIED BY SEX

penguin_split <- initial_split(penguins_df, strata = sex, prop = 3 / 4)
penguin_train <- training(penguin_split)
penguin_test <- testing(penguin_split)

# CREATE FOLDS FOR CROSS VALIDATION

penguin_folds <- vfold_cv(penguin_train)

# CREATE PREPROCESSING RECIPE ---------------------------------------------

penguin_rec <-
  recipe(sex ~ ., data = penguin_train) %>%
  step_YeoJohnson(all_numeric_predictors()) %>%
  themis::step_upsample(species) %>%
  step_dummy(species) %>%
  step_normalize(all_numeric_predictors())

# MODEL SPECIFICATION -----------------------------------------------------

# LOGISTIC REGRESSION

glm_spec <-
  # L1 REGULARISATION
  logistic_reg(penalty = 1) %>%
  set_engine("glm")

# RANDOM FOREST

tree_spec <-
  rand_forest(min_n = tune()) %>%
  set_engine("ranger") %>%
  set_mode("classification")

# NEURAL NETWORK WITH TORCH

mlp_brulee_spec <-
  mlp(
    hidden_units = tune(),
    epochs = tune(),
    penalty = tune(),
    learn_rate = tune()
  ) %>%
  set_engine("brulee") %>%
  set_mode("classification")

# MODEL FITTING AND HYPER PARAMETER TUNING --------------------------------

# REGISTER PARALLEL CORES

registerDoMC(cores = 2)

# BAYESIAN OPTIMIZATION FOR HYPER PARAMETER TUNING

bayes_control <- control_bayes(
  no_improve = 10L,
  time_limit = 20,
  save_pred = TRUE,
  verbose = TRUE
)

# FIT ALL THREE MODELS WITH HYPER PARAMETER TUNING

workflow_set <-
  workflow_set(
    preproc = list(penguin_rec),
    models = list(
      glm = glm_spec,
      tree = tree_spec,
      torch = mlp_brulee_spec
    )
  ) %>%
  workflow_map("tune_bayes",
    iter = 50L,
    resamples = penguin_folds,
    control = bayes_control
  )

# COMPARE MODEL RESULTS ---------------------------------------------------

rank_results(workflow_set,
  rank_metric = "roc_auc",
  select_best = TRUE
) %>%
  gt()

# PLOT MODEL PERFORMANCE

workflow_set %>%
  autoplot()

# FINALIZE MODEL FIT ------------------------------------------------------

# SELECT THE LOGISTIC MODEL GIVEN THAT IT'S A SIMPLER MODEL AND PERFORMANCE
# IS SIMILAR TO THE NEURAL NET MODEL

best_model_id <- "recipe_glm"

# SELECT BEST MODEL

best_fit <-
  workflow_set %>%
  extract_workflow_set_result(best_model_id) %>%
  select_best(metric = "accuracy")

# CREATE WORKFLOW FOR BEST MODEL

final_workflow <-
  workflow_set %>%
  extract_workflow(best_model_id) %>%
  finalize_workflow(best_fit)

final_fit <-
  final_workflow %>%
  last_fit(penguin_split)

# FINAL FIT METRICS

final_fit %>%
  collect_metrics() %>%
  gt()

final_fit %>%
  collect_predictions() %>%
  roc_curve(sex, .pred_female) %>%
  autoplot()

final_fit_to_deploy <- final_fit %>%
  extract_workflow()

# VERSION WITH VETIVER ----------------------------------------------------

# INITIALISE VETIVER MODEL OBJECT

v <- vetiver_model(final_fit_to_deploy,
  model_name = "logistic_regression_model"
)

v

model_board <- board_gcs(bucket = "ml_ops_in_r_bucket")

model_board %>% vetiver_pin_write(vetiver_model = v)

My API is accessible and returns response status 200.

[image]

vetiver_write_plumber(model_board, "logistic_regression_model", rsconnect = FALSE)

vetiver_write_docker(v)

My Dockerfile contains environment variables that reference my JSON key file, as well as the bucket and project.

image
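
The plumber.R file written by vetiver_write_plumber() reads the pin from the board when the API starts, so the R process inside the container needs credentials before board_gcs() is used. The following is a minimal sketch of a fail-fast authentication step that could be placed at the top of that file. It is a guess at one possible fix, not a confirmed diagnosis. It assumes the key file has been copied into the image and that GCE_AUTH_FILE holds its path inside the container; auth_from_env() is a hypothetical helper, not part of vetiver or googleCloudStorageR.

```r
# Hypothetical helper: authenticate from a key file whose path is stored in an
# environment variable, stopping early with a clear message if it is missing.
# auth_fun defaults to googleCloudStorageR::gcs_auth(), which accepts json_file;
# it is only looked up when a key file has actually been found.
auth_from_env <- function(var = "GCE_AUTH_FILE",
                          auth_fun = googleCloudStorageR::gcs_auth) {
  key_path <- Sys.getenv(var, unset = "")
  if (!nzchar(key_path) || !file.exists(key_path)) {
    stop(sprintf("No readable key file found via %s", var), call. = FALSE)
  }
  auth_fun(json_file = key_path)
}

# Intended use at the top of plumber.R, before board_gcs() is called:
# auth_from_env("GCE_AUTH_FILE")
```

Failing at this point gives an error that names the missing variable or file, which may be easier to act on than the later ".httr-oauth" message.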

Expected output

Actual output

[image]


Session Info

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_Ireland.utf8 LC_CTYPE=English_Ireland.utf8
[3] LC_MONETARY=English_Ireland.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Ireland.utf8

time zone: Europe/Dublin
tzcode source: internal

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] rapidoc_8.4.3 doMC_1.3.5
[3] iterators_1.0.14 foreach_1.5.2
[5] finetune_1.1.0 gargle_1.5.2
[7] httr_1.4.7 googleAuthR_2.0.1
[9] googleCloudStorageR_0.7.0 themis_1.0.2
[11] usethis_2.2.2 conflicted_1.2.0
[13] plumber_1.2.1 vetiver_0.2.3
[15] pins_1.2.1 brulee_0.2.0
[17] ranger_0.15.1 gt_0.9.0
[19] palmerpenguins_0.1.1 yardstick_1.2.0
[21] workflowsets_1.0.1 workflows_1.1.3
[23] tune_1.1.2 rsample_1.2.0
[25] recipes_1.0.8 parsnip_1.1.1
[27] modeldata_1.2.0 infer_1.0.4
[29] dials_1.2.0 scales_1.2.1
[31] broom_1.0.5 tidymodels_1.1.1
[33] skimr_2.1.5 googleCloudRunner_0.5.0
[35] lubridate_1.9.2 forcats_1.0.0
[37] stringr_1.5.0 dplyr_1.1.2
[39] purrr_1.0.2 readr_2.1.4
[41] tidyr_1.3.0 tibble_3.2.1
[43] ggplot2_3.4.3 tidyverse_2.0.0
[45] pacman_0.5.1

loaded via a namespace (and not attached):
[1] torch_0.11.0 rstudioapi_0.15.0 jsonlite_1.8.7
[4] magrittr_2.0.3 farver_2.1.1 fs_1.6.3
[7] vctrs_0.6.3 memoise_2.0.1 askpass_1.2.0
[10] base64enc_0.1-3 butcher_0.3.3 htmltools_0.5.6
[13] curl_5.0.2 sass_0.4.7 parallelly_1.36.0
[16] googlePubsubR_0.0.4 cachem_1.0.8 mime_0.12
[19] lifecycle_1.0.3 pkgconfig_2.0.3 Matrix_1.5-4.1
[22] R6_2.5.1 fastmap_1.1.1 future_1.33.0
[25] digest_0.6.33 colorspace_2.1-0 furrr_0.3.1
[28] ps_1.7.5 labeling_0.4.3 fansi_1.0.4
[31] timechange_0.2.0 compiler_4.3.1 bit64_4.0.5
[34] withr_2.5.0 backports_1.4.1 webutils_1.1
[37] MASS_7.3-60 lava_1.7.2.1 openssl_2.1.1
[40] rappdirs_0.3.3 tools_4.3.1 httpuv_1.6.11
[43] zip_2.3.0 future.apply_1.11.0 nnet_7.3-19
[46] glue_1.6.2 callr_3.7.3 promises_1.2.1
[49] grid_4.3.1 generics_0.1.3 gtable_0.3.4
[52] tzdb_0.4.0 class_7.3-22 data.table_1.14.8
[55] hms_1.1.3 xml2_1.3.5 utf8_1.2.3
[58] pillar_1.9.0 later_1.3.1 splines_4.3.1
[61] lhs_1.1.6 lattice_0.21-8 swagger_3.33.1
[64] survival_3.5-5 bit_4.0.5 tidyselect_1.2.0
[67] coro_1.0.3 jose_1.2.0 knitr_1.43
[70] xfun_0.40 hardhat_1.3.0 timeDate_4022.108
[73] stringi_1.7.12 DiceDesign_1.9 yaml_2.3.7
[76] codetools_0.2-19 cli_3.6.1 rpart_4.1.19
[79] bundle_0.1.0 repr_1.1.6 munsell_0.5.0
[82] processx_3.8.2 Rcpp_1.0.11 ROSE_0.0-4
[85] globals_0.16.2 ellipsis_0.3.2 gower_1.0.1
[88] assertthat_0.2.1 GPfit_1.0-8 listenv_0.9.0
[91] ipred_0.9-14 prodlim_2023.08.28 rlang_1.1.1
