using googleAuthR to authenticate using docker and tidymodels with vetiver #229

jgarrigan · 2023-09-29T22:51:36Z

What goes wrong

I'm using tidymodels to build a basic ML model, and then using the vetiver package to serve this model as an API endpoint on GCP from a Docker container. I'm having issues with authentication: when I run docker run, the error thrown is "No .httr-oauth file exists in current working directory. Do library authentication steps to provide credentials."

I'm confused as to what is causing the issue. When I run gcs_list_buckets(projectId = Sys.getenv("GCE_DEFAULT_PROJECT_ID")) I can see my bucket info, which leads me to think I'm authenticated.

Are there recommendations when trying to authenticate using docker?
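
The bucket listing succeeds in the interactive session, so one possibility worth ruling out is that the container does not see the same credentials. The following is a minimal sketch for checking this from inside the running container, using base R only. `check_gcp_auth_env()` is a hypothetical helper written for illustration and is not part of googleAuthR; the only names taken from this post are `GCE_AUTH_FILE` and `GCE_DEFAULT_PROJECT_ID`.

```r
# Hypothetical diagnostic helper (not part of googleAuthR): reports whether
# the environment variables used in this post are set, and whether the
# service account JSON file that GCE_AUTH_FILE points to exists.
check_gcp_auth_env <- function(auth_var = "GCE_AUTH_FILE",
                               project_var = "GCE_DEFAULT_PROJECT_ID") {
  auth_file <- Sys.getenv(auth_var, unset = "")
  project_id <- Sys.getenv(project_var, unset = "")

  list(
    auth_var_set = nzchar(auth_file),
    # An unset variable yields "", so this is FALSE when nothing is configured
    auth_file_exists = nzchar(auth_file) && file.exists(auth_file),
    project_var_set = nzchar(project_id)
  )
}

status <- check_gcp_auth_env()
str(status)

# Only attempt service account authentication when the key file is really there
if (status$auth_file_exists && requireNamespace("googleAuthR", quietly = TRUE)) {
  googleAuthR::gar_auth_service(json_file = Sys.getenv("GCE_AUTH_FILE"))
}
```

If `auth_file_exists` comes back `FALSE` inside the container but `TRUE` on the host, the JSON file or the variable pointing to it has probably not been made available to the image. That would be consistent with the error above, though it is not confirmed here.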

Steps to reproduce the problem


if (!require("pacman")) install.packages("pacman")

pacman::p_load(
  tidyverse,
  googleCloudRunner,
  skimr,
  tidymodels,
  palmerpenguins,
  gt,
  ranger,
  brulee,
  pins,
  vetiver,
  plumber,
  conflicted,
  usethis,
  themis,
  googleCloudStorageR,
  googleAuthR,
  httr,
  gargle,
  tune,
  finetune,
  doMC
)

# AUTHENTICATE USING THE SERVICE ACCOUNT JSON FILE REFERENCED IN THE ENVIRON FILE

googleAuthR::gar_auth_service(json_file = Sys.getenv("GCE_AUTH_FILE"))

gcs_list_buckets(projectId = Sys.getenv("GCE_DEFAULT_PROJECT_ID"))

tidymodels_conflicts()

conflict_prefer("penguins", "palmerpenguins")

# PREPARE & SPLIT DATA ----------------------------------------------------

# REMOVE ROWS WITH MISSING SEX, EXCLUDE YEAR AND ISLAND

penguins_df <-
  penguins %>%
  drop_na(sex) %>%
  select(-year, -island)

set.seed(123)

# SPLIT THE DATA INTO TRAIN AND TEST SETS STRATIFIED BY SEX

penguin_split <- initial_split(penguins_df, strata = sex, prop = 3 / 4)
penguin_train <- training(penguin_split)
penguin_test <- testing(penguin_split)

# CREATE FOLDS FOR CROSS VALIDATION

penguin_folds <- vfold_cv(penguin_train)

# CREATE PREPROCESSING RECIPE ---------------------------------------------

penguin_rec <-
  recipe(sex ~ ., data = penguin_train) %>%
  step_YeoJohnson(all_numeric_predictors()) %>%
  themis::step_upsample(species) %>%
  step_dummy(species) %>%
  step_normalize(all_numeric_predictors())

# MODEL SPECIFICATION -----------------------------------------------------

# LOGISTIC REGRESSION

glm_spec <-
  # L1 REGULARISATION
  logistic_reg(penalty = 1) %>%
  set_engine("glm")

# RANDOM FOREST

tree_spec <-
  rand_forest(min_n = tune()) %>%
  set_engine("ranger") %>%
  set_mode("classification")

# NEURAL NETWORK WITH TORCH

# ASSUMPTION: the definition of mlp_brulee_spec is not shown in this post;
# the spec below is a placeholder so that the workflow set further down can run

mlp_brulee_spec <-
  mlp(hidden_units = tune(), epochs = tune()) %>%
  set_engine("brulee") %>%
  set_mode("classification")

# MODEL FITTING AND HYPER PARAMETER TUNING --------------------------------

# # REGISTER PARALLEL CORES

# registerDoMC(cores = 2)

# BAYESIAN OPTIMIZATION FOR HYPER PARAMETER TUNING

bayes_control <- control_bayes(
  no_improve = 10L,
  time_limit = 20,
  save_pred = TRUE,
  verbose = TRUE
)

# FIT ALL THREE MODELS WITH HYPER PARAMETER TUNING

workflow_set <-
  workflow_set(
    preproc = list(penguin_rec),
    models = list(
      glm = glm_spec,
      tree = tree_spec,
      torch = mlp_brulee_spec
    )
  ) %>%
  workflow_map("tune_bayes",
    iter = 50L,
    resamples = penguin_folds,
    control = bayes_control
  )

# COMPARE MODEL RESULTS ---------------------------------------------------

rank_results(workflow_set,
  rank_metric = "roc_auc",
  select_best = TRUE
) %>%
  gt()

# PLOT MODEL PERFORMANCE

workflow_set %>%
  autoplot()

# FINALIZE MODEL FIT ------------------------------------------------------

# SELECT THE LOGISTIC MODEL GIVEN THAT IT'S A SIMPLER MODEL AND PERFORMANCE
# IS SIMILAR TO THE NEURAL NET MODEL

best_model_id <- "recipe_glm"

# SELECT BEST MODEL

best_fit <-
  workflow_set %>%
  extract_workflow_set_result(best_model_id) %>%
  select_best(metric = "accuracy")

# CREATE WORKFLOW FOR BEST MODEL

final_workflow <-
  workflow_set %>%
  extract_workflow(best_model_id) %>%
  finalize_workflow(best_fit)

final_fit <-
  final_workflow %>%
  last_fit(penguin_split)

# FINAL FIT METRICS

final_fit %>%
  collect_metrics() %>%
  gt()

final_fit %>%
  collect_predictions() %>%
  roc_curve(sex, .pred_female) %>%
  autoplot()

final_fit_to_deploy <- final_fit %>%
  extract_workflow()

# VERSION WITH VETIVER ----------------------------------------------------

# INITIALISE VETIVER MODEL OBJECT

v <- vetiver_model(final_fit_to_deploy,
  model_name = "logistic_regression_model"
)

v

model_board <- board_gcs(bucket = "ml_ops_in_r_bucket")

model_board %>% vetiver_pin_write(vetiver_model = v)

My API is accessible and returns response status 200.

vetiver_write_plumber(model_board, "logistic_regression_model", rsconnect = FALSE)

vetiver_write_docker(v)

My Dockerfile contains environment variable references to my JSON file as well as the bucket and project.

Expected output

Actual output


Session Info

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_Ireland.utf8 LC_CTYPE=English_Ireland.utf8
[3] LC_MONETARY=English_Ireland.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Ireland.utf8

time zone: Europe/Dublin
tzcode source: internal

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] rapidoc_8.4.3 doMC_1.3.5
[3] iterators_1.0.14 foreach_1.5.2
[5] finetune_1.1.0 gargle_1.5.2
[7] httr_1.4.7 googleAuthR_2.0.1
[9] googleCloudStorageR_0.7.0 themis_1.0.2
[11] usethis_2.2.2 conflicted_1.2.0
[13] plumber_1.2.1 vetiver_0.2.3
[15] pins_1.2.1 brulee_0.2.0
[17] ranger_0.15.1 gt_0.9.0
[19] palmerpenguins_0.1.1 yardstick_1.2.0
[21] workflowsets_1.0.1 workflows_1.1.3
[23] tune_1.1.2 rsample_1.2.0
[25] recipes_1.0.8 parsnip_1.1.1
[27] modeldata_1.2.0 infer_1.0.4
[29] dials_1.2.0 scales_1.2.1
[31] broom_1.0.5 tidymodels_1.1.1
[33] skimr_2.1.5 googleCloudRunner_0.5.0
[35] lubridate_1.9.2 forcats_1.0.0
[37] stringr_1.5.0 dplyr_1.1.2
[39] purrr_1.0.2 readr_2.1.4
[41] tidyr_1.3.0 tibble_3.2.1
[43] ggplot2_3.4.3 tidyverse_2.0.0
[45] pacman_0.5.1

loaded via a namespace (and not attached):
[1] torch_0.11.0 rstudioapi_0.15.0 jsonlite_1.8.7
[4] magrittr_2.0.3 farver_2.1.1 fs_1.6.3
[7] vctrs_0.6.3 memoise_2.0.1 askpass_1.2.0
[10] base64enc_0.1-3 butcher_0.3.3 htmltools_0.5.6
[13] curl_5.0.2 sass_0.4.7 parallelly_1.36.0
[16] googlePubsubR_0.0.4 cachem_1.0.8 mime_0.12
[19] lifecycle_1.0.3 pkgconfig_2.0.3 Matrix_1.5-4.1
[22] R6_2.5.1 fastmap_1.1.1 future_1.33.0
[25] digest_0.6.33 colorspace_2.1-0 furrr_0.3.1
[28] ps_1.7.5 labeling_0.4.3 fansi_1.0.4
[31] timechange_0.2.0 compiler_4.3.1 bit64_4.0.5
[34] withr_2.5.0 backports_1.4.1 webutils_1.1
[37] MASS_7.3-60 lava_1.7.2.1 openssl_2.1.1
[40] rappdirs_0.3.3 tools_4.3.1 httpuv_1.6.11
[43] zip_2.3.0 future.apply_1.11.0 nnet_7.3-19
[46] glue_1.6.2 callr_3.7.3 promises_1.2.1
[49] grid_4.3.1 generics_0.1.3 gtable_0.3.4
[52] tzdb_0.4.0 class_7.3-22 data.table_1.14.8
[55] hms_1.1.3 xml2_1.3.5 utf8_1.2.3
[58] pillar_1.9.0 later_1.3.1 splines_4.3.1
[61] lhs_1.1.6 lattice_0.21-8 swagger_3.33.1
[64] survival_3.5-5 bit_4.0.5 tidyselect_1.2.0
[67] coro_1.0.3 jose_1.2.0 knitr_1.43
[70] xfun_0.40 hardhat_1.3.0 timeDate_4022.108
[73] stringi_1.7.12 DiceDesign_1.9 yaml_2.3.7
[76] codetools_0.2-19 cli_3.6.1 rpart_4.1.19
[79] bundle_0.1.0 repr_1.1.6 munsell_0.5.0
[82] processx_3.8.2 Rcpp_1.0.11 ROSE_0.0-4
[85] globals_0.16.2 ellipsis_0.3.2 gower_1.0.1
[88] assertthat_0.2.1 GPfit_1.0-8 listenv_0.9.0
[91] ipred_0.9-14 prodlim_2023.08.28 rlang_1.1.1

