MLFLOW support for the SilverKite Algorithm #124

Open
ikegwukc opened this issue Jun 27, 2023 · 1 comment

ikegwukc commented Jun 27, 2023

Hello, I am trying to log the sklearn pipeline stored in the model attribute to MLflow on Databricks, using the Silverkite template. See the code snippet below. When attempting to log the sklearn pipeline held in the model attribute of the "greykite.framework.pipeline.pipeline.ForecastResult" object, I receive this error: NotImplementedError: Sorry, pickling not yet supported. See https://github.com/pydata/patsy/issues/26 if you want to help.

Any idea on how I can log this model with the silverkite template to MLFLOW?

 from collections import defaultdict
 import warnings

 warnings.filterwarnings("ignore")

 import pandas as pd
 import plotly

 from greykite.common.data_loader import DataLoader
 from greykite.framework.templates.autogen.forecast_config import ForecastConfig
 from greykite.framework.templates.autogen.forecast_config import MetadataParam
 from greykite.framework.templates.forecaster import Forecaster
 from greykite.framework.templates.model_templates import ModelTemplateEnum
 from greykite.framework.utils.result_summary import summarize_grid_search_results

 import mlflow 

 # Loads dataset into pandas DataFrame
 dl = DataLoader()
 df = dl.load_peyton_manning()

 # specify dataset information
 metadata = MetadataParam(
     time_col="ts",  # name of the time column ("date" in example above)
     value_col="y",  # name of the value column ("sessions" in example above)
     freq="D"  # "H" for hourly, "D" for daily, "W" for weekly, etc.
               # Any format accepted by `pandas.date_range`
 )

 forecaster = Forecaster()  # Creates forecasts and stores the result
 result = forecaster.run_forecast_config(  # result is also stored as `forecaster.forecast_result`.
     df=df,
     config=ForecastConfig(
         model_template=ModelTemplateEnum.SILVERKITE.name,
         forecast_horizon=365,  # forecasts 365 steps ahead
         coverage=0.95,         # 95% prediction intervals
         metadata_param=metadata
     )
 )

 # This is the call that raises the NotImplementedError with the Silverkite template
 mlflow.sklearn.log_model(result.model, "model")

I'd like to add that if I change the model_template from ModelTemplateEnum.SILVERKITE.name to ModelTemplateEnum.PROPHET.name, the code works and I am able to log the model and read it back without problems.

Any advice on how to use MLflow with the Silverkite template?

ikegwukc changed the title from "MLFLOW support" to "MLFLOW support for the SilverKite Algorithm" on Jun 27, 2023

samuelefiorini commented

This is related to #73.

Long story short, the Silverkite template is tricky to serialize because it uses patsy internally, and patsy objects do not support pickling. Prophet models, on the other hand, are picklable.

To log model artifacts to MLflow with the Silverkite template, you can dump them to a local path (using forecaster.dump_forecast_result) and then log the whole path to MLflow via mlflow.log_artifact.

Beware that forecaster.dump_forecast_result, as far as I know, does not work on Windows.

More info on model storing and loading is available here.
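
For reference, here is a minimal sketch of the dump-then-log approach described above. It is not an official Greykite or MLflow recipe. The helper name log_forecast_result_to_mlflow, the artifact path "silverkite_forecast", and the log_artifacts parameter are all made up for illustration. The sketch assumes that forecaster.dump_forecast_result accepts a destination directory as its first argument, and it uses mlflow.log_artifacts (plural) because the dump is a directory rather than a single file.

```python
import os
import tempfile


def log_forecast_result_to_mlflow(forecaster, artifact_path="silverkite_forecast",
                                  log_artifacts=None):
    """Dump the forecaster's result to a temporary directory and log it to MLflow.

    `forecaster` is a greykite Forecaster on which run_forecast_config has
    already been called. `log_artifacts` is a hypothetical hook that defaults to
    mlflow.log_artifacts; it exists only so the helper can be tried without MLflow.
    """
    if log_artifacts is None:
        import mlflow  # imported lazily; only needed when no hook is supplied
        log_artifacts = mlflow.log_artifacts

    with tempfile.TemporaryDirectory() as tmp_dir:
        # Use a sub-directory that does not exist yet, so the dump never has to
        # overwrite an existing directory.
        dump_dir = os.path.join(tmp_dir, "forecast_result")
        # Greykite writes a directory of files here instead of one pickle,
        # which sidesteps pickling the patsy objects directly.
        forecaster.dump_forecast_result(dump_dir)
        # Upload the whole directory to the active MLflow run.
        log_artifacts(dump_dir, artifact_path=artifact_path)
    return artifact_path
```

With the forecaster from the original snippet, this would be called inside a run, e.g. `with mlflow.start_run(): log_forecast_result_to_mlflow(forecaster)`. To restore the model later, the logged directory would first need to be downloaded from MLflow and then passed to the loading counterpart, forecaster.load_forecast_result; I have not verified this round trip on Databricks.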
