---
title: Basic models
output: html_notebook
---
_Copyright (c) Microsoft Corporation._<br/>
_Licensed under the MIT License._
```{r, echo=FALSE, results="hide", message=FALSE}
# tidyverts packages for time series modelling; urca supplies the
# unit root tests that ARIMA() uses when selecting the degree of differencing
library(tidyr)
library(dplyr)
library(tsibble)
library(feasts)
library(fable)
library(urca)
```
We fit some simple models to the orange juice data for illustrative purposes. Here, each model is actually a _group_ of models, one for each combination of store and brand. This is the standard approach taken in statistical forecasting, and is supported out-of-the-box by the tidyverts framework.
- `mean`: A simple mean model: all forecast values are set to the historical mean of the series.
- `naive`: A random walk model without any other components. This amounts to setting all forecast values to the last observed value.
- `drift`: This adjusts the `naive` model to incorporate a straight-line trend.
- `arima`: An ARIMA model, with the orders and parameter values estimated from the data.
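For reference, the first three methods have simple closed-form point forecasts. Given observations $y_1, \dots, y_T$, the $h$-step-ahead forecast $\hat{y}_{T+h|T}$ is:

$$
\begin{aligned}
\text{mean:}\quad \hat{y}_{T+h|T} &= \frac{1}{T}\sum_{t=1}^T y_t \\
\text{naive:}\quad \hat{y}_{T+h|T} &= y_T \\
\text{drift:}\quad \hat{y}_{T+h|T} &= y_T + h \cdot \frac{y_T - y_1}{T-1}
\end{aligned}
$$

The ARIMA forecasts depend on the estimated orders and coefficients, so there is no single closed form.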
Note that the model training process is embarrassingly parallel at three levels:
- We have multiple independent training datasets;
- For which we fit multiple independent models;
- Within which we have independent sub-models for each store and brand.
This lets us speed up the training significantly. While the `fable::model` function can fit multiple models in parallel, we will run it sequentially here and instead parallelise by dataset. This avoids contention for cores, and also results in the simplest code. As a guard against returning invalid results, we also specify the argument `.safely=FALSE`; this forces `model` to throw an error if a model algorithm fails.
```{r}
# source the utility scripts and load the saved datasets
srcdir <- here::here("R_utils")
for(src in dir(srcdir, full.names=TRUE)) source(src)

load_objects("grocery_sales", "data.Rdata")

cl <- make_cluster(libs=c("tidyr", "dplyr", "fable", "tsibble", "feasts"))

# fit all four models to each training dataset in parallel
oj_modelset_basic <- parallel::parLapply(cl, oj_train, function(df)
{
    model(df,
        mean=MEAN(logmove),
        naive=NAIVE(logmove),
        drift=RW(logmove ~ drift()),
        arima=ARIMA(logmove ~ pdq() + PDQ(0, 0, 0)),  # automatic non-seasonal orders, no seasonal terms
        .safely=FALSE
    )
})

# generate forecasts over each test period
oj_fcast_basic <- parallel::clusterMap(cl, get_forecasts, oj_modelset_basic, oj_test)

save_objects(oj_modelset_basic, oj_fcast_basic,
    example="grocery_sales", file="model_basic.Rdata")

# undo the log transformation before computing accuracy measures
do.call(rbind, oj_fcast_basic) %>%
    mutate_at(-(1:3), exp) %>%
    eval_forecasts()
```
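The functions `load_objects`, `save_objects`, `make_cluster`, `get_forecasts` and `eval_forecasts` are utilities sourced from the `R_utils` directory, not part of any package. As a rough illustration of the two cluster-related helpers, a minimal version might look like the sketch below. The function bodies, and the `store`/`brand`/`week` column names, are assumptions for exposition, not the repository's actual definitions.

```r
# hypothetical sketches of the R_utils helpers, for illustration only
make_cluster <- function(libs=character(0), ncores=parallel::detectCores(logical=FALSE))
{
    cl <- parallel::makeCluster(ncores)
    # load the required packages on every worker
    parallel::clusterCall(cl, function(libs)
    {
        for(lib in libs) library(lib, character.only=TRUE)
    }, libs)
    cl
}

get_forecasts <- function(mable, newdata)
{
    # forecast from every model in the mable over the test period,
    # then reshape to one column of point forecasts per model
    fabletools::forecast(mable, new_data=newdata) %>%
        tibble::as_tibble() %>%
        dplyr::select(store, brand, week, .model, .mean) %>%
        tidyr::pivot_wider(names_from=.model, values_from=.mean)
}
```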
Of these simple models, ARIMA does the best, although not appreciably better than the simple mean.
Having fit some basic models, we can also try an exponential smoothing model, fit using the `ETS` function. Unlike the others, `ETS` does not currently support time series with missing values, so we first impute them using one of the other models via the `interpolate` function.
```{r}
oj_modelset_ets <- parallel::clusterMap(cl, function(df, basicmod)
{
    # use the fitted ARIMA model (the only column left in the mable after
    # dropping the others) to interpolate missing values, then fit ETS
    df %>%
        interpolate(object=select(basicmod, -c(mean, naive, drift))) %>%
        model(
            ets=ETS(logmove ~ error("A") + trend("A") + season("N")),
            .safely=FALSE
        )
}, oj_train, oj_modelset_basic)

oj_fcast_ets <- parallel::clusterMap(cl, get_forecasts, oj_modelset_ets, oj_test)

# all modelling is done, so shut down the cluster
destroy_cluster(cl)

save_objects(oj_modelset_ets, oj_fcast_ets,
    example="grocery_sales", file="model_ets.Rdata")

do.call(rbind, oj_fcast_ets) %>%
    mutate_at(-(1:3), exp) %>%
    eval_forecasts()
```
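For reference, the specification `error("A") + trend("A") + season("N")` is the ETS(A,A,N) model, i.e. Holt's linear trend method with additive errors. In state-space form, with level $\ell_t$, slope $b_t$ and smoothing parameters $\alpha$ and $\beta$:

$$
\begin{aligned}
y_t &= \ell_{t-1} + b_{t-1} + \varepsilon_t \\
\ell_t &= \ell_{t-1} + b_{t-1} + \alpha\varepsilon_t \\
b_t &= b_{t-1} + \beta\varepsilon_t
\end{aligned}
$$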
The ETS model does _worse_ than the ARIMA model, which should not be surprising given the lack of strong trend and seasonality in this dataset. We conclude that any simple univariate approach is unlikely to do well here.