At Nixtla we have implemented several libraries to deal with time series data. We often have to apply some transformation to every series, which can be time-consuming even for simple operations like scaling.
We've used numba to speed up our expensive computations; however, that comes with other issues, such as cold starts and extra dependencies (LLVM). That's why we developed this library, which implements several operators in C++ to transform time series data (or other kinds of data that can be thought of as independent groups), with the option to use multithreading for the best possible performance.
You probably won't need to use this library directly, but rather one of our higher-level libraries like mlforecast, which uses this library under the hood. If you're interested in using this library directly (it only depends on NumPy), you should continue reading.
pip install coreforecast
conda install -c conda-forge coreforecast
The base data structure is the "grouped array", which holds two 1D NumPy arrays:
- data: values of the series.
- indptr: series boundaries such that data[indptr[i] : indptr[i + 1]] returns the i-th series. For example, if you have two series of sizes 5 and 10, indptr would be [0, 5, 15].
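The indptr convention can be checked with plain NumPy, without the library: the boundaries are a running sum of the series sizes that starts at 0, and consecutive boundaries delimit each series. This is a minimal sketch of the two-series example above; the variable names are illustrative only.

```python
import numpy as np

# Two series of sizes 5 and 10, stored back to back in a single array
sizes = np.array([5, 10])
data = np.arange(sizes.sum(), dtype=np.float64)

# indptr starts at 0 and accumulates the sizes: [0, 5, 15]
indptr = np.append(0, np.cumsum(sizes))

# The i-th series is the slice between two consecutive boundaries
series = [data[indptr[i] : indptr[i + 1]] for i in range(len(indptr) - 1)]

# The sizes can be recovered from the boundaries
recovered_sizes = np.diff(indptr)
```

Since each series is a contiguous slice, looking one up is just two index reads, and the number of series is len(indptr) - 1.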
import numpy as np
from coreforecast.grouped_array import GroupedArray
data = np.arange(10)
indptr = np.array([0, 3, 10])
ga = GroupedArray(data, indptr)
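Series often arrive in long format, with one row per observation and a column of series ids. One way to obtain data and indptr from that layout with plain NumPy is sketched below. It assumes the rows of each series are already in time order, which is why it uses a stable sort; the input arrays are made-up illustrations. The resulting arrays could then be passed to GroupedArray(data, indptr) as above.

```python
import numpy as np

# Hypothetical long-format input: one row per observation, tagged with a series id
ids = np.array(["b", "a", "b", "a", "b", "a", "a"])
values = np.array([10.0, 1.0, 11.0, 2.0, 12.0, 3.0, 4.0])

# Stable sort by id so that each series becomes contiguous
# while its rows keep their original (time) order
order = np.argsort(ids, kind="stable")
sorted_ids = ids[order]
data = values[order]

# np.unique returns the ids in sorted order, matching the contiguous groups,
# and return_counts gives the size of each series
uniq, counts = np.unique(sorted_ids, return_counts=True)
indptr = np.append(0, np.cumsum(counts))
```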
Once you have this structure, you can run any of the provided transformations. For example:
from coreforecast.lag_transforms import ExpandingMean
from coreforecast.scalers import LocalStandardScaler
exp_mean = ExpandingMean(lag=1).transform(ga)
scaler = LocalStandardScaler().fit(ga)
standardized = scaler.transform(ga)
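To make the semantics of these two transformations concrete, here is a naive per-series reference in plain NumPy of what they are assumed to compute. It is a sketch, not coreforecast's implementation: the names expanding_mean and NaiveLocalStandardScaler are hypothetical, and details such as NaN handling, minimum sample counts and the degrees of freedom of the standard deviation (ddof=0 here) are assumptions that may differ from the library.

```python
import numpy as np


def expanding_mean(data, indptr, lag):
    """Per-series expanding mean of the values lagged by `lag` steps (NaN-padded)."""
    out = np.full(data.shape, np.nan, dtype=np.float64)
    for start, end in zip(indptr[:-1], indptr[1:]):
        x = data[start:end].astype(np.float64)
        # Running mean of the series up to and including each position
        means = np.cumsum(x) / np.arange(1, x.size + 1)
        # Shift within the series so position t only sees values up to t - lag;
        # the first `lag` positions of each series stay NaN
        if x.size > lag:
            out[start + lag : end] = means[: x.size - lag]
    return out


class NaiveLocalStandardScaler:
    """Per-series (x - mean) / std, with the statistics learned in fit (assumed ddof=0)."""

    def fit(self, data, indptr):
        self.indptr = indptr
        # One (mean, std) pair per series; a constant series gives std == 0,
        # which this sketch does not guard against
        self.stats = np.array(
            [(data[s:e].mean(), data[s:e].std()) for s, e in zip(indptr[:-1], indptr[1:])]
        )
        return self

    def transform(self, data):
        out = np.empty(data.shape, dtype=np.float64)
        bounds = zip(self.indptr[:-1], self.indptr[1:])
        for (s, e), (mean, std) in zip(bounds, self.stats):
            out[s:e] = (data[s:e] - mean) / std
        return out


# Same inputs as the GroupedArray example above
data = np.arange(10, dtype=np.float64)
indptr = np.array([0, 3, 10])
exp_mean = expanding_mean(data, indptr, lag=1)
scaler = NaiveLocalStandardScaler().fit(data, indptr)
standardized = scaler.transform(data)
```

Splitting the scaler into fit and transform mirrors the usage above: the per-series statistics are computed once and stored, so the same scaling can be applied to the data later.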
We've also implemented some functions that work on single arrays; you can refer to the following pages: