Added loaders parameter to run_algorithm() in run_algo.py #2199
Conversation
@calmitchell617 thanks for the PR! This looks great, especially for a first PR! I think there's a bug currently in the case where `loaders` isn't passed, and I have some ideas about how we might improve the ergonomics of the API.

It looks like the API proposed here is that the user would pass a dictionary mapping `BoundColumn` to `PipelineLoader`? That makes sense for the case where the user is supplying `DataFrameLoader`s (which can only handle one column at a time), but another common case is to have a loader for each of your datasets, which would be a bit awkward to specify in this API.
A user would have to do something like:
```python
def make_column_to_loader_map(my_datasets: List[Dataset],
                              my_loaders: List[Loader]):
    col_to_loader = {}
    for dataset, loader in zip(my_datasets, my_loaders):
        col_to_loader.update(dict.fromkeys(dataset.columns, loader))
    return col_to_loader
```
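To make the awkwardness concrete, here's a self-contained sketch of that helper in action. `StubColumn` and `StubDataSet` are hypothetical stand-ins for zipline's real types, used only to illustrate how every column of a dataset ends up mapped to that dataset's loader:

```python
# Hypothetical stubs standing in for zipline's BoundColumn / DataSet types.
class StubColumn:
    def __init__(self, name, dataset):
        self.name = name
        self.dataset = dataset

class StubDataSet:
    def __init__(self, column_names):
        self.columns = [StubColumn(name, self) for name in column_names]

def make_column_to_loader_map(my_datasets, my_loaders):
    # Same logic as the helper above: every column of a dataset
    # maps to that dataset's loader.
    col_to_loader = {}
    for dataset, loader in zip(my_datasets, my_loaders):
        col_to_loader.update(dict.fromkeys(dataset.columns, loader))
    return col_to_loader

pricing = StubDataSet(["open", "close"])
fundamentals = StubDataSet(["eps"])
mapping = make_column_to_loader_map(
    [pricing, fundamentals],
    ["pricing_loader", "fundamentals_loader"],
)
print(len(mapping))                      # 3
print(mapping[fundamentals.columns[0]])  # fundamentals_loader
```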
A few ideas on how we might improve this interface:
- We could allow passing a dictionary that either maps columns or datasets to loaders, in which case our internal function might turn into something like:
```python
us_price_loader = USEquityPricingLoader(
    bundle_data.equity_daily_bar_reader,
    bundle_data.adjustment_reader,
)

if user_supplied_loaders is None:
    user_supplied_loaders = {USEquityPricing: us_price_loader}
else:
    # Allow the user to provide their own USEquityPricing loader
    # if they pass one explicitly.
    user_supplied_loaders.setdefault(USEquityPricing, us_price_loader)

def choose_loader(column):
    if column in user_supplied_loaders:
        return user_supplied_loaders[column]
    elif column.dataset in user_supplied_loaders:
        return user_supplied_loaders[column.dataset]
    else:
        raise ValueError(...)
```
- We could keep this interface the same, but provide a helper function to convert a map from dataset -> loader into a map from column -> loader.
- We could keep the current interface that requires a function from column -> loader, but provide a helper class for implementing common patterns, e.g.:
```python
class PipelineDispatcher(object):
    """Helper class for building a dispatching function for a PipelineLoader.

    Parameters
    ----------
    column_loaders : dict[BoundColumn -> PipelineLoader]
        Map from columns to pipeline loader for those columns.
    dataset_loaders : dict[DataSet -> PipelineLoader]
        Map from datasets to pipeline loader for those datasets.
    """
    def __init__(self, column_loaders, dataset_loaders):
        self._column_loaders = column_loaders
        self._dataset_loaders = dataset_loaders

    def __call__(self, column):
        if column in self._column_loaders:
            return self._column_loaders[column]
        elif column.dataset in self._dataset_loaders:
            return self._dataset_loaders[column.dataset]
        else:
            raise LookupError("No pipeline loader registered for %s" % column)
```
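Here's a runnable sketch of how such a dispatcher would behave, with hypothetical stub objects standing in for real zipline columns and datasets (the loader "objects" are just strings for illustration):

```python
# Sketch of the dispatcher pattern above; StubDataSet/StubColumn are
# hypothetical stand-ins for zipline's real DataSet/BoundColumn types.
class PipelineDispatcher(object):
    def __init__(self, column_loaders, dataset_loaders):
        self._column_loaders = column_loaders
        self._dataset_loaders = dataset_loaders

    def __call__(self, column):
        # Column-level mappings take precedence over dataset-level ones.
        if column in self._column_loaders:
            return self._column_loaders[column]
        elif column.dataset in self._dataset_loaders:
            return self._dataset_loaders[column.dataset]
        else:
            raise LookupError("No pipeline loader registered for %s" % column)

class StubDataSet(object):
    pass

class StubColumn(object):
    def __init__(self, dataset):
        self.dataset = dataset

prices = StubDataSet()
close = StubColumn(prices)
sentiment = StubColumn(prices)

dispatch = PipelineDispatcher(
    column_loaders={sentiment: "dataframe_loader"},  # per-column override
    dataset_loaders={prices: "pricing_loader"},      # dataset-wide default
)
print(dispatch(sentiment))  # dataframe_loader
print(dispatch(close))      # pricing_loader
```

The per-column mapping winning over the per-dataset mapping is what lets a user override a single column with a `DataFrameLoader` while keeping a dataset-wide default for everything else.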
I think that last model would probably be the easiest one to extend to support registering datasets/loaders from a zipline extension (e.g., we could keep a global default instance of `PipelineDispatcher`) and provide hooks to register custom loaders/datasets, which internally would update that global default instance.
cc @richafrank @llllllllll for general API design thoughts here, and cc @jnazaren since this is relevant to your work on improving the extensibility of Zipline.
zipline/utils/run_algo.py (Outdated)

```python
raise ValueError(
    "No PipelineLoader registered for column %s." % column
)
elif column in loaders:
```
I think this will crash in the current implementation if the user doesn't supply any custom loaders (because `loaders` will be `None`, so `column in loaders` will barf).
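A minimal sketch of one way to avoid that crash: normalize a missing `loaders` argument to an empty mapping before any membership test. (`choose_loader` here is a hypothetical standalone version of the dispatch function discussed in this thread, not the PR's actual code.)

```python
# Hypothetical guard: treat loaders=None as an empty dict so that
# `column in loaders` is always safe.
def choose_loader(column, loaders=None):
    loaders = {} if loaders is None else loaders
    if column in loaders:
        return loaders[column]
    raise ValueError("No PipelineLoader registered for column %s." % column)

print(choose_loader("close", {"close": "dataframe_loader"}))  # dataframe_loader
```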
@ssanderson, I changed some language to show that this argument is used for loading dataframes, and changed the
I will try to implement your suggestions. Just playing devil's advocate, can you think of any cases where it would be better to have separate, optional arguments for different types of loaders, in the
```diff
@@ -71,7 +71,8 @@ def _run(handle_data,
          print_algo,
          metrics_set,
          local_namespace,
-         environ):
+         environ,
+         data_frame_loaders):
```
I don't think there is anything special about this being a dataframe loader, right?
```diff
@@ -344,6 +349,9 @@ def run_algorithm(start,
     environ : mapping[str -> str], optional
         The os environment to use. Many extensions use this to get parameters.
         This defaults to ``os.environ``.
+    loaders : iterable{PipelineLoader}, optional
```
The docstring and the parameter name don't match here.
I've created PR #2246 with one of the solutions suggested above.
Added an optional parameter to run_algorithm() that takes a dictionary of PipelineLoader objects. This allows you to load columns of data that aren't from USEquityLoader while running a backtest.

This is my first pull request on any meaningful project; constructive criticism is very welcome.