Add PreprocessingPipeline #3438

Draft
wants to merge 2 commits into main

Conversation

chrishalcrow
Collaborator

A proposal to add a PreprocessingPipeline class, which contains ordered preprocessing steps and their kwargs in a dictionary.

You can apply the class to a recording, or use the helper function create_preprocessed to make a preprocessed recording:

preprocessor_dict = {'bandpass_filter': {'freq_max': 3000}, 'common_reference': {}}

# apply using the pipeline class directly
from spikeinterface.preprocessing import PreprocessingPipeline
pipeline = PreprocessingPipeline(preprocessor_dict)
preprocessed_recording = pipeline.apply_to(recording)

# or use the helper function
from spikeinterface.preprocessing import create_preprocessed
preprocessed_recording = create_preprocessed(recording, preprocessor_dict)
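
For comparison, the dict above corresponds to chaining the existing preprocessing functions by hand (a minimal sketch using the current API):

# the same steps written out manually with the existing functions
from spikeinterface.preprocessing import bandpass_filter, common_reference
rec = bandpass_filter(recording, freq_max=3000)
preprocessed_recording = common_reference(rec)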

Also adds a function which takes a recording.json provenance file and makes a preprocessor_dict:

from spikeinterface.preprocessing import get_preprocessing_dict_from_json
my_dict = get_preprocessing_dict_from_json('/path/to/recording.json')
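
The returned dict can then be fed straight back into the pipeline, e.g. to reproduce the preprocessing on a different recording (a sketch using the classes proposed here; another_recording is a placeholder):

# reconstruct and re-apply the preprocessing described in the provenance file
from spikeinterface.preprocessing import PreprocessingPipeline
pipeline = PreprocessingPipeline(my_dict)
reproduced_recording = pipeline.apply_to(another_recording)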

This allows for some cool things:

  1. Users can pass a single dictionary to construct a preprocessed recording (as above). Hence it completes the “dictionary workflow”: you can already use dicts in sorting, in run_sorter_jobs, and in postprocessing with compute.
  2. Users can easily visualise their preprocessing pipeline using the repr, including an HTML repr in Jupyter notebooks (I made a hideous one, but we can aim for something like the sklearn pipeline repr; see https://scikit-learn.org/stable/auto_examples/miscellaneous/plot_pipeline_display.html). A rough sketch of what this could look like follows this list.
  3. Increases portability between labs, since you can reconstruct the preprocessing steps from the recording.json file without the original recording (and without worrying about paths).
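
To make point 2 concrete, here is a purely illustrative sketch of what a plain-text repr could print; this is not the implementation in this PR, just the flavour of output I have in mind:

# hypothetical sketch, names and layout illustrative only
class PreprocessingPipeline:
    def __init__(self, preprocessor_dict):
        self.preprocessor_dict = preprocessor_dict

    def __repr__(self):
        # e.g. "PreprocessingPipeline: bandpass_filter -> common_reference"
        steps = " -> ".join(self.preprocessor_dict)
        return f"PreprocessingPipeline: {steps}"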

Note that 3. only works for preprocessing steps that are in some sense “global”, i.e. that can be applied to any recording. This doesn’t hold for all preprocessing steps: e.g. interpolate_bad_channels needs bad_channel_ids, which are recording dependent. However, many of these functions could be modified to apply more globally: e.g. if bad_channel_ids is None, interpolate_bad_channels could detect the bad channels and then interpolate them. That would be apply-able to any recording, so it is “global”. A rough sketch of this idea follows.
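
As an illustration, such a “global” variant could look something like this (a hypothetical wrapper; detect_bad_channels and interpolate_bad_channels are the existing SpikeInterface functions):

# hypothetical sketch of a "global" version of interpolate_bad_channels
from spikeinterface.preprocessing import detect_bad_channels, interpolate_bad_channels

def interpolate_bad_channels_global(recording, bad_channel_ids=None, **kwargs):
    if bad_channel_ids is None:
        # detect the bad channels from the recording itself
        bad_channel_ids, channel_labels = detect_bad_channels(recording)
    return interpolate_bad_channels(recording, bad_channel_ids=bad_channel_ids, **kwargs)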

No rush on this and I’m not 100% set on it being implemented. It’s important to get the names right. I read this: https://melevir.medium.com/python-functions-naming-tips-376f12549f9. I think it’s important that create_preprocessed doesn’t sound in-place, after the number of problems with set_probe. Hence I’m against something like apply_preprocessing(recording), and would rather have make, create, construct, produce or similar in the function name. I also like the idea (from the article) that you don’t need to include e.g. recording in the name if recording is a required argument. Hence I prefer my_pipeline.apply_to(recording) over something like my_pipeline.apply_pipeline_to_recording(recording).

To do:

  • Tests
  • Add "allowed preprocessing steps" for get_preprocessing_dict_from_json

chrishalcrow added the enhancement (New feature or request) and preprocessing (Related to preprocessing module) labels on Sep 25, 2024
alejoe91 modified the milestone: 0.101.2 on Oct 1, 2024