How to handle "non-feature" results from Extractors #386

Open
rbroc opened this issue Mar 19, 2020 · 0 comments
rbroc commented Mar 19, 2020

For extractors such as the Bert encoding extractors, we ran into the question of whether and how to return results that are not strictly speaking "features" but that may still be of interest to the user and cannot be retrieved from the Stim itself.

An example of this is Bert tokens. The ComplexTextStim fed into the Bert extractor is first tokenized into sub-word tokens, which are then encoded by the Bert model. Here, the high-dimensional encodings returned by the model are what one would properly consider "features". The tokens into which the stimulus is split are not strictly speaking features, but it would be useful to retain them in the result object or even in the result data frame, as this would let the user keep track of which token each embedding encodes.
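To make the alignment problem concrete, here is a minimal sketch. It does not use the real Bert tokenizer or model: `TOY_VOCAB`, `toy_wordpiece` and the two-dimensional "embeddings" are all made up for illustration. The only point is that sub-word splitting yields more rows than there are words in the stimulus, so the rows cannot be mapped back to the text without the tokens.

```python
# Toy illustration only: a tiny hand-written vocabulary and a greedy
# longest-match sub-word splitter in the style of WordPiece. This is NOT
# the real Bert tokenizer; all names here are hypothetical.
TOY_VOCAB = {"the", "extract", "##or", "##s", "return", "##ed", "token"}


def toy_wordpiece(word, vocab=TOY_VOCAB, unk="[UNK]"):
    """Split one lowercase word into sub-word pieces, longest match first."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(text):
    """Whitespace-split the text, then split each word into sub-word pieces."""
    return [p for word in text.lower().split() for p in toy_wordpiece(word)]


text = "The extractors returned tokens"
tokens = tokenize(text)

# Placeholder "embeddings": one made-up 2-d vector per token.
embeddings = [[float(i), float(len(tok))] for i, tok in enumerate(tokens)]

# Pairing each vector with its token is what makes the rows interpretable.
rows = [{"token": tok, "dim_0": vec[0], "dim_1": vec[1]}
        for tok, vec in zip(tokens, embeddings)]

print(len(text.split()), "words ->", len(tokens), "tokens")
for row in rows:
    print(row)
```

With the embeddings alone, there would be no way to tell that the fourth row belongs to the `##s` piece of "extractors" rather than to a word of its own.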

The ExtractorResult object does not currently handle this kind of information, which is neither a feature nor a stimulus attribute. One potential fix would be to add a field to the ExtractorResult object where this kind of extra information can be stored, so that it is a) accessible to the user from the Result object itself and b) retrievable by extractor-specific to_df methods and added to the dataframe.
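A rough sketch of what such a field could look like is below. `SketchResult`, its `extras` attribute and `to_rows` are hypothetical stand-ins written for this issue, not the actual ExtractorResult class or its to_df signature; the sketch only assumes a result that holds one row of feature values per token.

```python
from dataclasses import dataclass, field


@dataclass
class SketchResult:
    """Hypothetical stand-in for a result object (not pliers' ExtractorResult)."""
    data: list       # one row of feature values per token
    features: list   # feature (column) names
    # Hypothetical field: non-feature, non-stimulus information,
    # stored as name -> list with one entry per row of `data`.
    extras: dict = field(default_factory=dict)

    def __post_init__(self):
        # Every extra column must line up with the feature rows.
        for name, column in self.extras.items():
            if len(column) != len(self.data):
                raise ValueError(f"extra '{name}' does not match number of rows")

    def to_rows(self, include_extras=True):
        """Return one dict per row; optionally merge in the extra columns."""
        rows = []
        for i, values in enumerate(self.data):
            row = dict(zip(self.features, values))
            if include_extras:
                for name, column in self.extras.items():
                    row[name] = column[i]
            rows.append(row)
        return rows


result = SketchResult(
    data=[[0.1, 0.2], [0.3, 0.4]],
    features=["dim_0", "dim_1"],
    extras={"token": ["extract", "##or"]},
)

print(result.extras["token"])  # a) accessible from the result object itself
print(result.to_rows())        # b) merged into the tabular output
```

Keeping the extras in a separate container, rather than appending them to the feature list, means that code which treats every column as a numeric feature is unaffected unless it opts in.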
