How to handle "non-feature" results from Extractors #386

rbroc · 2020-03-19T10:35:57Z

For extractors such as the Bert encoding extractors, we ran into the question of whether and how to return results that are not, strictly speaking, "features", but that may still be of interest to the user and cannot be retrieved from the Stim itself.

An example of this is Bert tokens. The ComplexTextStim fed into the Bert extractor is first tokenized into sub-word tokens, then encoded by the Bert model. Here, the high-dimensional encodings returned by the model are what one would properly consider "features". The tokens into which the stimulus is split are not, strictly speaking, features, but it would be useful to retain them in the result object or even the result data frame, as this would let the user keep track of which token each embedding encodes.
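To illustrate why the tokens cannot simply be recovered from the Stim, here is a toy sketch of greedy longest-match sub-word splitting in the WordPiece style. The vocabulary and the `wordpiece_tokenize` helper are made up for illustration; this is not the tokenizer the Bert extractor actually calls.

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first split of one word (WordPiece-style sketch)."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matched: the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary; a real Bert vocabulary has tens of thousands of entries.
TOY_VOCAB = {"the", "extract", "##or", "##s", "token", "##ize", "text"}

words = "the extractors tokenize text".split()
tokens = [t for w in words for t in wordpiece_tokenize(w, TOY_VOCAB)]

print(len(words), "words ->", len(tokens), "tokens")
print(tokens)
```

Since the model returns one encoding per sub-word token rather than per word, the rows of the result line up with `tokens`, not with `words`, and the split depends on the vocabulary rather than on the stimulus text alone.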

The ExtractorResult object does not currently handle this kind of information, which is neither a feature nor a stimulus attribute. One potential fix would be to add a field to the ExtractorResult object where this extra information can be stored, so that it is a) accessible to the user from the result object itself, and b) retrievable by extractor-specific to_df methods and added to the data frame.
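A minimal sketch of what such a field could look like, using a stand-in class rather than the real ExtractorResult. The names `SketchResult`, `extras`, `to_rows`, and `include_extras` are all hypothetical, and `to_rows` returns plain dicts where a real to_df would build a data frame.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchResult:
    """Hypothetical stand-in for a result object; not the actual pliers class."""
    features: List[str]        # column names, one per feature dimension
    data: List[List[float]]    # one row of feature values per token
    # Proposed extra field: named columns of non-feature information.
    extras: Dict[str, List[Any]] = field(default_factory=dict)

    def to_rows(self, include_extras: bool = True) -> List[Dict[str, Any]]:
        """Return one dict per row, optionally adding extras as columns."""
        rows = []
        for i, values in enumerate(self.data):
            row = dict(zip(self.features, values))
            if include_extras:
                for name, column in self.extras.items():
                    if len(column) != len(self.data):
                        raise ValueError(f"extra '{name}' needs one value per row")
                    row[name] = column[i]
            rows.append(row)
        return rows


result = SketchResult(
    features=["encoding_0", "encoding_1"],
    data=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    extras={"token": ["extract", "##or", "##s"]},
)

print(result.extras["token"])   # a) accessible from the result object itself
rows = result.to_rows()         # b) merged into the tabular output
print(rows[1])
```

Keeping the extras in a separate dict, instead of appending them to the feature list, would keep the distinction between features and non-feature information explicit, and the flag lets callers leave them out of the tabular output.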

adelavega added the question label Jul 16, 2020
