For extractors such as the Bert encoding extractors, we ran into the question of whether and how to return results that are not, strictly speaking, "features", but that may still be of interest to the user and cannot be retrieved from the `Stim` itself.
An example of this is Bert tokens. The `ComplexTextStim` fed into the Bert extractor is first tokenized into sub-word tokens, then encoded by the Bert model. Here, the high-dimensional encodings returned by the model are what one would properly consider "features". The tokens into which the stimulus is split are not, strictly speaking, features, but it would be useful to retain them in the result object or even the result data frame, as this would let the user keep track of which token each embedding encodes.
The `ExtractorResult` object does not currently handle this kind of non-feature, non-stimulus-attribute information. One potential fix would be to add a field to the `ExtractorResult` object where this kind of extra information can be stored, so that it is a) accessible to the user from the `Result` object itself; and b) retrieved by extractor-specific `to_df` methods and added to the dataframe.
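To make the proposal concrete, here is a minimal sketch of what such a field could look like. It is not the actual `ExtractorResult` implementation: `SketchResult`, the `extras` field, `to_rows`, and the `include_extras` flag are all hypothetical names chosen for illustration, and the tokens and values are made up. The sketch returns plain row dictionaries; a real `to_df` would presumably build a dataframe from the same rows.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence


@dataclass
class SketchResult:
    """Hypothetical stand-in for ExtractorResult (not the real class)."""

    data: List[Sequence[float]]  # one row of feature values per token
    features: List[str]          # feature (column) names
    # Proposed addition: per-row information that is neither a feature
    # nor a Stim attribute, e.g. the sub-word token behind each row.
    extras: Dict[str, List[Any]] = field(default_factory=dict)

    def to_rows(self, include_extras: bool = True) -> List[Dict[str, Any]]:
        """Return one dict per row, optionally merging in the extras."""
        rows = []
        for i, values in enumerate(self.data):
            row = dict(zip(self.features, values))
            if include_extras:
                for name, column in self.extras.items():
                    row[name] = column[i]
            rows.append(row)
        return rows


# Made-up example: two sub-word tokens, each with a 2-dimensional encoding.
result = SketchResult(
    data=[[0.1, 0.2], [0.3, 0.4]],
    features=["dim_0", "dim_1"],
    extras={"token": ["foo", "##bar"]},
)
rows = result.to_rows()
```

With this layout the tokens remain accessible directly as `result.extras["token"]`, and an extractor-specific conversion method can decide whether to merge them into the tabular output.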