Operators in the card loader (and in general) can delete or modify metadata fields that Unitxt relies on #967

Open
OfirArviv opened this issue Jun 30, 2024 · 0 comments

@OfirArviv
Collaborator

For example, Unitxt relies on the following fields being part of the instance:
  • recipe_metadata
  • data_classification_policy
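A minimal sketch of what relying on these fields means in practice, assuming instances are plain Python dicts. The `check_metadata_fields` helper is hypothetical (not part of Unitxt's API); it fails loudly when either field is missing instead of letting the omission go unnoticed.

```python
from typing import Any, Dict

# Field names taken from the issue; everything else here is illustrative.
REQUIRED_METADATA_FIELDS = ("recipe_metadata", "data_classification_policy")


def check_metadata_fields(instance: Dict[str, Any]) -> None:
    """Raise KeyError if any required metadata field is missing.

    Hypothetical helper, not part of Unitxt's API.
    """
    missing = [name for name in REQUIRED_METADATA_FIELDS if name not in instance]
    if missing:
        raise KeyError(f"instance is missing required metadata fields: {missing}")


# A hypothetical instance; the metadata values are placeholders.
instance = {
    "question": "What is 2 + 2?",
    "answer": "4",
    "recipe_metadata": {"card": "cards.example"},
    "data_classification_policy": ["public"],
}
check_metadata_fields(instance)  # passes: both fields are present
```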

However, two operators delete them from the stream:

  • SelectFields(), which keeps only the given fields of each instance (typically used to keep just the relevant columns of datasets that have many).
  • JoinStreams(), which omits these fields when joining streams.
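The SelectFields() case can be reproduced with plain dictionaries. The `select_fields` function below is a hypothetical stand-in that only mimics the behaviour described above; it is not Unitxt's implementation.

```python
from typing import Any, Dict, Iterable


def select_fields(instance: Dict[str, Any], fields: Iterable[str]) -> Dict[str, Any]:
    """Keep only the requested fields (hypothetical stand-in for SelectFields())."""
    keep = set(fields)
    return {name: value for name, value in instance.items() if name in keep}


instance = {
    "question": "What is 2 + 2?",
    "answer": "4",
    "context": "arithmetic",
    "recipe_metadata": {"card": "cards.example"},
    "data_classification_policy": ["public"],
}

# The user asks only for the columns they care about...
selected = select_fields(instance, ["question", "answer"])
# ...and the metadata fields disappear along with "context", with no warning.
print(sorted(selected))  # ['answer', 'question']
```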

For now, we will add special handling for these fields in these operators. But the underlying problem is more general: users can delete fields we rely on without noticing.
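A sketch of the kind of per-operator special handling mentioned above, assuming plain-dict instances. Both helpers are hypothetical. In particular, merging the classification policies of both sides on a join is an assumption about reasonable semantics, not something this issue specifies.

```python
from typing import Any, Dict, Iterable

Instance = Dict[str, Any]
RESERVED_FIELDS = ("recipe_metadata", "data_classification_policy")


def select_fields_preserving(instance: Instance, fields: Iterable[str]) -> Instance:
    """Keep the requested fields plus the reserved metadata fields."""
    keep = set(fields) | set(RESERVED_FIELDS)
    return {name: value for name, value in instance.items() if name in keep}


def join_instances_preserving(left: Instance, right: Instance) -> Instance:
    """Merge two matched instances without losing the reserved fields."""
    joined = {**right, **left}  # on key collisions the left side wins (arbitrary choice)
    # Assumption: joined data must carry the classification policies of both inputs.
    policies = set(left.get("data_classification_policy", [])) | set(
        right.get("data_classification_policy", [])
    )
    joined["data_classification_policy"] = sorted(policies)
    return joined


left = {"id": 1, "question": "q", "recipe_metadata": {"card": "cards.example"},
        "data_classification_policy": ["public"]}
right = {"id": 1, "passage": "p", "data_classification_policy": ["proprietary"]}

print(sorted(select_fields_preserving(left, ["question"])))
# ['data_classification_policy', 'question', 'recipe_metadata']
print(join_instances_preserving(left, right)["data_classification_policy"])
# ['proprietary', 'public']
```

The drawback is that every operator that drops fields needs the same special case, so a newly written operator can still lose the fields silently.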

Possible solutions:

  1. The card recipe should run before any metadata fields are added, since this is the part of the code that edits the data the most.
  2. These fields should not be editable except through a dedicated function or a similar mechanism, and new instances should be required to include them.
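A sketch of the first proposed solution, assuming a card can be modelled as a list of plain-dict transformation steps. `run_card_then_add_metadata` and `keep_question_and_answer` are hypothetical names, not Unitxt's API. Card steps run on raw instances and the metadata fields are attached only afterwards, so a step that selects fields has nothing to lose.

```python
from typing import Any, Callable, Dict, Iterable, List

Instance = Dict[str, Any]
CardStep = Callable[[Instance], Instance]


def run_card_then_add_metadata(
    instances: Iterable[Instance],
    card_steps: List[CardStep],
    recipe_metadata: Dict[str, Any],
    data_classification_policy: List[str],
) -> List[Instance]:
    """Apply card steps to raw instances, then attach the metadata fields."""
    results = []
    for instance in instances:
        for step in card_steps:
            # No metadata exists yet, so card steps are free to drop or rename fields.
            instance = step(instance)
        results.append(
            {
                **instance,
                "recipe_metadata": recipe_metadata,
                "data_classification_policy": list(data_classification_policy),
            }
        )
    return results


def keep_question_and_answer(instance: Instance) -> Instance:
    """A card step that behaves like selecting only "question" and "answer"."""
    return {name: instance[name] for name in ("question", "answer")}


raw = [{"question": "What is 2 + 2?", "answer": "4", "context": "arithmetic"}]
processed = run_card_then_add_metadata(
    raw, [keep_question_and_answer], {"card": "cards.example"}, ["public"]
)
print(sorted(processed[0]))
# ['answer', 'data_classification_policy', 'question', 'recipe_metadata']
```

This only protects the card stage: any operator that runs after the metadata is attached can still drop it.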
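A sketch of the second proposed solution, assuming instances can be wrapped in a custom mapping. `ProtectedInstance`, `set_protected`, and `ProtectedFieldError` are hypothetical names chosen for illustration. Building on `collections.abc.MutableMapping` means inherited methods such as `pop`, `update`, and `clear` all go through the guarded `__setitem__` and `__delitem__`.

```python
from collections.abc import MutableMapping
from typing import Any, Dict, Iterator

PROTECTED_FIELDS = frozenset({"recipe_metadata", "data_classification_policy"})


class ProtectedFieldError(Exception):
    """Raised when code tries to drop or overwrite a protected metadata field."""


class ProtectedInstance(MutableMapping):
    """Dict-like instance whose protected fields change only via set_protected()."""

    def __init__(self, data: Dict[str, Any]) -> None:
        missing = PROTECTED_FIELDS - set(data)
        if missing:  # new instances are forced to carry the metadata fields
            raise ProtectedFieldError(f"new instances must define {sorted(missing)}")
        self._data = dict(data)

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        if key in PROTECTED_FIELDS:
            raise ProtectedFieldError(f"{key!r} is protected; use set_protected()")
        self._data[key] = value

    def __delitem__(self, key: str) -> None:
        if key in PROTECTED_FIELDS:
            raise ProtectedFieldError(f"{key!r} is protected and cannot be deleted")
        del self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def set_protected(self, key: str, value: Any) -> None:
        """The one sanctioned way to change a protected field."""
        if key not in PROTECTED_FIELDS:
            raise KeyError(f"{key!r} is not a protected field")
        self._data[key] = value


inst = ProtectedInstance({"question": "q", "recipe_metadata": {},
                          "data_classification_policy": ["public"]})
inst["question"] = "What is 2 + 2?"  # ordinary fields stay editable
try:
    del inst["recipe_metadata"]  # an operator dropping it by accident fails loudly
except ProtectedFieldError as err:
    print(err)
inst.set_protected("data_classification_policy", ["public", "proprietary"])
```

One limitation of this sketch: an operator that builds a fresh plain dict (as a SelectFields-style comprehension would) bypasses the guard entirely, so operators would also need to construct their outputs through `ProtectedInstance(...)`, whose constructor enforces the rule that new instances must include the fields.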