HELP-598 [HUBMAP] Issue with current submission #9

Open
jeremywalter opened this issue Mar 23, 2023 · 3 comments

@jeremywalter
Contributor

From Ivan:

This issue is related to submission 0d9ebdfe-c920-11ed-a325-5376c1e42e63

In this submission we planned to map all of HuBMAP's public primary data. I just noticed that the submission is failing because of the complex relationships between our entities: several entities reference each other.

For example, multiple collections can reference the same sample, so the same sample ID appears more than once in the table. We would like to keep the original IDs in the submission, since they are generated as part of our ingestion process.

Please advise on how to proceed with this matter.

@jeremywalter
Contributor Author

From Karl: This was discussed in the Slack c2m2helpdesk channel, and the submitter was advised to remove duplicate subject records.
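For anyone hitting the same problem, the advice above can be sketched as follows. This is a minimal illustration, not the submitter's actual script: it assumes the duplicate rows are identified by an `id_namespace` and `local_id` pair, and the helper name `drop_duplicate_rows` and the sample rows are hypothetical. It keeps the first row seen for each key, so the original IDs are unchanged.

```python
import csv
import io


def drop_duplicate_rows(tsv_text, key_fields):
    """Return TSV text keeping only the first row for each key (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    seen = set()
    for row in reader:
        # The key columns are an assumption; adjust them to the table's real key.
        key = tuple(row[field] for field in key_fields)
        if key in seen:
            continue  # a later duplicate of a row already written
        seen.add(key)
        writer.writerow(row)
    return out.getvalue()


# Illustrative data only: sample S1 is listed twice.
sample = (
    "id_namespace\tlocal_id\tgranularity\n"
    "ns\tS1\tg\n"
    "ns\tS2\tg\n"
    "ns\tS1\tg\n"
)
deduped = drop_duplicate_rows(sample, ["id_namespace", "local_id"])
print(deduped)
```

Because only the header and the surviving rows are written back, no index column is added to the output.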

@jeremywalter
Contributor Author

jeremywalter commented Mar 23, 2023

From Ivan:

I removed the duplicates in the subject.tsv file. I get an error at submission but cannot interpret the error message. Could you share the full error?

  Found 246 errors in datapackage “C2M2_datapackage.json”. First error: There is an extra label “” in header at position “11”

Found the issue myself and fixed it.

When I saved subject.tsv, I kept the index in the TSV, which added an extra column.

@karlcz

karlcz commented Mar 23, 2023

This is a class of error handled by the frictionless validation. When I run it in the CLI, it does group the error under the specific subject.tsv file that was being checked. However, with the way we currently get a validation "report" from the frictionless API, we seem to lose this TSV file context. I've opened nih-cfde/cfde-deriva#396 as a potential enhancement for the submission pipeline, but I don't have a schedule for when this might be looked at.
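Until then, a submitter can run a rough local check that keeps the file name next to each header problem. The sketch below is not the frictionless validation and not part of the submission pipeline: it only compares each TSV header with the field names declared in the datapackage JSON. It assumes every resource has a single string `path` and an inline `schema` with `fields`; the function name `header_problems` and the `read_text` callback are hypothetical.

```python
import csv
import io


def header_problems(datapackage, read_text):
    """List (path, position, label, problem) for headers that differ from the schema.

    datapackage: dict parsed from the datapackage JSON.
    read_text: callable that takes a resource path and returns the file's text.
    """
    problems = []
    for resource in datapackage.get("resources", []):
        path = resource["path"]  # assumed to be one string, not a list of paths
        expected = [field["name"] for field in resource["schema"]["fields"]]
        reader = csv.reader(io.StringIO(read_text(path)), delimiter="\t")
        header = next(reader, [])  # first row, or empty if the file is empty
        for position, label in enumerate(header, start=1):
            if position > len(expected):
                problems.append((path, position, label, "extra label"))
            elif label != expected[position - 1]:
                problems.append((path, position, label, "unexpected label"))
        for position in range(len(header) + 1, len(expected) + 1):
            problems.append((path, position, expected[position - 1], "missing label"))
    return problems


# Illustrative package: the TSV header has one more (empty) label than the schema.
package = {
    "resources": [
        {
            "path": "subject.tsv",
            "schema": {"fields": [{"name": "id_namespace"}, {"name": "local_id"}]},
        }
    ]
}
files = {"subject.tsv": "id_namespace\tlocal_id\t\nns\tS1\t0\n"}
problems = header_problems(package, files.__getitem__)
for path, position, label, problem in problems:
    print(f'{path}: {problem} "{label}" at position {position}')
```

To run it against files on disk, pass a `read_text` that opens the path relative to the datapackage directory, for example `lambda p: open(p, encoding="utf-8").read()`.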
