Feature: report missing "parent" in schema yaml #1839

Closed
boxydog opened this issue Sep 18, 2024 · 2 comments · Fixed by #1854

boxydog commented Sep 18, 2024

Feature description

When I run the attached code, I get an error:

Job for a_test__sub_record.1d2b10ead9.insert_values failed terminally in load 1726687518.354422 with message null value in column "_dlt_parent_id" of relation "a_test__sub_record" violates not-null constraint
DETAIL: Failing row contains (a, PztIkGSe+oI/8Q, null, null, 2KkBaSEHNC3a/A).
. The package is aborted and cannot be retried.

The reason is that the "parent:" field in the schema yaml is missing.

Part of dlt knows that "sub_record" is a nested table (because it generates a table name a_test__sub_record with two underscores), but another part (I think it's schema.utils.is_nested_table) doesn't know it's a nested table, because the "parent" clause is missing in the schema.
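The mismatch can be illustrated with a minimal sketch. It uses plain dicts rather than dlt's own types, and it is not dlt's actual implementation: the helper names `looks_nested_by_name` and `is_nested_by_hint`, the column `value`, and the parent table name `a_test` are all assumptions made for illustration.

```python
# Simplified stand-in for one table entry of an import schema (plain dicts,
# not dlt's own types). The column "value" is a made-up example column.
sub_record = {
    "name": "a_test__sub_record",
    # "parent": "a_test",  # assumed shape of the line that was not copied
    "columns": {
        "value": {"data_type": "text"},
        "_dlt_parent_id": {"data_type": "text", "nullable": False},
    },
}


def looks_nested_by_name(table: dict) -> bool:
    """Name-based view: a double underscore suggests a parent/child path."""
    return "__" in table["name"]


def is_nested_by_hint(table: dict) -> bool:
    """Hint-based view: only an explicit "parent" entry marks a nested table."""
    return bool(table.get("parent"))


print(looks_nested_by_name(sub_record))  # True
print(is_nested_by_hint(sub_record))     # False
```

The two views disagree for the same table, which is the situation described above: the table gets a nested-style name and a not-null `_dlt_parent_id` column, but nothing fills that column in.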

Unpack with "tar xvfz test_files.tgz": test_files.tgz

Are you a dlt user?

Yes, I'm already a dlt user.

Use case

My use case: I copy schema from the export schema to the import schema to control types, but I forgot to copy one line.

It took me a couple of hours to track this down, by stepping through the loading code in a debugger. It could be easier.

This is a "quality of life" issue. I've found dlt is working well, but if I do anything wrong, it can be hard to figure out what's happening.

Proposed solution

The behavior I'd like is: dlt stops earlier (say, during schema parsing, or extract, instead of load), and says "a_test__sub_record is a nested table, but missing a parent declaration in the schema".

Related issues

No response

@rudolfix
Collaborator

@boxydog In 1.0 you can dispatch data from nested structures to "root" tables in such a way that dlt will not add its standard linking. This is what happens above. We are still working on making this behavior easy to declare (#1713 and #1647).

What we could do: if we see a table with a "parent_key" column (_dlt_parent_id by default) but without a "parent" table hint, we fail the normalization (a good idea, actually).
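A rough sketch of such a check, written against plain dicts rather than dlt's schema classes. Everything named here is hypothetical (`MissingParentHintError`, `find_tables_missing_parent`, `validate_parent_hints`, and the example tables), and treating either a truthy `parent_key` column hint or the default column name as the trigger is an assumption, not how dlt necessarily represents it.

```python
from typing import Dict, List

# Default parent key column name mentioned in the comment above.
DEFAULT_PARENT_KEY = "_dlt_parent_id"


class MissingParentHintError(ValueError):
    """Hypothetical error raised when a parent key exists without a parent."""


def has_parent_key(table: dict, default_name: str = DEFAULT_PARENT_KEY) -> bool:
    """True if any column is flagged as parent key or uses the default name."""
    columns = table.get("columns", {})
    return any(
        name == default_name or bool(col.get("parent_key"))
        for name, col in columns.items()
    )


def find_tables_missing_parent(tables: Dict[str, dict]) -> List[str]:
    """Return names of tables that carry a parent key but no "parent" hint."""
    return [
        name
        for name, table in tables.items()
        if has_parent_key(table) and not table.get("parent")
    ]


def validate_parent_hints(tables: Dict[str, dict]) -> None:
    """Fail early instead of letting the load hit a not-null violation."""
    missing = find_tables_missing_parent(tables)
    if missing:
        raise MissingParentHintError(
            "tables have a parent key column but no 'parent' hint: "
            + ", ".join(missing)
        )


# Made-up example tables for illustration only.
example_tables = {
    "a_test": {"columns": {"id": {"data_type": "text"}}},
    "a_test__sub_record": {
        "columns": {"_dlt_parent_id": {"data_type": "text", "nullable": False}}
    },
}
```

Calling `validate_parent_hints(example_tables)` raises `MissingParentHintError` naming `a_test__sub_record`, which is the kind of early, specific failure requested in the issue.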

Did you manipulate the import schema yourself to get rid of parent? If not, we have a bug somewhere, and that is far more serious. Also, please note that without "parent", a_test__sub_record will be merged in a separate job, probably switching to append, since the table definition lacks a primary key.

rudolfix self-assigned this Sep 19, 2024
rudolfix added the question (Further information is requested) label Sep 19, 2024
Author

boxydog commented Sep 19, 2024

> What we could do: if we see a table with a "parent_key" column (_dlt_parent_id by default) but without a "parent" table hint, we fail the normalization (a good idea, actually).

Yes please.

> Did you manipulate the import schema yourself to get rid of parent?

Yes.

I copy schema from the export schema to the import schema to control types, but I forgot to copy one line.
