Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Multiple Hub-Records in a Source Over the Same Row #236

Open
MartinHofpower opened this issue Jun 4, 2024 · 0 comments
Open
Assignees
Labels
feature This is is requesting a new feature

Comments

@MartinHofpower
Copy link

MartinHofpower commented Jun 4, 2024

Basic Requirement
I want to create a link that refers to the same hub twice. For the link itself this basically works well via different qualifiers for the Hub-Hashkey that is derived in the staging view. The problem is that the referenced Hashkeys need to be inserted in the Hub as well but as far as I see it this is not possible for different columns from one source at the same time.

Example
Let's assume we have one source src_customer_connection with the following columns:

  • ID
  • CustomerNumber_ParentRecord
  • CustomerNumber_ChildRecord
    (representing any relation between the two customers)

When we load a link CustomerToCustomer_L it is easy to derive hashkeys for both customers and get something like Customer_HK_ParentRecord and Customer_HK_ChildRecord in the link. But these hashkeys should also be present in the Hub Customer_H itself.
Let's say we declare a hub Customer_H with just CustomerNumber as natural business key. Then we can derive the Customer_H_HK either from CustomerNumber_ParentRecord or from CustomerNumber_ChildRecord in the staging model for src_customer_connection but not from both.
If additional data (from additional source files) for the parent and child customer records are not present at the moment the src_customer_connection is loaded we get referential problems since the Customer_HK_*** in the link might reference to non existing hub-records.

Describe the solution you'd like
The best solution would be to insert the necessary records from the link into the hub automatically in the background but I understand that this is not how automate_dv is designed.
What would fit into the framework from my point of view would be the possibility to somehow declare that one row of a source file could contain multiple hub entries. I think the precise metadata-definition is subject to discussion but this option would solve the problem implicitly.

Describe alternatives you've considered
A workaround that I see right now is the declaration of separate staging views for the same source. This gives the possibility to declare the exact names for the hub based on different source column names.

Additional context
This is a problem that came up in one of my projects and I didn't find it in any other issue. We are using automate_dv on sql-server.

AB#5439

@MartinHofpower MartinHofpower added the feature This is is requesting a new feature label Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature This is is requesting a new feature
Projects
None yet
Development

No branches or pull requests

2 participants