Hi,
I am implementing a CDC pipeline from Oracle in which some tables have no explicit primary keys. We specify the id columns in the sink connector based on data awareness (there is no constraint on the source), and the sink connector works fine.
However, my concern is that the lack of a primary key on the source means null keys in Kafka, so multiple updates to the same source record are not guaranteed any ordering (Kafka producer behaviour: null-keyed records are spread across partitions).
If we then set tasks.max > 1 in the sink connector properties, updates to the same record may be processed by different tasks (workers), and in a different order.
Could this result in inconsistent behaviour during commit, e.g. the update ordering changing because the coordinator commits the second update in the first batch and the first update in a subsequent commit?
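For context, one way we considered mitigating the null-key problem is keying the records at the source connector with Kafka Connect's built-in `ValueToKey` SMT, so all updates to the same row hash to the same partition and retain per-partition ordering. A minimal sketch (the transform alias and the field names `ID_COL1,ID_COL2` are placeholders for our id columns):

```
"transforms": "createKey",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields": "ID_COL1,ID_COL2"
```

This only helps if the chosen columns are effectively immutable and unique per row, which is exactly the guarantee a primary key would normally give us.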
cc @bryanck
Thanks in Advance.