[SUPPORT] Problem when inserting data into a non-partitioned table created by Flink SQL via the Spark SQL CLI #12013
BTW, I tried a bit more: when I create the Hudi table as a partitioned table, I am able to insert from the Spark SQL CLI.
The non-partitioned key generator is right. Did you specify the key generator on the Spark side?
@danny0405 I don't think the non-partitioned key generator is right here. When creating the Hudi table using Flink SQL, I explicitly specified the key generator. From the Spark side, I did nothing but insert one record. Basically, what I am doing is creating a Hudi table from Flink SQL and then using it from Spark SQL.
Yeah, we do have some setup logic in …
Hi @bithw1, I am able to replicate this issue from the Spark side as well:

```sql
CREATE DATABASE IF NOT EXISTS spark_catalog.`default`;
USE spark_catalog.`default`;

DROP TABLE IF EXISTS hudi_table;
CREATE TABLE hudi_table (a INT, b INT, c INT)
USING HUDI
TBLPROPERTIES (
  type = 'mor',
  primaryKey = 'a',
  preCombineField = 'b',
  'hoodie.datasource.write.keygenerator.class' = 'org.apache.hudi.keygen.ComplexAvroKeyGenerator'
)
PARTITIONED BY (c);
```

Exception:

After checking the `.hoodie/hoodie.properties` file, we found that the `hoodie.table.keygenerator.class` property is set to `org.apache.hudi.keygen.SimpleKeyGenerator` by default, and it cannot be changed at runtime. To resolve this issue, we need to set the key generator class to `org.apache.hudi.keygen.ComplexAvroKeyGenerator`:

```properties
hoodie.table.keygenerator.class=org.apache.hudi.keygen.ComplexAvroKeyGenerator
```

```sql
DROP TABLE IF EXISTS hudi_table;
CREATE TABLE hudi_table (a INT, b INT, c INT)
USING HUDI
TBLPROPERTIES (
  type = 'mor',
  primaryKey = 'a',
  preCombineField = 'b',
  'hoodie.datasource.write.keygenerator.class' = 'org.apache.hudi.keygen.ComplexAvroKeyGenerator',
  'hoodie.table.keygenerator.class' = 'org.apache.hudi.keygen.ComplexAvroKeyGenerator'
)
PARTITIONED BY (c);

INSERT INTO hudi_table VALUES (1, 1, 1), (2, 2, 2);
SELECT * FROM hudi_table;
```

```
20241008063231261  20241008063231261_1_0  2  c=2  d685c02a-c97f-4147-a5b7-acb476e3e6c6-0_1-64-77_20241008063231261.parquet  2  2  2
20241008063231261  20241008063231261_0_0  1  c=1  85b3ae97-01a4-4b6c-840a-00a5f783f410-0_0-64-76_20241008063231261.parquet  1  1  1
```
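As a side note, one way to spot this kind of mismatch early is to inspect the table properties from the Spark SQL CLI. The statements below are a sketch, reusing the `hudi_table` name from the example above; they are not part of the original report:

```sql
-- What the Spark catalog believes about the table:
SHOW TBLPROPERTIES hudi_table;
-- Narrow the output to the write-side key generator setting:
SHOW TBLPROPERTIES hudi_table ('hoodie.datasource.write.keygenerator.class');
```

The value Hudi actually enforces, however, is `hoodie.table.keygenerator.class` in `.hoodie/hoodie.properties`, so both places are worth checking when the two engines appear to disagree.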
Thanks @rangareddy for the clarification and clue. Per your guidance, I modified my test case by adding … This problem is one piece of evidence that Hudi configurations are confusing and redundant. Also, it looks to me like Spark and Flink use totally different configurations for almost the same thing; don't they belong to the same Hudi project?
I think there is still a bug here. I am not sure whether …
@jonvex Any insights on this?
I am using Hudi 0.15.0 and Flink 1.17.1. The following are the steps to reproduce the problem.

From the Flink SQL CLI, run the following SQL statements:
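The original Flink DDL statements were not captured above. As a rough sketch only (the schema, column names, and exact option list here are my assumptions, not the original statements; the path matches the one mentioned later in the issue), a Flink SQL DDL that pins the key generator explicitly could look like:

```sql
-- Hypothetical Flink SQL DDL: schema and options are assumptions,
-- not the reporter's original statements.
CREATE TABLE test_hudi_flink_mor_2 (
  a INT,
  b INT,
  c INT,
  PRIMARY KEY (a) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = '/tmp/test_hudi_flink_mor_2',
  'table.type' = 'MERGE_ON_READ',
  -- pin the key generator so Spark and Flink agree on it:
  'hoodie.datasource.write.keygenerator.class' = 'org.apache.hudi.keygen.ComplexAvroKeyGenerator'
);

INSERT INTO test_hudi_flink_mor_2 VALUES (1, 1, 1), (2, 2, 2);
```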
So far so good. Then I run

```sql
INSERT INTO test_hudi_flink_mor_2 SELECT 3, 3, 3;
```

from the spark-sql CLI, and an error occurs; the key part of the exception message is as follows:

When I run

```sql
SHOW CREATE TABLE test_hudi_flink_mor_2;
```

from the spark-sql CLI, it shows that the key generator in use is `ComplexAvroKeyGenerator`, but when I look at `/tmp/test_hudi_flink_mor_2/.hoodie/hoodie.properties`, it contains

```properties
hoodie.table.keygenerator.class=org.apache.hudi.keygen.NonpartitionedAvroKeyGenerator
```

Per the exception, it looks to me that there is a bug here.