Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CARBONDATA-4260] Support CarbonData on Trino #4198

Open
wants to merge 3 commits into
base: summer-2021
Choose a base branch
from

Conversation

czy006
Copy link
Contributor

@czy006 czy006 commented Aug 8, 2021

Why is this PR needed?
PrestoSQL has now changed its name to Trino. Because Facebook established the Presto Foundation at The Linux Foundation®,Led to prestosql Must be change the name. More information can see here

replace the old PrestoSQL implementation with a new Trino

CarbonData currently supports querying Hive/Spark data using PrestoSQL (316), but many new features and performance improvements are not available to users due to the new version of CarbonData

We will first provide a beta version of CarbonData on the summer-2021 branch. You are welcome to use the test, wait for the completion of the test and use cases, we will officially merge into the trunk branch when the functionality is stable

More information:https://issues.apache.org/jira/browse/CARBONDATA-4260

What changes were proposed in this PR?
Java 11 removed ConstructorAccessor class,use unsafe class for reflection
Rename the file Trino beginning with Presto
Unify file hump case style and Divide component package names
CarbonDataSplitManager add properties serialization.lib about StorageFormatSerDe
TrinoFilterUtil Logical Changes to determine Rang
Maven Pom Support Trino 358

Does this PR introduce any user interface change?
No
Is any new testcase added?
No

@czy006 czy006 changed the title Trino 358 alpha [CARBONDATA-4260] Support CarbonData on Trino Aug 8, 2021
@czy006
Copy link
Contributor Author

czy006 commented Aug 8, 2021

retest this please

@CarbonDataQA2
Copy link

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5830/

@CarbonDataQA2
Copy link

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4085/

@CarbonDataQA2
Copy link

Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/232/

@czy006 czy006 marked this pull request as ready for review August 10, 2021 11:11
@czy006
Copy link
Contributor Author

czy006 commented Aug 10, 2021

retest this please

@CarbonDataQA2
Copy link

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4094/

@CarbonDataQA2
Copy link

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5838/

@CarbonDataQA2
Copy link

Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/242/

@czy006
Copy link
Contributor Author

czy006 commented Aug 11, 2021

retest this please

@CarbonDataQA2
Copy link

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4098/

@CarbonDataQA2
Copy link

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5842/

@CarbonDataQA2
Copy link

Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/246/

@czy006
Copy link
Contributor Author

czy006 commented Aug 11, 2021

Build Failed Reason Report:Jenkins evn don't support jdk11 , just have jdk8. Maybe this reason to build failes.
image

@czy006
Copy link
Contributor Author

czy006 commented Aug 11, 2021

build mvn command : mvn clean install -pl :trino -am -Djacoco.skip=true -Dfindbugs.skip=true -DskipTests=true -Dspark.version=2.4.5 -Dhadoop.version=2.7.7 -Dhive.version=3.1.0 -Dscala.version=2.11.12
maven-version:3.8.1
jdk-version:openjdk-11.0.11
image
image

@brijoobopanna
Copy link
Contributor

@czy006 thanks for your contribution, it would be helpful if you can raise one discussion in community on your approach to integrate Trino

Ex: https://www.mail-archive.com/[email protected]/msg02219.html
https://www.mail-archive.com/[email protected]/msg01835.html
i have requested @ajantha-bhat to review your design approach and PR

@czy006
Copy link
Contributor Author

czy006 commented Aug 11, 2021

@brijoobopanna
Thanks for your reply and recognition 😁
This is first time to contributing code, and i meet some problems during development and contribution.
First, i want to know whether support directive builds when building PR, or JDK11.
In many cases, mismatch of JDK or Maven version make build fail.
I'll attached instructions and design documents later. I will add some instructions and design documents at a later date.
I will develop this feature until replace the current PrestoSQL. Looking forward your next reply.

This graph is my local basic run base test:

@ajantha-bhat
Copy link
Member

@czy006 : Thanks for your interest in this. I have not reviewed PR line by line, But I do have some suggestions

  1. I see that you are updating to trino 358, already 360 has released. Their release cycle is very fast. So, it will be hard to keep up.
    I want you to raise a discussion in dev and user mail list to see what version majority of people want to use it and do we need to support current prestosql 333 and prestodb also ?

  2. Also we can't keep support for older version for long time, code becomes hard to maintain. We already have prestodb , prestosql, now trino folder.

  3. Better to extract the common code (similar to we did for prestodb and prestosql) instead of duplicating all the classes.

  4. Also your integration is high level integration! we need to make changes for carbon to use the new features that comes along with newer version of trino, like dynamic filtering, Rubix cache integration etc.

  5. In the cluster you can test both insert flow as well as query flow to make sure functionalities are not broken, sometime unit testes are not enough (based on my experiences on presto integration)

  6. Regrading your suggestion for adding JDK11 in PR builder, currently it is not possible as other modules of carbondata are not JDK11compatible yet. we get many compilation error when we compile all modules of carbondata. Example, /repo/Carbon/format/target/gen-java/org/apache/carbondata/format/DataChunk.java:[32,24] package javax.annotation does not exist

presto 333 also needs JDK 11 for run time, so what I did is that only compile carbondata-presto jars in JDK11.
For that first compile whole project in JDK8, then export JAVA_HOME path to JDK11 and run the same maven command with -pl integration/presto (also as I mentioned previously in #4184, use spark-2.3 profile)

@czy006
Copy link
Contributor Author

czy006 commented Aug 13, 2021

@ajantha-bhat
Thank you for your opinion.Your suggestion is very helpful to me, a good community should be discussed together. So I created this discussion on the mailing list earlier and welcome us to discuss this feature here. Some thoughts on me will be added here later . Or you can create a discussion for us to join

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants