[CARBONDATA-4346] Remove list files while query and invalid cache #4287
base: master
Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/772/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6386/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4641/
0415996 to c63d26c
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6387/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4642/
Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/773/
...in/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
core/src/main/java/org/apache/carbondata/core/util/BlockletIndexUtil.java
core/src/main/java/org/apache/carbondata/core/util/BlockletIndexUtil.java
c63d26c to 94c40e9
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6390/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4645/
Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/776/
LGTM
core/src/main/java/org/apache/carbondata/core/util/BlockletIndexUtil.java
94c40e9 to 6e26954
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6393/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4650/
Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/779/
retest this please
6e26954 to b6175a7
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6395/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4652/
Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/781/
retest this please
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4653/
Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/782/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6396/
LGTM
retest this please
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6397/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4654/
Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/783/
Build Failed with Spark 2.4.5, Please check CI http://159.138.8.58:12602/job/ApacheCarbon_PR_Builder_2.4.5/4659/
Why is this PR needed?
During a query, files are listed and added to the fileNameToMetaInfoMapping map. On incremental updates to a partition table, the number of invalid files keeps increasing with each update, which degrades performance in the createCarbonDataFileBlockMetaInfoMapping method. For example:
- Perform 1st update: adds 900 new carbondata files.
- Perform 2nd update (same update query): adds another 900 carbondata files. The files added by the 1st update are now invalid.
- Perform query: it lists files, including the invalid ones, and adds them to the fileNameToMetaInfoMapping map.
Because the number of invalid files grows with each update, creating the fileNameToMetaInfoMapping map gets slower each time.
What changes were proposed in this PR?
Instead of listing files, get the carbon file from the file name and create BlockMetaInfo directly in createBlockMetaInfo.
Impact when tested on a single partition with 100 segments:
- Significant improvement observed in the incremental update operation.
- 95% improvement seen in the first select count(*) operation, because the select count(*) flow was listing files for each segment and the map was not reused.
Impact when tested on a non-partition table with 100 segments:
- Almost the same, or no improvement, for the non-partition table.
Also clears invalid/deleted segments from the cache after delete and update.
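The difference between listing files and resolving them by name can be sketched roughly as below. This is a minimal illustration using only java.nio.file, not the CarbonData code: the class BlockMetaLookupSketch, the methods listAndMap and lookupDirect, and the use of file size as the only "meta info" are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Illustrative only: these names are hypothetical and are not CarbonData APIs.
public class BlockMetaLookupSketch {

  // Listing approach: every file in the directory is visited, so stale files
  // left behind by earlier updates are mapped too and the cost keeps growing.
  static Map<String, Long> listAndMap(Path segmentDir) throws IOException {
    Map<String, Long> sizes = new HashMap<>();
    try (Stream<Path> files = Files.list(segmentDir)) {
      for (Path p : (Iterable<Path>) files::iterator) {
        sizes.put(p.getFileName().toString(), Files.size(p));
      }
    }
    return sizes;
  }

  // Direct approach: resolve only the file names that are already known to be
  // valid, so the cost depends on the number of valid files.
  static Map<String, Long> lookupDirect(Path segmentDir, List<String> validNames)
      throws IOException {
    Map<String, Long> sizes = new HashMap<>();
    for (String name : validNames) {
      sizes.put(name, Files.size(segmentDir.resolve(name)));
    }
    return sizes;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("segment");
    Files.write(dir.resolve("stale.carbondata"), new byte[10]);
    Files.write(dir.resolve("valid.carbondata"), new byte[20]);

    // The listing picks up both files; the direct lookup touches only one.
    System.out.println(listAndMap(dir).size());
    System.out.println(lookupDirect(dir, Arrays.asList("valid.carbondata")).size());
  }
}
```

In this sketch the listing cost scales with everything in the directory, while the direct lookup scales only with the names passed in, which is the effect the description above attributes to the change.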
Does this PR introduce any user interface change?
Is any new testcase added?