merge from apache master #7

Open · wants to merge 3,218 commits into base: master
Conversation

mayunSaicmotor (Owner)

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

  • Make sure the PR title is formatted like:
    [CARBONDATA-<Jira issue #>] Description of pull request

  • Make sure tests pass via mvn clean verify. (Even better, enable
    Travis-CI on your fork and ensure the whole test matrix passes).

  • Replace <Jira issue #> in the title with the actual Jira issue
    number, if there is one.

  • If this contribution is large, please file an Apache
    Individual Contributor License Agreement.

  • Testing done

     Please provide details on:
     - whether new unit test cases have been added, or why no new tests are required;
     - what manual testing you have done;
     - any additional information to help reviewers test this change.
    
  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


vikramahuja1001 and others added 16 commits December 21, 2020 16:32
…V flow

Why is this PR needed?
There are multiple issues with the Delete Segment API:

1. It does not use the latest LoadMetadataDetails while writing the table status file, and can therefore remove the table status entry of a concurrently loaded in-progress or successful segment.
2. The code reads the table status file twice.
3. Under concurrent queries, both queries access checkAndReloadSchema for MVs on all databases and try to create a file at the same location; HDFS grants the lock to one and fails the other, which fails that query.

What changes were proposed in this PR?
1. Read the table status file only once.
2. Use the latest table status to mark the segment as Marked for Delete, so no concurrency issues arise.
3. Make touchMDT and checkAndReloadSchema synchronized, so that only one caller can access them at a time (a minimal sketch follows).
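
A minimal sketch of the synchronization idea, assuming an illustrative MVSchemaManager object (not CarbonData's actual class); the point is that a synchronized block serializes concurrent callers so two queries never race to create the same schema file on HDFS:

```scala
object MVSchemaManager {
  // Lock object guarding all schema-file reads and rewrites.
  private val schemaLock = new Object

  def checkAndReloadSchema(): Unit = schemaLock.synchronized {
    // Read or recreate the MV schema file here. A second concurrent
    // query blocks until the first finishes instead of racing on the
    // same HDFS location and failing with a lock error.
  }
}
```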

Does this PR introduce any user interface change?
No

Is any new testcase added?
No

This closes #4059
…Sync during query

Why is this PR needed?
Added logs for MVs and a method to verify whether an MV is in sync during query.

What changes were proposed in this PR?
1. Move the MV enable check to the beginning, to avoid transforming the logical plan unnecessarily.
2. Add a log entry if an exception occurs while fetching the MV schema.
3. Check whether the MV is in sync before allowing query rewrite.
4. Reuse the already-read LoadMetadataDetails to get mergedLoadMapping.
5. Set no-dictionary schema types for the insert-partition flow (missed from [CARBONDATA-4077]).

Does this PR introduce any user interface change?
No

Is any new testcase added?
Yes

This closes #4060
…h index server

Why is this PR needed?
The asJava call used here converts to Java "in place": to save time and memory it
does not copy the data, but simply wraps the Scala collection in a class that
conforms to the Java interface, and the Java serializer is not able to serialize that wrapper.

What changes were proposed in this PR?
Convert it to a concrete Java list, so that the serializer receives a plain, serializable list (see the sketch below).
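
A minimal sketch of the difference, assuming the plain Java serializer is in play (variable names are illustrative):

```scala
import scala.collection.JavaConverters._

val scalaSegments: Seq[String] = Seq("0", "1", "2")

// asJava only wraps the Scala collection; the wrapper class is not
// guaranteed to be Java-serializable.
val wrapped: java.util.List[String] = scalaSegments.asJava

// Copying into a concrete java.util.ArrayList produces an object the
// Java serializer can always handle.
val copied: java.util.List[String] =
  new java.util.ArrayList[String](scalaSegments.asJava)
```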

Does this PR introduce any user interface change?
No

Is any new testcase added?
No

This closes #4061
… have scheme, the default will be local file system, which is not the file system defined by fs.defaultFS

Why is this PR needed?
When a table is created with a location and the location has no scheme, it defaults to the local file system, which is not the file system defined by fs.defaultFS.

What changes were proposed in this PR?
If the location has no scheme, prepend the fs.defaultFS scheme to the location (see the sketch below).
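
A minimal sketch of the qualification step using the standard Hadoop API (the helper name is illustrative, not CarbonData's actual code):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// If the user-supplied location has no scheme, resolve it against
// fs.defaultFS; otherwise leave it untouched.
def qualifyLocation(location: String, conf: Configuration): String = {
  val path = new Path(location)
  if (path.toUri.getScheme == null) {
    // e.g. "/warehouse/t1" becomes "hdfs://namenode:8020/warehouse/t1"
    FileSystem.get(conf).makeQualified(path).toString
  } else {
    location
  }
}
```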

Does this PR introduce any user interface change?
No

Is any new testcase added?
Yes

This closes #4065
…rift is Set

Why is this PR needed?
After converting the expression to an IN expression for a main table with SI, the
expression is not processed if column drift is enabled, and the query fails with an
NPE during resolveFilter. The exception stack is attached in the JIRA issue.

What changes were proposed in this PR?
Process the filter expression after adding the implicit expression.

Does this PR introduce any user interface change?
No

Is any new testcase added?
Yes

This closes #4063
…which leads to memory leak

Why is this PR needed?
When there are two Spark applications and one drops a table, some cached information about that table
stays in the other application and cannot be removed by any means, including the "Drop metacache" command.
This causes a memory leak; over time the leak accumulates and finally
leads to a driver OOM. The leak points are:
1) tableModifiedTimeStore in CarbonFileMetastore;
2) segmentLockMap in BlockletDataMapIndexStore;
3) absoluteTableIdentifierByteMap in SegmentPropertiesAndSchemaHolder;
4) tableInfoMap in CarbonMetadata.

What changes were proposed in this PR?
Use an expiring map to cache the table information in CarbonMetadata and the modified time in
CarbonFileMetaStore, so that stale information is cleared automatically after the expiration
time. Operations in BlockletDataMapIndexStore no longer need to be locked, so all logic
related to segmentLockMap is removed. (A sketch of the expiring-cache idea follows.)
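
A minimal sketch of the expiring-cache idea, using Guava's CacheBuilder as a stand-in (the PR's actual map type may differ; the 3600-second value is hypothetical and would come from the new carbon.metacache.expiration.seconds property):

```scala
import java.util.concurrent.TimeUnit
import com.google.common.cache.CacheBuilder

// Entries not accessed within the window are evicted automatically, so
// metadata of a table dropped by another application cannot pile up forever.
val tableInfoCache = CacheBuilder.newBuilder()
  .expireAfterAccess(3600L, TimeUnit.SECONDS) // carbon.metacache.expiration.seconds
  .build[String, String]()

tableInfoCache.put("db1_table1", "serialized TableInfo")
val cached = Option(tableInfoCache.getIfPresent("db1_table1")) // None after expiry
```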

Does this PR introduce any user interface change?
New configuration carbon.metacache.expiration.seconds is added.

Is any new testcase added?
No

This closes #4057
… case of concurrent load,

compact and clean files operation

Why is this PR needed?
There were two issues in the clean files post-event listener:

1. In concurrent cases, a wrong path was used while writing the entry back to the table status file,
so the table status file was not updated for the SI table.
2. While writing the LoadMetadataDetails to the table status file in concurrent scenarios,
only the unwanted segments were written instead of all the segments, which could leave stale
segments in the SI table.
Because of these two issues, when a select query was executed on the SI table, the table status could
contain an entry for a segment whose carbondata file had been deleted, throwing an IOException.
3. The segment ID is null when writing a Hive table.

What changes were proposed in this PR?
1 & 2. Use the correct table status path and send the correct LoadMetadataDetails to be updated in
the table status file. Now a select query fired on the SI table will not throw a
carbondata file not found exception.
3. Set the load model after the committer's setup job.

Does this PR introduce any user interface change?
No

Is any new testcase added?
No

This closes #4066
…table after concurrent Load & Compaction operation

Why is this PR needed?
When concurrent LOAD and COMPACTION are in progress on a main table having SI, the SILoadEventListenerForFailedSegments listener is called to repair failed SI segments, if any. It compares the SI and main table segment statuses; if there is a mismatch, it adds that specific load to failedLoads to be re-loaded.

During compaction, SI is updated first and then the main table. So in some cases the SI segment is in COMPACTED state while the main table is still in SUCCESS state (the compaction may still be in progress, or some operation may have failed). SI index repair adds those segments to failedLoads after checking whether the segment lock can be acquired. But if the main table compaction has already finished by the time the SI repair comparison runs, the repair can still acquire the segment lock and add those loads to failedLoads (even though the main table load is COMPACTED). After the concurrent operation finishes, some SI segments are left marked INSERT_IN_PROGRESS, leaving SI and main table segments in an inconsistent state.

What changes were proposed in this PR?
Acquire the compaction lock on the main table (to ensure compaction is not running), and only then compare the SI and main table load details to repair SI segments (see the sketch below).
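
A minimal sketch of the lock-then-compare idea using CarbonData's lock factory; the repair callback and helper name are illustrative, not the PR's exact code:

```scala
import org.apache.carbondata.core.locks.{CarbonLockFactory, LockUsage}
import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier

def repairUnderCompactionLock(identifier: AbsoluteTableIdentifier)
                             (repair: () => Unit): Unit = {
  val compactionLock =
    CarbonLockFactory.getCarbonLockObj(identifier, LockUsage.COMPACTION_LOCK)
  try {
    if (compactionLock.lockWithRetries()) {
      // Compaction cannot start while this lock is held, so the SI vs
      // main-table status comparison cannot race with a running compaction.
      repair()
    }
  } finally {
    compactionLock.unlock()
  }
}
```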

Does this PR introduce any user interface change?
No

Is any new testcase added?
No (concurrent scenario)

This closes #4067
…e in Presto integration

Why is this PR needed?
FTs for the following cases have been added. Here the store is created by Spark and read by Presto:

update without local-dict
delete operations on table
minor, major, custom compaction
add and delete segments
test update with inverted index
read with partition columns
Filter on partition columns
Bloom index
test range columns
read streaming data

Does this PR introduce any user interface change?
No

Is any new testcase added?
Yes

This closes #4031
…der in SDK

Why is this PR needed?
Currently, the SDK pagination reader does not support filter expressions, and it also returns wrong results after performing IUD operations through the SDK.

What changes were proposed in this PR?
If a filter is present or an update/delete operation was performed, get the total rows in the splits after building the carbon reader; otherwise, get the row count from the details info of each split.
Handled ArrayIndexOutOfBoundsException and return zero when rowCountInSplits.size() == 0 (see the sketch below).
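
A minimal sketch of the guard, assuming rowCountInSplits holds cumulative row counts per split (the names are illustrative, not the SDK's actual fields):

```scala
def totalRows(rowCountInSplits: java.util.List[java.lang.Long]): Long = {
  if (rowCountInSplits.isEmpty) {
    // No splits means no rows; returning 0 avoids the
    // ArrayIndexOutOfBoundsException on an empty list.
    0L
  } else {
    // With cumulative counts, the last element is the total.
    rowCountInSplits.get(rowCountInSplits.size() - 1)
  }
}
```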

Does this PR introduce any user interface change?
No

Is any new testcase added?
Yes

This closes #4068
Why is this PR needed?
1. Block SI creation on binary column.
2. Block alter table drop column directly on SI table.
3. Create table as like should not be allowed for SI tables.
4. Filter with like should not scan SI table.
5. Currently compaction is allowed on the SI table. Because of this, if only the SI table
is compacted, running a filter query on the main table causes more data to be
scanned from the SI table, which degrades performance.

What changes were proposed in this PR?
1. Blocked SI creation on binary columns.
2. Blocked alter table drop column directly on SI tables.
3. Handled Create table as like for SI tables.
4. Handled filter with like so that it does not scan the SI table.
5. Blocked direct compaction on SI tables and added FTs for SI compaction scenarios.
6. Added FTs for compression and range columns on SI tables.

Does this PR introduce any user interface change?
No

Is any new testcase added?
Yes

This closes #4037
Why is this PR needed?
To support the MERGE INTO SQL command in CarbonData.
The previous Scala parser had trouble parsing the complicated MERGE INTO SQL command.

What changes were proposed in this PR?
Add an ANTLR parser and support parsing the MERGE INTO SQL command into a DataSet command.

Does this PR introduce any user interface change?
Yes.
The PR introduces the MERGE INTO SQL Command.

Is any new testcase added?
Yes

This closes #4032

Co-authored-by: Zhangshunyu <[email protected]>
Why is this PR needed?
Since version 2.0, Carbon supports starting the Spark ThriftServer with CarbonExtensions.

What changes were proposed in this PR?
Add documentation for starting the Spark ThriftServer with CarbonExtensions.

Does this PR introduce any user interface change?
No

Is any new testcase added?
No

This closes #4077
entry when there is no update/insert data

Why is this PR needed?
1. After #3999, when an update happens on the table, a new segment
is created for the updated data. But when there is no data to update,
the segments are still created, and the table status has in-progress
entries for those empty segments. This leads to unnecessary segment
directories and an increase in table status entries.
2. After this, clean files does not clean these empty segments.
3. When the source table has no data, CTAS results in the same
problem.

What changes were proposed in this PR?
When no data is present during an update, mark the segment as Marked
for Delete so that clean files takes care of deleting it (see the sketch below);
CTAS was already handled. Added test cases.
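
A minimal sketch of the status flip, assuming CarbonData's SegmentStatus enum; the newLoadDetail/updatedRowCount variables are illustrative, not the PR's exact code:

```scala
import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatus}

val newLoadDetail = new LoadMetadataDetails()
val updatedRowCount = 0L // illustrative: the update touched no rows

// An empty update segment is marked for delete instead of being left
// as an in-progress entry, so clean files can remove it later.
if (updatedRowCount == 0) {
  newLoadDetail.setSegmentStatus(SegmentStatus.MARKED_FOR_DELETE)
}
```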

This closes #4018
…ry on

sort column giving wrong result with IndexServer

Why is this PR needed?
1. Creating a table and reading from SDK-written files fails in a cluster with
java.nio.file.NoSuchFileException: hdfs:/hacluster/user/hive/warehouse/carbon.store/default/sdk.
2. After fixing the above path issue, a filter query on a sort column gives
the wrong result with the IndexServer.

What changes were proposed in this PR?
1. In getAllDeleteDeltaFiles, used CarbonFiles.listFiles instead of Files.walk
to handle custom file systems such as HDFS (see the sketch below).
2. In PruneWithFilter, isResolvedOnSegment is used in the filterResolver step.
Set the table and expression on the executor side, so the IndexServer can use them
in the filterResolver step.
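
A minimal sketch of the listing fix: java.nio's Files.walk cannot resolve "hdfs:/..." paths, while the Hadoop FileSystem API dispatches on the path's scheme. The ".deletedelta" suffix check and the helper name are illustrative stand-ins for CarbonData's own CarbonFiles.listFiles:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import scala.collection.mutable.ArrayBuffer

def listDeleteDeltaFiles(segmentDir: String, conf: Configuration): Seq[Path] = {
  val path = new Path(segmentDir)
  val fs = path.getFileSystem(conf) // resolves hdfs://, s3a://, file://, ...
  val found = ArrayBuffer[Path]()
  val it = fs.listFiles(path, true) // recursive walk, like Files.walk
  while (it.hasNext) {
    val status = it.next()
    if (status.getPath.getName.endsWith(".deletedelta")) {
      found += status.getPath
    }
  }
  found
}
```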

This closes #4064
…hancement

Why is this PR needed?
Spatial index feature optimization of CarbonData

What changes were proposed in this PR?
1. Update the spatial index encoding algorithm, which reduces the properties required to create a geo table.
2. Enhance geo query UDFs: support querying a geo table with a polygon list, polyline list, or geoId range list, and add some geo transformation utility UDFs.
3. Data loading (both LOAD and INSERT INTO) allows the user to supply the spatial index column; it is still generated internally when the user does not provide it.

Does this PR introduce any user interface change?
No

Is any new testcase added?
Yes

This closes #4012
chenliang613 and others added 30 commits October 10, 2023 22:28
#117: The longitude has six decimal places and the latitude has five. Why is the length the same after conversion?
…a types (#4263)

Why is this PR needed?
CHAR and VARCHAR are no longer supported as String data types in Carbon. They should be removed from the documentation's description.

What changes were proposed in this PR?
CHAR and VARCHAR no longer appear as two String data types in the documentation.

Does this PR introduce any user interface change?
No

Is any new testcase added?
No

Co-authored-by: tangchuan <[email protected]>
Bumps [pyarrow](https://github.com/apache/arrow) from 0.11.1 to 14.0.1.
- [Commits](apache/arrow@apache-arrow-0.11.1...go/v14.0.1)

---
updated-dependencies:
- dependency-name: pyarrow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Minor refactor the build docs

* Fix review comments

* Update build/README.md
* upgrade thrift version

* change to use 0.20.0

---------

Co-authored-by: jacky <[email protected]>
Bumps org.apache.commons:commons-compress from 1.4.1 to 1.26.0.

---
updated-dependencies:
- dependency-name: org.apache.commons:commons-compress
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* add github action for building

* Revert "[WIP] Optimize geo module, the feature seems less be used (#4353)"

This reverts commit 29607c3.

* Revert "[WIP] Optimize geo module, the feature seems less be used"

This reverts commit 71abab0.

* cache thrift