forked from apache/carbondata
merge from apache master #7
Open
mayunSaicmotor wants to merge 3,218 commits into mayunSaicmotor:master from apache:master
Conversation
asfgit force-pushed the master branch 2 times, most recently from 23a9e7c to e07df44 on September 26, 2018 07:48
…V flow Why is this PR needed? There are multiple issues with the Delete segment API: it does not use the latest loadmetadatadetails while writing to the table status file, so it can remove the table status entry of any concurrently loaded Insert In Progress/Success segment. The code reads the table status file 2 times. In concurrent queries, both queries access checkAndReloadSchema for MV on all databases and try to create a file at the same location; HDFS grants the lock to one and fails the other, thus failing the query. What changes were proposed in this PR? Only reading the table status file once. Using the latest tablestatus to mark the segment Marked for Delete, so no concurrency issues arise. Made the touchMDT and checkAndReloadSchema methods synchronized, so that only one caller can access them at a time. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4059
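The synchronization part of that fix can be pictured with a minimal Scala sketch; the class and method bodies below are hypothetical placeholders, not the actual CarbonData code:

```scala
// Hypothetical sketch: serializing access to the schema files so two
// concurrent queries cannot both try to create the same file on HDFS.
class SchemaStorageProvider {

  // Only one caller at a time may check and reload the MV schema.
  def checkAndReloadSchema(): Unit = this.synchronized {
    // ... read the schema file, creating it if it does not exist ...
  }

  // Only one caller at a time may touch the modified-time (MDT) file.
  def touchMDT(): Unit = this.synchronized {
    // ... update the last-modified marker ...
  }
}
```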
…Sync during query Why is this PR needed? Added logs for MV and a method to verify whether the MV is in sync during query. What changes were proposed in this PR? 1. Move the MV enable check to the beginning to avoid transforming the logical plan 2. Add a logger if an exception occurs while fetching the MV schema 3. Check if the MV is in sync and allow query rewrite 4. Reuse the read LoadMetadataDetails to get mergedLoadMapping 5. Set no-dictionary schema types for the insert-partition flow - missed from [CARBONDATA-4077] Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4060
…h index server Why is this PR needed? The asJava call used here converts the Scala collection to Java "in place": to save time and memory it does not copy the data, but simply wraps the Scala collection in a class that conforms to the Java interface, and the Java serializer is therefore not able to serialize it. What changes were proposed in this PR? Converting it to a list, so that a plain list is serialized instead. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4061
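A small Scala sketch of the distinction the commit describes; the sample values are made up, only the wrap-versus-copy difference is the point:

```scala
import scala.collection.JavaConverters._

val segments: Seq[String] = Seq("0", "1", "2")   // illustrative data

// asJava only wraps the Scala Seq; the wrapper implements java.util.List
// but, per the commit above, is not something the index server's Java
// serializer could handle.
val wrapped: java.util.List[String] = segments.asJava

// Copying into a concrete java.util.ArrayList produces an ordinary Java
// list that serializes without the Scala wrapper.
val copied = new java.util.ArrayList[String](segments.asJava)
```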
… have scheme, the default will be local file system, which is not the file system defined by fs.defaultFS Why is this PR needed? When creating a table with a location, if the location doesn't have a scheme, the default is the local file system, which is not the file system defined by fs.defaultFS. What changes were proposed in this PR? If the location doesn't have a scheme, prepend the fs.defaultFS scheme to the location. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4065
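One way to express that idea, sketched with the standard Hadoop filesystem API; the helper name and structure are illustrative, not the actual patch:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: qualify a table location against fs.defaultFS when
// the user-supplied location carries no scheme.
def qualifyLocation(location: String, conf: Configuration): Path = {
  val path = new Path(location)
  // getFileSystem falls back to fs.defaultFS when the path has no scheme.
  val fs: FileSystem = path.getFileSystem(conf)
  // makeQualified prepends the filesystem's scheme and authority.
  path.makeQualified(fs.getUri, fs.getWorkingDirectory)
}
```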
…rift is Set Why is this PR needed? After converting the expression to an IN expression for a main table with SI, the expression is not processed if ColumnDrift is enabled. The query fails with an NPE during resolveFilter. The exception is attached in the JIRA. What changes were proposed in this PR? Process the filter expression after adding the implicit expression. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4063
…which leads to memory leak Why is this PR needed? When there are two Spark applications and one drops a table, some cache information of this table stays in the other application and cannot be removed by any method, such as the "Drop metacache" command. This leads to a memory leak; over time the leak accumulates and finally causes driver OOM. The leak points are: 1) tableModifiedTimeStore in CarbonFileMetastore; 2) segmentLockMap in BlockletDataMapIndexStore; 3) absoluteTableIdentifierByteMap in SegmentPropertiesAndSchemaHolder; 4) tableInfoMap in CarbonMetadata. What changes were proposed in this PR? Using an expiring map to cache the table information in CarbonMetadata and the modified time in CarbonFileMetaStore, so that stale information is cleared automatically after the expiration time. Operations in BlockletDataMapIndexStore do not need to be locked, so all the logic related to segmentLockMap is removed. Does this PR introduce any user interface change? New configuration carbon.metacache.expiration.seconds is added. Is any new testcase added? No This closes #4057
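The expiring-map idea can be sketched as follows; the commit only says "expiring map", so the use of Guava's cache and the 3600-second window here are assumptions for illustration:

```scala
import java.util.concurrent.TimeUnit
import com.google.common.cache.{Cache, CacheBuilder}

// Expiration-based cache: entries not rewritten within the configured
// window (cf. carbon.metacache.expiration.seconds) drop out automatically,
// so a table dropped by another application cannot linger forever in this
// driver's metadata cache.
val tableInfoCache: Cache[String, AnyRef] = CacheBuilder.newBuilder()
  .expireAfterWrite(3600, TimeUnit.SECONDS)   // illustrative expiration
  .build[String, AnyRef]()

tableInfoCache.put("default_t1", new Object)             // hypothetical entry
val maybeInfo = Option(tableInfoCache.getIfPresent("default_t1"))
```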
… case of concurrent load, compact and clean files operation Why is this PR needed? There were 2 issues in the clean files post event listener: 1. In concurrent cases, while writing the entry back to the table status file, a wrong path was given, due to which the table status file was not updated in the case of the SI table. 2. While writing the loadmetadetails to the table status file during concurrent scenarios, we were only writing the unwanted segments and not all the segments, which could make segments stale in the SI table. Due to these 2 issues, when a select query is executed on the SI table, the tablestatus would have an entry for a segment but its carbondata file would be deleted, thus throwing an IO Exception. 3. Segment ID is null when writing a hive table What changes were proposed in this PR? 1. & 2. Added the correct table status path as well as sending the correct loadmetadatadetails to be updated in the table status file. Now when a select query is fired on the SI table, it will not throw a carbondata-file-not-found exception 3. Set the load model after the setup job of the committer Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4066
…table after concurrent Load & Compaction operation Why is this PR needed? When concurrent LOAD and COMPACTION are in progress on a main table having SI, the SILoadEventListenerForFailedSegments listener is called to repair SI failed segments, if any. It compares SI and main table segment status; if there is a mismatch, it adds that specific load to failedLoads to be re-loaded again. During compaction, SI is updated first and then the main table. So, in some cases, the SI segment will be in COMPACTED state while the main table is in SUCCESS state (the compaction can still be in progress, or some operation failed). SI index repair adds those segments to failedLoads after checking whether the segment lock can be acquired. But if the main table compaction has finished by the time the SI repair comparison is done, it can still acquire the segment lock and add those loads to failedLoads (even though the main table load is COMPACTED). After the concurrent operation finishes, some segments of SI are marked as INSERT_IN_PROGRESS. This leads to an inconsistent state between SI and main table segments. What changes were proposed in this PR? Acquire the compaction lock on the main table (to ensure compaction is not running), and then compare SI and main table load details to repair SI segments. Does this PR introduce any user interface change? No Is any new testcase added? No (concurrent scenario) This closes #4067
…e in Presto integration Why is this PR needed? FTs for the following cases have been added; the store is created by Spark and read by Presto:
- update without local-dict
- delete operations on table
- minor, major, custom compaction
- add and delete segments
- update with inverted index
- read with partition columns
- filter on partition columns
- Bloom index
- range columns
- read streaming data
Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4031
…der in SDK Why is this PR needed? Currently, the SDK pagination reader does not support filter expressions and also returns wrong results after performing IUD operations through the SDK. What changes were proposed in this PR? When a filter is present or an update/delete operation was performed, get the total rows in splits after building the carbon reader; otherwise get the row count from the details info of each split. Handled ArrayIndexOutOfBoundsException and return zero in case rowCountInSplits.size() == 0. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4068
Why is this PR needed? 1. Block SI creation on binary columns. 2. Block alter table drop column directly on an SI table. 3. Create table as like should not be allowed for SI tables. 4. Filter with like should not scan the SI table. 5. Currently compaction is allowed on SI tables. Because of this, if only the SI table is compacted, running a filter query on the main table causes more data to be scanned on the SI table, which causes performance degradation. What changes were proposed in this PR? 1. Blocked SI creation on binary columns. 2. Blocked alter table drop column directly on SI tables. 3. Handled Create table as like for SI tables. 4. Handled filter with like so it does not scan the SI table. 5. Block direct compaction on the SI table and add FTs for the compaction scenario of SI. 6. Added FT for compression and range column on SI table. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4037
Why is this PR needed? In order to support the MERGE INTO SQL command in CarbonData. The previous Scala parser has trouble parsing the complicated MERGE INTO SQL command. What changes were proposed in this PR? Add an ANTLR parser and support parsing the MERGE INTO SQL command into a DataSet command. Does this PR introduce any user interface change? Yes. The PR introduces the MERGE INTO SQL Command. Is any new testcase added? Yes This closes #4032 Co-authored-by: Zhangshunyu <[email protected]>
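For a feel of what such a parser accepts, a hedged example issued through an existing SparkSession (assumed to be `spark`); the table and column names are made up, and the exact clause forms supported are defined by the grammar added in this PR:

```scala
// Hypothetical MERGE INTO statement: upsert rows from a staging table
// into a Carbon target table.
spark.sql(
  """
    |MERGE INTO target t
    |USING source s
    |ON t.id = s.id
    |WHEN MATCHED THEN UPDATE SET t.value = s.value
    |WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value)
  """.stripMargin)
```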
Why is this PR needed? Since version 2.0, Carbon supports starting the Spark ThriftServer with CarbonExtensions. What changes were proposed in this PR? Add a document describing how to start the Spark ThriftServer with CarbonExtensions. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4077
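The extension is wired in through the standard spark.sql.extensions configuration; a minimal sketch (the app name and builder usage are illustrative, while a ThriftServer startup command would pass the same key via --conf):

```scala
import org.apache.spark.sql.SparkSession

// Enable CarbonExtensions on a plain SparkSession; the documented
// ThriftServer start command sets this same configuration.
val spark = SparkSession.builder()
  .appName("carbon-extensions-demo")                 // illustrative name
  .config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions")
  .enableHiveSupport()
  .getOrCreate()
```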
entry when there is no update/insert data Why is this PR needed? 1. After #3999, when an update happens on the table, a new segment is created for the updated data. But when there is no data to update, the segments are still created and the table status has in-progress entries for those empty segments. This leads to unnecessary segment dirs and an increase in table status entries. 2. After this, clean files does not clean these empty segments. 3. When the source table does not have data, CTAS results in the same problem. What changes were proposed in this PR? When no data is present during the update, mark the segment as Marked for Delete so that clean files takes care of deleting it; CTAS was already handled, and test cases were added. This closes #4018
…ry on sort column giving wrong result with IndexServer Why is this PR needed? 1. Creating a table and reading from SDK-written files fails in the cluster with java.nio.file.NoSuchFileException: hdfs:/hacluster/user/hive/warehouse/carbon.store/default/sdk. 2. After fixing the above path issue, a filter query on a sort column gives the wrong result with IndexServer. What changes were proposed in this PR? 1. In getAllDeleteDeltaFiles, used CarbonFiles.listFiles instead of Files.walk to handle custom file types. 2. In PruneWithFilter, isResolvedOnSegment is used in the filterResolver step. Set the table and expression on the executor side, so the indexserver can use them in the filterResolver step. This closes #4064
…hancement Why is this PR needed? Spatial index feature optimization of CarbonData What changes were proposed in this PR? 1. Update the spatial index encoding algorithm, which reduces the properties required to create a geo table 2. Enhance geo query UDFs: support querying a geo table with a polygon list, polyline list, or geoId range list, and add some geo transforming util UDFs 3. Load data (both LOAD and INSERT INTO) allows the user to input the spatial index; the column is still generated internally when the user does not provide it. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4012
add override
#117, The longitude has six decimal places and the latitude five digits. Why is it the same length after conversion?
Co-authored-by: QiangCai <[email protected]>
Co-authored-by: QiangCai <[email protected]>
…a types (#4263) Why is this PR needed? CHAR and VARCHAR as String data types are no longer supported in Carbon. They should be removed from the doc's description. What changes were proposed in this PR? CHAR and VARCHAR no longer appear as String data types in the doc. Does this PR introduce any user interface change? No Is any new testcase added? No Co-authored-by: tangchuan <[email protected]>
Co-authored-by: QiangCai <[email protected]>
Bumps [pyarrow](https://github.com/apache/arrow) from 0.11.1 to 14.0.1. - [Commits](apache/arrow@apache-arrow-0.11.1...go/v14.0.1) --- updated-dependencies: - dependency-name: pyarrow dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Minor refactor the build docs * Fix review comments * Update build/README.md
Co-authored-by: jacky <[email protected]>
* upgrade thrift version * change to use 0.20.0 --------- Co-authored-by: jacky <[email protected]>
Bumps org.apache.commons:commons-compress from 1.4.1 to 1.26.0. --- updated-dependencies: - dependency-name: org.apache.commons:commons-compress dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Be sure to do all of the following to help us incorporate your contribution quickly and easily:
- Make sure the PR title is formatted like: [CARBONDATA-<Jira issue #>] Description of pull request
- Make sure tests pass via mvn clean verify. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes.)
- Replace <Jira issue #> in the title with the actual Jira issue number, if there is one.
- If this contribution is large, please file an Apache Individual Contributor License Agreement.

Testing done

For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.