merge from apache master #7
base: master
Commits on Dec 21, 2020
-
[CARBONDATA-4092] Fix concurrent issues in delete segment APIs and MV flow
Why is this PR needed? There are multiple issues with the delete segment API: it does not use the latest LoadMetadataDetails while writing to the table status file, and can therefore remove the table status entry of any concurrently loaded Insert In Progress/Success segment; the code reads the table status file twice; and with concurrent queries, both access checkAndReloadSchema for MV on all databases and try to create a file at the same location, so HDFS grants the lock to one and fails the other, failing that query. What changes were proposed in this PR? Read the table status file only once. Use the latest table status to mark the segment as Marked for Delete, so no concurrent issues arise. Made the touchMDT and checkAndReloadSchema methods synchronized, so that only one instance can access them at a time. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4059
[CARBONDATA-4093] Added logs for MV and a method to verify if the MV is in sync during query
Why is this PR needed? Added logs for MV and a method to verify whether the MV is in sync during query. What changes were proposed in this PR? 1. Move the MV enable check to the beginning to avoid transforming the logical plan. 2. Add a logger if an exception occurs while fetching the MV schema. 3. Check if the MV is in sync and allow query rewrite. 4. Reuse the already-read LoadMetadataDetails to get mergedLoadMapping. 5. Set no-dictionary schema types for the insert-partition flow, which was missed in [CARBONDATA-4077]. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4060
Commits on Dec 22, 2020
-
[CARBONDATA-4094] Fix fallback count(*) issue on partition table with index server
Why is this PR needed? The asJava call converts to Java "in place": it does not copy the data (to save time and memory) but simply wraps the Scala collection in a class that conforms to the Java interface, so the Java serializer is not able to serialize it. What changes were proposed in this PR? Convert it to a list, so that a plain list is serialized. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4061
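A minimal Scala sketch of the wrapper-versus-copy distinction described above (generic example, not the exact code from the PR): asJava only wraps the Scala collection, while copying into a java.util.ArrayList produces a plain Java collection that a Java serializer can handle.

    import java.util
    import scala.collection.JavaConverters._

    object AsJavaExample {
      def main(args: Array[String]): Unit = {
        val splits: Seq[String] = Seq("split-0", "split-1")
        // asJava merely wraps the Scala collection in a Java-interface view; no copy is made
        val wrapped: util.List[String] = splits.asJava
        // copying into a plain java.util.ArrayList yields a collection the Java serializer understands
        val copied: util.List[String] = new util.ArrayList[String](splits.asJava)
        println(wrapped.getClass.getName + " vs " + copied.getClass.getName)
      }
    }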
Commits on Dec 23, 2020
-
[CARBONDATA-4089] Create table with location: if the location doesn't have a scheme, it defaults to the local file system instead of the file system defined by fs.defaultFS
Why is this PR needed? When creating a table with a location whose path has no scheme, the default is the local file system, which is not the file system defined by fs.defaultFS. What changes were proposed in this PR? If the location doesn't have a scheme, prepend the fs.defaultFS scheme to the location. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4065
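An illustrative Scala sketch of qualifying a scheme-less location against fs.defaultFS using the standard Hadoop API (this shows the general idea, not necessarily the exact code in the PR):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path

    object QualifyLocation {
      // Qualify a scheme-less table location against the configured default file system.
      def qualify(location: String, conf: Configuration): String = {
        val path = new Path(location)
        // getFileSystem resolves to fs.defaultFS when the path carries no scheme,
        // and makeQualified prepends that scheme/authority to the path.
        path.getFileSystem(conf).makeQualified(path).toString
      }

      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        conf.set("fs.defaultFS", "hdfs://nameservice1") // illustrative value
        println(qualify("/user/hive/warehouse/t1", conf))
      }
    }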
[CARBONDATA-4095] Fix Select query with SI filter failing when columnDrift is set
Why is this PR needed? After converting the expression to an IN expression for a main table with SI, the expression is not processed if columnDrift is enabled, and the query fails with an NPE during resolveFilter. The exception is attached in the JIRA. What changes were proposed in this PR? Process the filter expression after adding the implicit expression. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4063
[CARBONDATA-4088] Drop metacache didn't clear some cache information, which leads to a memory leak
Why is this PR needed? When there are two Spark applications and one drops a table, some cache information for that table stays in the other application and cannot be removed by any method, including the "DROP METACACHE" command. This leads to a memory leak which accumulates over time and finally causes a driver OOM. The leak points are: 1) tableModifiedTimeStore in CarbonFileMetastore; 2) segmentLockMap in BlockletDataMapIndexStore; 3) absoluteTableIdentifierByteMap in SegmentPropertiesAndSchemaHolder; 4) tableInfoMap in CarbonMetadata. What changes were proposed in this PR? Use an expiring map to cache the table information in CarbonMetadata and the modified time in CarbonFileMetastore, so that stale information is cleared automatically after the expiration time. Operations in BlockletDataMapIndexStore no longer need to be locked, so all the logic related to segmentLockMap is removed. Does this PR introduce any user interface change? A new configuration carbon.metacache.expiration.seconds is added. Is any new testcase added? No This closes #4057
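A small sketch of setting the new property through CarbonProperties; the property key comes from the description above, while the 600-second value is only an example:

    import org.apache.carbondata.core.util.CarbonProperties

    object MetacacheExpirationExample {
      def main(args: Array[String]): Unit = {
        // Expire cached table info / modified-time entries after 10 minutes (value is illustrative).
        CarbonProperties.getInstance()
          .addProperty("carbon.metacache.expiration.seconds", "600")
      }
    }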
Commits on Dec 29, 2020
-
[CARBONDATA-4099] Fixed select query on main table with an SI table in case of concurrent load, compact and clean files operations
Why is this PR needed? There were two issues in the clean files post-event listener: 1. In concurrent cases, while writing the entry back to the table status file, a wrong path was used, so the table status file was not updated for the SI table. 2. While writing the LoadMetadataDetails to the table status file during concurrent scenarios, only the unwanted segments were written instead of all segments, which could leave stale segments in the SI table. Due to these two issues, when a select query is executed on the SI table, the table status could have an entry for a segment whose carbondata file had already been deleted, throwing an IOException. 3. Segment ID is null when writing a Hive table. What changes were proposed in this PR? 1 & 2: Use the correct table status path and send the correct LoadMetadataDetails to be updated in the table status file, so a select query on the SI table no longer throws a carbondata-file-not-found exception. 3: Set the load model after the setup job of the committer. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4066
Commits on Dec 30, 2020
-
[CARBONDATA-4100] Fix SI segments being in an inconsistent state with the main table after concurrent load and compaction operations
Why is this PR needed? When concurrent LOAD and COMPACTION are in progress on a main table having SI, the SILoadEventListenerForFailedSegments listener is called to repair any failed SI segments. It compares SI and main table segment status; if there is a mismatch, it adds that load to failedLoads to be re-loaded again. During compaction, SI is updated first and then the main table, so in some cases the SI segment is in COMPACTED state while the main table segment is still in SUCCESS state (the compaction may still be in progress, or some operation may have failed). SI index repair adds those segments to failedLoads after checking whether the segment lock can be acquired. But if the main table compaction has finished by the time the SI repair comparison is done, the lock can still be acquired and the load is added to failedLoads even though the main table load is COMPACTED. After the concurrent operation finishes, some SI segments are left marked INSERT_IN_PROGRESS, leaving SI and main table segments inconsistent. What changes were proposed in this PR? Acquire the compaction lock on the main table (to ensure compaction is not running), and only then compare SI and main table load details to repair SI segments. Does this PR introduce any user interface change? No Is any new testcase added? No (concurrent scenario) This closes #4067
[CARBONDATA-4073] Added FTs for missing scenarios and removed dead code in Presto integration
Why is this PR needed? FTs for the following cases have been added (the store is created by Spark and read by Presto): update without local dictionary; delete operations on a table; minor, major and custom compaction; add and delete segments; update with inverted index; read with partition columns; filter on partition columns; Bloom index; range columns; read streaming data. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4031
Commits on Jan 5, 2021
-
[CARBONDATA-3987] Handled filter and IUD operations for the pagination reader in SDK
Why is this PR needed? Currently, the SDK pagination reader does not support filter expressions and also returns wrong results after IUD operations performed through the SDK. What changes were proposed in this PR? If a filter is present or an update/delete operation was performed, get the total rows in the splits after building the carbon reader; otherwise get the row count from the detail info of each split. Handled ArrayIndexOutOfBoundsException and return zero when rowCountInSplits.size() == 0. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4068
Commits on Jan 6, 2021
-
[CARBONDATA-4070] [CARBONDATA-4059] Fixed SI issues and improved FT
Why is this PR needed? 1. Block SI creation on binary columns. 2. Block alter table drop column directly on SI tables. 3. Create table like should not be allowed for SI tables. 4. Filter with LIKE should not scan the SI table. 5. Currently compaction is allowed on SI tables; because of this, if only the SI table is compacted, a filter query on the main table causes more data scan of the SI table, degrading performance. What changes were proposed in this PR? 1. Blocked SI creation on binary columns. 2. Blocked alter table drop column directly on SI tables. 3. Handled create table like for SI tables. 4. Handled filter with LIKE so it does not scan the SI table. 5. Blocked direct compaction on SI tables and added FTs for the SI compaction scenario. 6. Added FTs for compression and range column on SI tables. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4037
Commits on Jan 11, 2021
-
[CARBONDATA-4065] Support MERGE INTO SQL Command
Why is this PR needed? To support the MERGE INTO SQL command in CarbonData; the previous Scala parser had trouble parsing the complicated MERGE INTO SQL command. What changes were proposed in this PR? Add an ANTLR parser and support parsing the MERGE INTO SQL command into a Dataset command. Does this PR introduce any user interface change? Yes. The PR introduces the MERGE INTO SQL command. Is any new testcase added? Yes This closes #4032 Co-authored-by: Zhangshunyu <[email protected]>
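An illustrative MERGE INTO statement of the general shape this feature handles, wrapped in spark.sql; the table and column names are hypothetical and the exact accepted grammar may differ from this sketch:

    import org.apache.spark.sql.SparkSession

    object MergeIntoExample {
      def run(spark: SparkSession): Unit = {
        spark.sql(
          """MERGE INTO target t
            |USING source s
            |ON t.id = s.id
            |WHEN MATCHED THEN UPDATE SET *
            |WHEN NOT MATCHED THEN INSERT *
            |""".stripMargin)
      }
    }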
Commits on Jan 19, 2021
-
[DOC] Running the Thrift JDBC/ODBC server with CarbonExtensions
Why is this PR needed? Since version 2.0, Carbon supports starting the Spark ThriftServer with CarbonExtensions. What changes were proposed in this PR? Add documentation on starting the Spark ThriftServer with CarbonExtensions. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4077
Commits on Jan 21, 2021
-
[CARBONDATA-4055] Fix creation of empty segment directory and meta entry when there is no update/insert data
Why is this PR needed? 1. After #3999, when an update happens on the table a new segment is created for the updated data; but when there is no data to update, the segments are still created and the table status has In Progress entries for those empty segments. This leads to unnecessary segment directories and an increase in table status entries. 2. After this, clean files does not clean these empty segments. 3. When the source table has no data, CTAS results in the same problem. What changes were proposed in this PR? When no data is present during an update, mark the segment as Marked for Delete so that clean files takes care of deleting it; CTAS was already handled; added test cases. This closes #4018
Commits on Jan 22, 2021
-
[CARBONDATA-4096] SDK read fails from cluster, and SDK read filter query on sort column gives wrong results with Index Server
Why is this PR needed? 1. Creating a table and reading from SDK-written files fails in the cluster with java.nio.file.NoSuchFileException: hdfs:/hacluster/user/hive/warehouse/carbon.store/default/sdk. 2. After fixing the above path issue, a filter query on a sort column gives the wrong result with Index Server. What changes were proposed in this PR? 1. In getAllDeleteDeltaFiles, used CarbonFiles.listFiles instead of Files.walk to handle custom file types. 2. In PruneWithFilter, isResolvedOnSegment is used in the filterResolver step; set the table and expression on the executor side so the Index Server can use them in the filterResolver step. This closes #4064
Commits on Jan 25, 2021
-
[CARBONDATA-4051] Geo spatial index algorithm improvement and UDF enhancements
Why is this PR needed? Spatial index feature optimization of CarbonData. What changes were proposed in this PR? 1. Update the spatial index encoding algorithm, which reduces the properties required to create a geo table. 2. Enhance the geo query UDFs to support querying a geo table with a polygon list, polyline list, or geoId range list, and add some geo transforming utility UDFs. 3. Data loading (both LOAD and INSERT INTO) allows the user to input the spatial index column; the column is still generated internally when the user does not provide it. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4012
Commits on Jan 27, 2021
-
[CARBONDATA-4097] ColumnVectors should not be initialized as ColumnVectorWrapperDirect for altered tables
Why is this PR needed? Direct filling of column vectors is not allowed for altered tables, but their column vectors were being initialized as ColumnVectorWrapperDirect. What changes were proposed in this PR? Changed the initialization of column vectors to ColumnVectorWrapper for altered tables. This closes #4062
Commits on Jan 29, 2021
-
[CARBONDATA-4104] Vector filling for complex decimal type needs to be handled
Why is this PR needed? Filling of vectors for a complex decimal type whose precision is greater than 18 is not handled properly, for example array<decimal(20,3)>. What changes were proposed in this PR? Ensured proper vector filling considering the page data type. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4073
Commits on Jan 30, 2021
-
[CARBONDATA-4109] Improve carbondata coverage for presto-integration code
Why is this PR needed? A few scenarios had missing coverage in the presto-integration code; this PR aims to improve it by covering all such scenarios. Dead code: ObjectStreamReader.java was created with the aim of querying complex types, but ComplexTypeStreamReader was created instead, making ObjectStreamReader obsolete. What changes were proposed in this PR? Added test cases for scenarios that were not covered earlier in the presto-integration code, and removed the dead code. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4074
Commits on Feb 2, 2021
-
[CARBONDATA-4112] Data mismatch issue in SI global sort merge flow
Why is this PR needed? When the data files of an SI segment are merged, the SI table ends up with more rows than the main table. What changes were proposed in this PR? The CARBON_INPUT_SEGMENT property was not set before creating the dataframe from the SI segment, so the dataframe was created from all rows in the table rather than from the particular segment. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4083
[CARBONDATA-4113] Partition prune and cache fix when carbon.read.partition.hive.direct is disabled
Why is this PR needed? When carbon.read.partition.hive.direct is false, select queries on a partition table give invalid results. For a single partition, partition values are appended to form a wrong path when loaded by the same segment; for example, for a partition on column b the path becomes /tablepath/b=1/b=2. What changes were proposed in this PR? In PartitionCacheManager, changes were made to handle single and multiple partitions, and the URI path is encoded to handle spaces in the string. This closes #4084
Commits on Feb 4, 2021
-
[CARBONDATA-4082] Fix alter table add segment query when adding a segment having delete delta files
Why is this PR needed? When a segment is added to a carbon table by an alter table add segment query and that segment also has a delete delta file, then on querying the carbon table the deleted rows appear in the result. What changes were proposed in this PR? Update the tableStatus and tableUpdateStatus files correctly for segments having delete delta files. This closes #4070
[CARBONDATA-4107] Added related MV tables map to the fact table and added a lock while touchMDTFile
Why is this PR needed? 1. After the MV multi-tenancy PR, the MV system folder was moved to database level. Hence, during each operation (insert/load/IUD/show MV/query), we list all databases in the system, collect MV schemas and check whether any MV is mapped to the table. This degrades query performance, because MV schemas are collected from all databases regardless of whether the table has an MV. 2. When different JVM processes call the touchMDTFile method, file creation and deletion can happen at the same time, which may fail the operation. What changes were proposed in this PR? 1. Added a table property relatedMVTablesMap to the fact tables of an MV during MV creation. During any operation, check whether the table has an MV using this property and, if it does, collect schemas only from the related databases. In this way we avoid collecting MV schemas for tables which don't have an MV. 2. Take a global-level lock on the system folder location to update the last modified time. NOTE: for compatibility scenarios, a refresh MV operation can be performed to update these table properties. Does this PR introduce any user interface change? Yes. For compatibility scenarios, a refresh MV operation can be performed to update these table properties. Is any new testcase added? No This closes #4076
[CARBONDATA-4111] Filter query giving invalid results after add segment to a table having SI with Index Server
Why is this PR needed? When the Index Server is enabled, a filter query on an SI column after alter table add SDK segment to the main table throws NoSuchMethodException, and the rows added by the SDK segment are not returned in the result. What changes were proposed in this PR? Added the segment path in the Index Server flow, as it is used to identify an external segment in the filter resolver step. No need to load to SI if it is an add load command. Declared a default constructor for SegmentWrapperContainer. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4080
[CARBONDATA-4102] Added UT and FT to improve coverage of SI module.
Why is this PR needed? Added UTs and FTs to improve coverage of the SI module and removed dead or unused code. What changes were proposed in this PR? Added UTs and FTs to improve coverage of the SI module and removed dead or unused code. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4071
Commits on Feb 10, 2021
-
[CARBONDATA-4122] Use CarbonFile API instead of java File API for Flink CarbonLocalWriter
Why is this PR needed? Currently, only two writers (Local and S3) are supported for Flink carbon streaming. If a user wants to ingest data from Flink in carbon format directly into an HDFS carbon table, there is no writer type to support it. What changes were proposed in this PR? Since the code for writing Flink stage data is the same for the local and HDFS file systems, the existing CarbonLocalWriter can write data into HDFS by using the CarbonFile API instead of the java File API. Changed the code to use the CarbonFile API instead of java.io.File. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4090
Commits on Feb 17, 2021
-
[CARBONDATA-4125] SI compatibility issue fix
Why is this PR needed? Currently, while upgrading a table store with SI, we have to execute the REFRESH TABLE and REGISTER INDEX commands to refresh and register the index to the main table. During SI creation we add a property 'indexTableExists' to the main table to identify whether the table has an SI. If a table has an SI, we load the index information for that table from Hive (org.apache.spark.sql.secondaryindex.hive.CarbonInternalMetastore#refreshIndexInfo). indexTableExists defaults to 'false' for tables which do not have an SI, and the property is not added for SI tables themselves. refreshIndexInfo is called on any command to refresh the index info; the indexTableExists property should be either true (main table) or null (SI) in order to get the index information from Hive and set it on the carbon table. Issue 1: while upgrading tables with SI, after refreshing the main table and SI, if the user runs any operation like select or show cache, the indexTableExists property is set to false on the SI table. After register index, on any operation with the SI (load or select), refreshIndexInfo does not update the index information on the SI table because indexTableExists is false, so loads to the SI fail. Issue 2: while upgrading tables with SI, after refreshing the main table and SI, if the user performs operations like update, alter or delete on the SI table, registering it as an index does not validate the alter operations done on that table. What changes were proposed in this PR? Issue 1: while registering an SI table as an index, check whether it has the indexTableExists property and remove it; for an already registered index, allow re-registering to remove the property. Issue 2: added validations to check whether the SI has undergone load/update/delete/alter operations before registering it as an index, and throw an exception if so. This closes #4087
[CARBONDATA-4124] Fix error message when refreshing an MV which does not exist
Why is this PR needed? Refreshing an MV which does not exist does not throw a proper carbon error message; it throws a "table not found" message from Spark. This is because getSchema returns null if the schema is not present. What changes were proposed in this PR? 1. Check if getSchema is null and throw a "no such MV" exception. 2. While dropping the table, drop the MV first and then drop the fact table from the metastore, to avoid a NullPointerException when accessing the fact table during drop MV. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4091
[CARBONDATA-4117][CARBONDATA-4123] CG index and Bloom index query issues with Index Server
Why is this PR needed? 1. A test CG index query with Index Server fails with an NPE: while initializing the index model, a parsing error is thrown when trying to uncompress with Snappy. 2. A Bloom index query with Index Server gives incorrect results when splits have more than one blocklet, because blocklet-level details are not serialized for the Index Server, as it is treated as block-level cache. What changes were proposed in this PR? 1. Set segment and schema details on the BlockletIndexInputSplit object; while writing the min/max object, write the byte size instead of the position. 2. Create a BlockletIndex when the Bloom filter is used, so that in the createBlocklet step isBlockCache is set to false. This closes #4089
Commits on Feb 18, 2021
-
[CARBONDATA-3962] Fixed concurrent load failure with flat folder structure
Why is this PR needed? PR #3904 added code to remove the Fact directory, and because of this a concurrent load fails with a file-not-found exception. What changes were proposed in this PR? Reverted PR #3904. This closes #4905
[CARBONDATA-4126] Concurrent compaction failed with load on table
Why is this PR needed? Concurrent compaction was failing when run in parallel with load. During load we acquire the SegmentLock for a particular segment; when compaction tries to acquire the same lock it cannot, and compaction fails. What changes were proposed in this PR? Skip compaction for segments for which the SegmentLock cannot be acquired, instead of throwing an exception. This closes #4093
[CARBONDATA-4121] Prepriming is not working in Index Server
Why is this PR needed? Prepriming is not working in Index Server. Server.getRemoteUser returns a null value in the async prepriming call, which results in an NPE and crashes the Index Server application. The issue was introduced by PR #3952. What changes were proposed in this PR? Computed the Server.getRemoteUser value before making the async prepriming call and used that value during the async call; reset the code to its state before PR #3952. This closes #4088
Commits on Mar 3, 2021
-
[CARBONDATA-4115] Successful load and insert will return segment ID
Why is this PR needed? Currently a successful load or insert SQL returns an empty Seq in CarbonData; we need it to return the segment ID. What changes were proposed in this PR? A successful load or insert will return the segment ID. Does this PR introduce any user interface change? Yes (a successful load or insert will return the segment ID). Is any new testcase added? Yes This closes #4086
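A sketch of how the returned segment ID could be read after this change; it assumes the single-row result exposes the segment ID as its first column, and the table names are hypothetical:

    import org.apache.spark.sql.SparkSession

    object InsertReturnsSegmentId {
      def run(spark: SparkSession): Unit = {
        val result = spark.sql("INSERT INTO carbon_sales SELECT * FROM staging_sales")
        // Assumption: the first column of the returned row carries the new segment ID.
        val segmentId = result.collect().headOption.map(_.get(0)).getOrElse("unknown")
        println(s"Data loaded into segment $segmentId")
      }
    }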
Commits on Mar 5, 2021
-
[CARBONDATA-4137] Refactor CarbonDataSourceScan without the sources.Filter of Spark 3
Why is this PR needed? 1. In Spark 3, org.apache.spark.sql.sources.Filter is sealed, so carbon can't extend it in carbon code. 2. The name of the CarbonLateDecodeStrategy class is incorrect, and the code is complex and hard to read. 3. CarbonDataSourceScan can be the same for 2.3 and 2.4, and should support both batch reading and row reading. What changes were proposed in this PR? 1. Translate Spark Expressions to carbon Expressions directly, skipping the Spark Filter step, and remove all Spark Filters from carbon code. Old flow: Spark Expression => Spark Filter => Carbon Expression; new flow: Spark Expression => Carbon Expression. 2. Remove filter reordering; expression reordering still needs to be implemented (added CARBONDATA-4138). 3. Separate CarbonLateDecodeStrategy into CarbonSourceStrategy and DMLStrategy, and simplify the code of CarbonSourceStrategy. 4. Move CarbonDataSourceScan back to the source folder and use one CarbonDataSourceScan for all versions; it supports both VectorReader and RowReader, so Carbon will not use RowDataSourceScanExec. Does this PR introduce any user interface change? No Is any new testcase added? No
Commits on Mar 9, 2021
-
[CARBONDATA-4133] Concurrent insert overwrite with static partition on Index Server fails
Why is this PR needed? Concurrent insert overwrite with a static partition on Index Server fails. When the Index Server and prepriming are enabled, prepriming is triggered even when a load fails, as it is in the finally block. There is also performance degradation with the Index Server due to #4080. What changes were proposed in this PR? Removed the triggerPrepriming method from the finally block; reverted #4080 and used a boolean flag to determine the external segment. Does this PR introduce any user interface change? No Is any new testcase added? No, tested in cluster. This closes #4096
Commits on Mar 10, 2021
-
[CARBONDATA-4141] Index Server is not caching indexes for external tables with SDK segments
Why is this PR needed? Indexes cached in the executor cache are not dropped when drop table is called for an external table with SDK segments. External tables with SDK segments have no metadata such as a table status file, so the drop table command sends zero segments to the Index Server clearIndexes job, which clears nothing on the executor side. So when we drop this type of table, executor-side indexes are not dropped. When we then create an external table at the same location and run select * or select count(*), the indexes for this table are not cached because indexes for the same location already exist. A show metacache on the newly created table uses the new tableId, but the existing indexes have the old tableId, whose table is already dropped, so show metacache returns nothing because of the tableId mismatch. What changes were proposed in this PR? Prepare the validSegments from the index files present at the external table location and send them to the Index Server clearIndexes job through IndexInputFormat. This closes #4099
Commits on Mar 12, 2021
-
[CARBONDATA-4075] Using withEvents instead of fireEvent
Why is this PR needed? The withEvents method simplifies the code that fires events. What changes were proposed in this PR? Refactored the code to use the withEvents method instead of fireEvent. This closes #4078
Commits on Mar 15, 2021
-
[CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
Why is this PR needed? Currently, with the clean files operation the user does not know how much space will be freed. The idea is to add support for a dry run in clean files which tells the user how much space will be freed without cleaning the actual data. What changes were proposed in this PR? 1. Support dry run in clean files: it shows the user how much space the clean files operation will free and how much space is left (which can be released after the expiration time). 2. Clean files output: total size released during the clean files operation. 3. An option to disable clean files statistics in case the user does not want them. 4. Clean files log: enhance the log to print the name of every file being deleted at info level. This closes #4072
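A sketch of how the dry run and statistics options described above might be used; the option keys ('dryrun', 'statistics') are assumptions based on this description, and the table name is hypothetical:

    import org.apache.spark.sql.SparkSession

    object CleanFilesDryRunExample {
      def run(spark: SparkSession): Unit = {
        // Report how much space the operation would free, without deleting anything.
        spark.sql("CLEAN FILES FOR TABLE mydb.sales OPTIONS('dryrun'='true')").show(false)
        // Actual clean files run with the statistics output switched off.
        spark.sql("CLEAN FILES FOR TABLE mydb.sales OPTIONS('statistics'='false')").show(false)
      }
    }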
Commits on Mar 19, 2021
-
[CARBONDATA-4144] During compaction, the segment lock of the SI table is not released in abnormal scenarios
Why is this PR needed? When a compact operation fails, the segment lock of the SI table is not released. On running compaction again, the segment lock of the SI table cannot be acquired and compaction does nothing, but in the table status file of the SI table the merged segment status is set to SUCCESS, the segment file is xxx_null.segments and the value of indexsize is 0. What changes were proposed in this PR? If an exception occurs, release the acquired segment locks; if acquiring the segment locks fails, do not update the segment status. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4102
[CARBONDATA-4145] Query fails with the message "File does not exist: xxxx.carbondata"
Why is this PR needed? If an exception occurs while the refresh index command is being executed and a task has already succeeded, subsequent queries fail. Reason: after a compaction task succeeds, the old carbondata files are deleted; if another exception then occurs, the deleted files are missing. This PR fixes this issue. What changes were proposed in this PR? The driver deletes the old carbondata files only when all tasks are successful. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4103
[CARBONDATA-4149] Fix query issues after alter add partition.
Why is this PR needed? A query with SI after adding a partition based on a location on a partition table gives incorrect results. 1. While pruning, if it is an external segment it should use ExternalSegmentResolver, and there is no need to use ImplicitIncludeFilterExecutor, as an external segment is not added to the SI table. 2. If the partition table has external partitions, after compaction the new files are loaded to the external path. 3. Data is not loaded to the child table (MV) after executing the add partition command. What changes were proposed in this PR? 1. Add the path to loadMetadataDetails for an external partition; it is used to identify it as an external segment. 2. After compaction, so that no link to the external partition is kept, the compacted files are added as a new partition in the table; to update the partition spec details in the Hive metastore, (drop partition + add partition) operations are performed. 3. Add load pre and post listeners in CarbonAlterTableAddHivePartitionCommand to trigger data load to the materialized view. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4107
[CARBONDATA-4148] Reindex failed when SI has stale carbonindexmerge file
Why is this PR needed? Reindex fails with FileNotFoundException when the SI has a stale carbon merge index file. SegmentFileStore.getIndexFiles stores the mapping of index file to merge index file; when a stale carbon merge index file exists, the merge index file entry is not null. During index file merging, a new merge index file is created with the same name as before in the same location. At the end of CarbonIndexFileMergeWriter.writeMergeIndexFileBasedOnSegmentFile the carbon index files are deleted, and since the merge index file is stored in the indexFiles list, the newly created merge index file is deleted as well, which leads to FileNotFoundException. What changes were proposed in this PR? 1. SegmentFileStore.getIndexFiles no longer stores the redundant mapping of index file to merge index file. 2. SegmentFileStore.getIndexOrMergeFiles returns both index files and merge index files, so the function name is incorrect; renamed to getIndexAndMergeFiles. 3. CarbonLoaderUtil.getActiveExecutor actually gets the active node, so the function name is incorrect; renamed to getActiveNode, and all occurrences of "executor" were replaced with "node" in the function assignBlocksByDataLocality. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4105
Commits on Mar 21, 2021
-
[CARBONDATA-4147] Fix re-arranged schema in the logical relation for an MV partition table having a sort column
Why is this PR needed? After PR-3615, we avoid rearranging the catalog table schema if it is already re-arranged. For an MV on a partition table, the partition column is always moved to the end of the MV table, and the catalog table has its column schema in the same order (partition column last); hence in that case we do not re-arrange the logical relation of the catalog table again. But if a sort column is present in the MV table, the selected column schema and the catalog table schema are not in the same order, and the catalog table schema has to be re-arranged. Currently we use rearrangedIndex to re-arrange the catalog table logical relation, but rearrangedIndex keeps the index of the partition column at the end while the catalog table already has the partition column at the end; hence the partition column index is re-arranged again in the catalog table relation, which leads to insertion failure. Example: create an MV on columns c1, c2 (partition), c3 (sort column), c4. Problem: create order c1, c2, c3, c4 with create order index 0, 1, 2, 3; the existing catalog table schema order is c1, c3, c4, c2 (for an MV the partition column is moved last); the rearranged index is 2, 0, 3, 1; after re-arranging, the catalog table order becomes c4, c2, c2, c3, which is wrong. Solution: change the MV create order to c1, c4, c3, c2 with create order index 0, 1, 2, 3; the existing catalog table schema order is c1, c3, c4, c2; the rearranged index is 1, 0, 2, 3; after re-arranging, the catalog table order is c3, c1, c4, c2. What changes were proposed in this PR? In the MV case, if there is any column schema order change apart from the partition column, re-arrange the index of only those columns and use that to re-arrange the catalog table logical relation. This closes #4106
Commits on Mar 23, 2021
-
[CARBONDATA-4146] Query fails with the error message "unable to get file status"; the query works again after the "drop metacache on table" command is executed
Why is this PR needed? During compaction, the status of the new segment is set to SUCCESS before the index files are merged. After the index files are merged, the carbonindex files are deleted; as a result, query tasks cannot find the cached carbonindex files. What changes were proposed in this PR? Set the status of the new segment to SUCCESS only after the index files are merged. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4104
[CARBONDATA-4153] Fix: do not push down not-equal-to filter with Cast on SI
Why is this PR needed? A NOT EQUAL TO filter on an SI index column should not be pushed down to the SI table. Currently, where x != '2' is not pushed down to SI, but where x != 2 is, because "x != 2" is wrapped in a CAST expression like NOT EQUAL TO(cast(x as int) = 2). What changes were proposed in this PR? Handle the CAST case while checking whether to push down to SI. This closes #4108
[CARBONDATA-4155] Fix Create table like table with MV
Why is this PR needed? PR-4076 added a new table property to the fact table. While executing the create table like command, this property is not excluded, which leads to a parsing exception. What changes were proposed in this PR? Remove MV-related info from the destination table properties. This closes #4111
[CARBONDATA-4149] Fix query issues after alter add empty partition location
Why is this PR needed? A query with SI after adding a partition based on an empty location on a partition table gives incorrect results. PR-4107 fixes this issue for add partition when the location is not empty. What changes were proposed in this PR? While creating the blockid, get the segment number from the file name for the external partition; this blockid is added to SI and used for pruning. To identify an external partition during the compaction process, instead of checking against loadmetapath, check whether the file path starts with the table path. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4112
Commits on Mar 25, 2021
-
[CARBONDATA-4156] Fix writing segment min/max with all blocks of a segment
Why is this PR needed? PR-3999 removed some code related to getting the segment min/max from all blocks. Because of this, if a segment has more than one block, the min/max is currently written considering one block only. What changes were proposed in this PR? Reverted the specific code from the above PR and removed unwanted synchronization from some methods. This closes #4101
Commits on Mar 26, 2021
-
[CARBONDATA-4154] Fix various concurrent issues with clean files
Why is this PR needed? There are two issues in the clean files operation when run concurrently with multiple load operations: the dry run can show negative space freed when a load runs concurrently, and an in-progress (ongoing) load can be accidentally deleted during the clean files operation. What changes were proposed in this PR? To solve the negative dry-run result, save the old metadata details before the clean files operation and compare them with the load metadata details after the operation, ignoring any newly added entry (essentially an intersection of the new and old metadata details) to show the correct space freed. For the load issue, there can be scenarios where a load is ongoing (Insert In Progress state with the segment lock held) and, because the final table status lock is released during the clean files operation, the load may complete and release its segment lock while the final list of load metadata details to be deleted still shows that load as Insert In Progress with its segment lock released; the clean files operation would delete such loads. To solve this, instead of sending a boolean that decides whether the table status needs updating, send a list of load numbers and delete only those load numbers. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4109
Commits on Mar 28, 2021
-
Commit 4ec3e58
Commits on Mar 29, 2021
-
Commit 603133f
-
Commit baa1f69
Commits on Apr 15, 2021
-
Commit db8666c
-
Commit 6be1691
Commits on Apr 19, 2021
-
[CARBONDATA-4161] Describe complex columns
Why is this PR needed? Currently DESCRIBE FORMATTED displays the column information of a table and some additional information. When complex types such as ARRAY, STRUCT and MAP are present in the table, the column definition can be long and is difficult to read in nested format. What changes were proposed in this PR? The DESCRIBE output can be formatted to avoid long lines for multiple fields; we can pass the column name to the command and visualize its structure with child fields. Does this PR introduce any user interface change? Yes, new DDL commands: DESCRIBE COLUMN fieldname ON [db_name.]table_name; DESCRIBE short [db_name.]table_name. Is any new testcase added? Yes This closes #4113
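A short sketch of the two DDL forms listed above, wrapped in spark.sql; the database, table and column names are hypothetical:

    import org.apache.spark.sql.SparkSession

    object DescribeComplexColumnExample {
      def run(spark: SparkSession): Unit = {
        // Drill into one complex column and list its child fields.
        spark.sql("DESCRIBE COLUMN locations ON mydb.customer").show(false)
        // Condensed table description that avoids long nested type strings.
        spark.sql("DESCRIBE SHORT mydb.customer").show(false)
      }
    }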
Commits on Apr 20, 2021
-
[CARBONDATA-4163] Support adding of single-level complex columns (array/struct)
Why is this PR needed? This PR enables adding single-level complex columns (only array and struct) to a carbon table. Command: ALTER TABLE <table_name> ADD COLUMNS(arr1 ARRAY(double)); ALTER TABLE <table_name> ADD COLUMNS(struct1 STRUCT<a:int, b:string>). The default value of the new column for old rows will be null. What changes were proposed in this PR? 1. Create ColumnSchema instances for each of the children, so each child column has its own ordinal. The new columns are first identified and stored in a flat structure; for example, for arr1 array(int) two column schemas are created, arr1 and arr1.val, the first being the parent and the second its child, each with its own ordinal. 2. Later, while updating the schema evolution entry, only the newly added parent columns are recorded and the children columns are discarded (they are no longer required; otherwise the child would appear as a separate column in the schema). 3. Using the schema evolution entry the final schema is updated; since ColumnSchemas are stored as a flat structure, they are later converted to a nested structure of type Dimensions. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4115
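A usage sketch mirroring the commands quoted above; the table name is hypothetical and the exact type syntax accepted may differ slightly from this sketch:

    import org.apache.spark.sql.SparkSession

    object AddComplexColumnsExample {
      def run(spark: SparkSession): Unit = {
        spark.sql("ALTER TABLE my_table ADD COLUMNS(arr1 ARRAY<double>)")
        spark.sql("ALTER TABLE my_table ADD COLUMNS(struct1 STRUCT<a:INT, b:STRING>)")
        // Existing rows return null for the newly added columns.
        spark.sql("SELECT arr1, struct1 FROM my_table").show(false)
      }
    }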
Commits on Apr 22, 2021
-
[CARBONDATA-4158] Add Secondary Index as a coarse-grain index and use secondary indexes for Presto queries
Why is this PR needed? At present, secondary indexes are leveraged for query pruning via Spark plan modification. This approach is tightly coupled with Spark because the plan modification is specific to the Spark engine. To use secondary indexes for Presto or Hive queries, it is not feasible to modify the query plans as in the current approach, so an engine-agnostic way to use secondary indexes in query pruning is needed. What changes were proposed in this PR? 1. Add Secondary Index as a coarse-grain index. 2. Add a new insegment() UDF to support querying within particular segments. 3. Control the use of Secondary Index as a coarse-grain index with the property 'carbon.coarse.grain.secondary.index'. 4. Use the Index Server driver for Secondary Index pruning. 5. Use secondary indexes with Presto queries. This closes #4110
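A sketch of the two knobs mentioned above. The property key comes from the description; how it is set (CarbonProperties versus a table property) and the insegment() argument format are assumptions made for illustration only:

    import org.apache.carbondata.core.util.CarbonProperties
    import org.apache.spark.sql.SparkSession

    object CoarseGrainSiExample {
      def run(spark: SparkSession): Unit = {
        // Assumed way of enabling coarse-grain SI pruning via the documented property key.
        CarbonProperties.getInstance()
          .addProperty("carbon.coarse.grain.secondary.index", "true")
        // Hypothetical use of the new insegment() UDF to restrict a query to segments 1 and 2.
        spark.sql("SELECT * FROM sales WHERE insegment('1,2') AND city = 'shenzhen'").show(false)
      }
    }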
[CARBONDATA-4037] Improve the table status and segment file writing
Why is this PR needed? Currently, we update the table status and segment files multiple times for a single IUD/merge/compact operation and delete the index files immediately after merge. When concurrent queries are run, a user query may try to access segment index files that are no longer present, which is an availability issue. What changes were proposed in this PR? 1. Generate the segment file after the merge index and update the table status at the beginning and after the merge index; if the merge index or table status update fails, the load also fails. Order: create table status file => index files => merge index => generate segment file => update table status. The same order is now maintained for SI, compaction, IUD, addHivePartition and addSegment scenarios. Whenever the segment file needs to be updated for the main table, a new segment file is created instead of updating the existing one. 2. When compact 'segment_index' is triggered: for new tables, if there are no index files to merge, a warning is logged and it exits; for old tables, index files are not deleted. 3. After an SI small-files merge: for newly loaded SI segments, DeleteOldIndexOrMergeFiles deletes them immediately after merge; for segments that were already present (rebuild), old index files and data files are not deleted. 4. Removed the carbon.merge.index.in.segment property from the configuration parameters; this property is to be used for debugging/test purposes. Note: cleaning of stale index/segment files is to be handled in CARBONDATA-4074. This closes #3988
Commits on Apr 26, 2021
-
[CARBONDATA-4173][CARBONDATA-4174] Fix inverted index query issue and handle exception for desc column
Why is this PR needed? After creating an inverted index on a dimension column, some filter queries give incorrect results. Also, an exception needs to be handled for a higher-level non-existing child column in desc column. What changes were proposed in this PR? While sorting byte arrays with an inverted index, the compareTo method of ByteArrayColumnWithRowId was sorting based on the last byte only; changed it to sort based on the entire byte length when a dictionary is used. Handled the exception and covered it in a test case. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4124
[CARBONDATA-4172] Select query having parent and child struct columns in the projection returns incorrect results
Why is this PR needed? A scenario was missed during the refactor in PR-3574. Currently, if a select query has both a parent struct column and its child column in the projection, only the child column is pushed down to carbon for filling the result; for the other columns in the parent struct, the output is null. What changes were proposed in this PR? If the parent struct column is also present in the projection, push down only the parent column to carbon. This closes #4123
Commits on Apr 27, 2021
-
[CARBONDATA-4167][CARBONDATA-4168] Fix case-sensitivity issues and input validation for geo values
Why is this PR needed? 1. The SPATIAL_INDEX property and the POLYGON, LINESTRING and RANGELIST UDFs are case sensitive. 2. SPATIAL_INDEX.xx.gridSize and SPATIAL_INDEX.xxx.conversionRatio accept negative values. 3. Invalid values are accepted in geo UDFs. What changes were proposed in this PR? 1. Converted the properties to lower case and made the UDFs case insensitive. 2. Added validation. 3. Refactored readAllIIndexOfSegment. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4118
[CARBONDATA-4170] Support dropping of parent complex columns (array/struct/map)
Why is this PR needed? This PR supports dropping parent complex columns (single and multi-level) from a carbon table. Dropping a parent column will in turn drop all of its children columns too. What changes were proposed in this PR? Children columns are prefixed with their parent column name, so the identified columns are added to the delete-column list and the schema is updated based on that. Test cases have been written up to 3 levels. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4121
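A minimal usage sketch of dropping a parent complex column, which also removes all of its children; the table and column names are hypothetical:

    import org.apache.spark.sql.SparkSession

    object DropComplexColumnExample {
      def run(spark: SparkSession): Unit = {
        // Dropping the parent column also drops child columns such as struct1.a and struct1.b.
        spark.sql("ALTER TABLE my_table DROP COLUMNS(struct1)")
      }
    }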
Commits on Apr 30, 2021
-
[HOTFIX] Remove hitcount link because it is not working
Why is this PR needed? The hitcount link in the README md file is not working. What changes were proposed in this PR? Remove the hitcount link as it is not required. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4128
Commits on May 10, 2021
-
[CARBONDATA-4166] Geo spatial Query Enhancements
Why is this PR needed? Currently, for the IN_POLYGON_LIST and IN_POLYLINE_LIST UDFs, polygons need to be specified in the SQL. If the polygon list grows in size, the SQL also becomes very long, which may affect query performance, as the SQL analysis cost increases. If polygons are defined as a column in a separate dimension table, a spatial dimension table join can be supported to allow aggregation on spatial table columns based on polygons. What changes were proposed in this PR? Support IN_POLYGON_LIST and IN_POLYLINE_LIST with a SELECT query on the polygon table; support the IN_POLYGON filter as a join condition for spatial JOIN queries. Does this PR introduce any user interface change? Yes. Is any new testcase added? Yes This closes #4127
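An illustrative sketch of the geo filter UDFs discussed above; the coordinates and table names are hypothetical, and the exact argument format of the list UDFs may differ from this sketch:

    import org.apache.spark.sql.SparkSession

    object GeoPolygonQueryExample {
      def run(spark: SparkSession): Unit = {
        // Plain polygon filter on a geo table.
        spark.sql(
          """SELECT * FROM geo_table
            |WHERE IN_POLYGON('116.32 40.12, 116.13 39.94, 116.56 39.93, 116.32 40.12')""".stripMargin
        ).show(false)
        // Polygon list supplied by a select on a separate polygon dimension table (assumed format).
        spark.sql(
          """SELECT * FROM geo_table
            |WHERE IN_POLYGON_LIST('SELECT polygon FROM polygon_dim', 'OR')""".stripMargin
        ).show(false)
      }
    }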
Configuration menu - View commit details
-
Copy full SHA for c825730 - Browse repository at this point
Copy the full SHA c825730View commit details
Commits on May 11, 2021
-
[CARBONDATA-4175] [CARBONDATA-4162] Leverage Secondary Index till seg…
…ment level Why is this PR needed? In the existing architecture, if the parent (main) table and the SI table don't have the same valid segments then we disable the SI table. From the next query onwards, we scan and prune only the parent table until the next load or REINDEX command is triggered (as these commands bring the parent and SI table segments back in sync). Because of this, queries take more time to give the result when SI is disabled. What changes were proposed in this PR? Instead of disabling the SI table (when parent and child table segments are not in sync), we do pruning on SI tables for all the valid segments (segments with status success, marked for update and load partial success), and the rest of the segments are pruned by the parent table. As of now, a query on the SI table can be pruned in two ways: a) with SI as a datamap, b) with spark plan rewrite. This PR contains changes to support both methods of SI to leverage it till segment level. This closes #4116
Configuration menu - View commit details
-
Copy full SHA for 8996369 - Browse repository at this point
Copy the full SHA 8996369View commit details
Commits on May 20, 2021
-
[CARBONDATA-4188] Fixed select query with small table page size after…
… alter add column Why is this PR needed? A select query on a table with a long string data type and a small page size throws ArrayIndexOutOfBoundsException after alter add columns. The query fails because, after changing the schema, the number of rows set in the bitsetGroup (RestructureIncludeFilterExecutorImpl.applyFilter()) for pages is not correct. What changes were proposed in this PR? Set the correct number of rows inside every page of the bitsetGroup. This closes #4137
Configuration menu - View commit details
-
Copy full SHA for 41a756f - Browse repository at this point
Copy the full SHA 41a756fView commit details -
[CARBONDATA-4185] Doc Changes for Heterogeneous format segments in ca…
…rbondata Why is this PR needed? Heterogeneous format segments in carbondata need documentation. What changes were proposed in this PR? Add the segment feature background and its impact on existing carbondata features. This closes #4134
Configuration menu - View commit details
-
Copy full SHA for 861ba2e - Browse repository at this point
Copy the full SHA 861ba2eView commit details
Commits on May 24, 2021
-
[CARBONDATA-4184] alter table Set TBLPROPERTIES for RANGE_COLUMN sets…
… unsupported datatype(complex_datatypes/Binary/Boolean/Decimal) as RANGE_COLUMN Why is this PR needed? The alter table set command was not validating unsupported data types for the range column. What changes were proposed in this PR? Added validation for unsupported data types before setting the range column value. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4133
Configuration menu - View commit details
-
Copy full SHA for 35091a2 - Browse repository at this point
Copy the full SHA 35091a2View commit details -
[CARBONDATA-4189] alter table validation issues
Why is this PR needed? 1. The alter table duplicate-columns check for dimensions/complex columns was missed. 2. Alter table properties with long strings for complex columns should not be supported. What changes were proposed in this PR? 1. Changed the dimension columns list type when preparing dimension columns [LinkedHashSet to Scala Seq] to handle duplicate columns. 2. Added a check to throw an exception in case of long strings for complex columns. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4138
Configuration menu - View commit details
-
Copy full SHA for 07c98e8 - Browse repository at this point
Copy the full SHA 07c98e8View commit details
Commits on May 25, 2021
-
[CARBONDATA-4183] Local sort Partition Load and Compaction fix
Why is this PR needed? Currently, the number of tasks for a partition table local sort load is decided based on the input file size. In this case, the data will not be properly sorted, as more tasks are launched. For compaction, the number of tasks is equal to the number of partitions. If the data for a partition is huge, compaction may fail with OOM under low memory configurations. What changes were proposed in this PR? When the local sort task level property is enabled: for local sort load, divide input files based on node locality (number of tasks = number of nodes), which will properly do the local sorting; for compaction, launch tasks based on the task id for a partition, so more tasks are launched per partition. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4130
Configuration menu - View commit details
-
Copy full SHA for a90243c - Browse repository at this point
Copy the full SHA a90243cView commit details
Commits on Jun 2, 2021
-
[CARBONDATA-4186] Fixed insert failure when partition column present …
…in local sort scope Why is this PR needed? Currently, when we create a table with a partition column and put the same column in the local sort scope, the insert query fails with an ArrayIndexOutOfBounds exception. What changes were proposed in this PR? Handled the ArrayIndexOutOfBounds exception; earlier the array size was not increasing because the data was inconsistent and in the wrong order for sortcolumn and isDimNoDictFlags. This closes #4132
Configuration menu - View commit details
-
Copy full SHA for 01fd120 - Browse repository at this point
Copy the full SHA 01fd120View commit details -
[CARBONDATA-4191] update table for primitive column not working when …
…complex child column name and primitive column name match Why is this PR needed? Update of a primitive column is not working when a complex column's child name and the primitive column name are the same. When an update for the primitive column is received, we check the complex child columns; if a column name matches, an UnsupportedOperationException is thrown. What changes were proposed in this PR? Currently, we ignore the prefix of all columns and pass only the column/child column info to the update command. New changes: pass the full column name (alias name/table name.columnName) as given by the user, and add checks for handling the unsupported update operation of complex columns. This closes #4139
Configuration menu - View commit details
-
Copy full SHA for 4c04f7c - Browse repository at this point
Copy the full SHA 4c04f7cView commit details
Commits on Jun 4, 2021
-
[Doc] syntax and format issues in README.md and how-to-contribute-to-…
…apache-carbondata.md Why is this PR needed? To improve the quality of README.md and how-to-contribute-to-apache-carbondata.md. What changes were proposed in this PR? Syntax and format changes. This closes #4136
Configuration menu - View commit details
-
Copy full SHA for 26e9182 - Browse repository at this point
Copy the full SHA 26e9182View commit details -
[CARBONDATA-4192] UT cases correction for validating the exception me…
…ssage correctly Why is this PR needed? Currently, when we check the exception message like below, it does not assert/fail/catch if the message content is different. `intercept[UnsupportedOperationException]( sql("update test set(a)=(4) where id=1").collect()).getMessage.contains("abc")` What changes were proposed in this PR? 1. Added an assert condition like below to validate the exception message correctly: `assert(intercept[UnsupportedOperationException]( sql("update test set(a)=(4) where id=1").collect()).getMessage.contains("abc"))` 2. Added an assert condition to check the exception message for some test cases which were not checking it. 3. Fixed add segment doc heading related issues. This closes #4140
Configuration menu - View commit details
-
Copy full SHA for 8740016 - Browse repository at this point
Copy the full SHA 8740016View commit details
Commits on Jun 7, 2021
-
[CARBONDATA-4193] Fix compaction failure after alter add complex column.
Why is this PR needed? 1. When we perform compaction after alter add of a complex column, the query fails with an ArrayIndexOutOfBounds exception. While converting and adding a row after the merge step in WriteStepRowUtil.fromMergerRow, as a complex dimension is present, the complexKeys array is accessed but doesn't have any values, which throws the exception. 2. Creating SI with global sort on a newly added complex column throws a TreeNodeException (Caused by: java.lang.RuntimeException: Couldn't find positionId#172 in [arr2#153]). What changes were proposed in this PR? 1. While restructuring the row, added changes to fill complexKeys with default values (null values for children) according to the latest schema. In the SI query result processor, used the column property isParentColumnComplex to identify any complex type. If the complex index column is not present in the parent table block, assigned the SI row value to empty bytes. 2. For SI with global sort, in case of a complex type projection, the TableProperties object in carbonEnv is not the same as in the carbonTable object, hence requiredColumns is not updated with positionId. So the table properties are updated from the carbon env itself. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4142
Configuration menu - View commit details
-
Copy full SHA for fee8b18 - Browse repository at this point
Copy the full SHA fee8b18View commit details -
[CARBONDATA-4196] Allow zero or more white space in GEO UDFs
Why is this PR needed? Currently, the regex of the geo UDFs does not allow zero spaces between the UDF name and the parenthesis; it always expects a single space in between, for example: linestring (120.184179 30.327465). Because of this, using the UDFs without a space sometimes does not give the expected result. What changes were proposed in this PR? Allow zero or more spaces between the UDFs and the parenthesis. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4145
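A minimal, generic sketch of the change described here; the pattern below is illustrative and not the actual CarbonData regex:

```scala
// Match a UDF name followed by zero or more whitespace characters and a parenthesized argument.
val strict  = """(?i)linestring \((.*)\)""".r   // old style: exactly one space required
val relaxed = """(?i)linestring\s*\((.*)\)""".r // new style: zero or more whitespace characters

val withSpace    = "LINESTRING (120.184179 30.327465)"
val withoutSpace = "linestring(120.184179 30.327465)"

println(relaxed.findFirstIn(withSpace).isDefined)    // true
println(relaxed.findFirstIn(withoutSpace).isDefined) // true
println(strict.findFirstIn(withoutSpace).isDefined)  // false: the old pattern misses this form
```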
Configuration menu - View commit details
-
Copy full SHA for 70643df - Browse repository at this point
Copy the full SHA 70643dfView commit details -
[CARBONDATA-4143] Enable UT with index server and fix related issues
Why is this PR needed? Enable running UT with the index server and fix the below issues: 1. With the index server enabled, a select query gives an incorrect result with SI when the parent and child table segments are not in sync. 2. When reindex is triggered, if stale files are present in the segment directory, the segment file is written with incorrect file names (both valid index and stale mergeindex file names). As a result, duplicate data is present in the SI table, but there are no errors/incorrect query results. What changes were proposed in this PR? Usage of the flag useIndexServer. Excluded some of the test cases from running with the index server. 1. While pruning from the index server, missingSISegments values were not being considered; passed down and set those values on the filter. 2. Before loading data to an SI segment, added changes to delete the segment directory if it is already present. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4098
Configuration menu - View commit details
-
Copy full SHA for d838e3b - Browse repository at this point
Copy the full SHA d838e3bView commit details
Commits on Jun 10, 2021
-
[CARBONDATA-4179] Support renaming of complex columns (array/struct)
Why is this PR needed? This PR enables renaming of complex columns - parent as well as children columns with nested levels example: if the schema contains columns - str1 struct<a:int, b:string>, arr1 array<long> 1. alter table <table_name> change str1 str2 struct<a:int, b:string> 2. alter table <table_name> change arr1 arr2 array<long> 3. Changing parent name as well as child name 4. alter table <table_name> change str1 str2 struct<abc:int, b:string> NOTE- Rename operation fails if the structure of the complex column has been altered. This check ensures the old and new columns are compatible with each other. Meaning the number of children and complex levels should be unaltered while attempting to rename. What changes were proposed in this PR? 1. Parses the incoming new complex type. Create a nested DatatypeInfo structure. 2. This DatatypeInfo is then passed on to the AlterTableDataTypeChangeModel. 3. Validation for compatibility, duplicate columns happens here. 4. Add the parent column to the schema evolution entry. 5. Update the spark catalog table. Limitation - Renaming is not supported for Map types yet Does this PR introduce any user interface change? Yes Is any new testcase added? Yes This closes #4129
Configuration menu - View commit details
-
Copy full SHA for cfa02dd - Browse repository at this point
Copy the full SHA cfa02ddView commit details -
[CARBONDATA-4202] Fix issue when refresh main table with MV
Why is this PR needed? When trying to register a table of an old store which has an MV, it fails with a parser error (syntax issue while creating the table). It tries to create the table with the relatedmvtablesmap property, which is not valid. What changes were proposed in this PR? 1. Removed relatedmvtablesmap from the table properties in RefreshCarbonTableCommand. 2. After the main table has been registered, to register the MV, made changes to get the schema from the system folder and register it. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4147
Configuration menu - View commit details
-
Copy full SHA for 90841bc - Browse repository at this point
Copy the full SHA 90841bcView commit details
Commits on Jun 16, 2021
-
[CARBONDATA-4206] Support rename SI table
Why is this PR needed? Currently, renaming an SI table can succeed, but after the rename, insert and query on the main table fail with a no-such-table exception. This is because after the SI table is renamed, the main table's tblproperties are not updated; they still store the old SI table name. When referring to the SI table, it tries to find the SI table by the old name, which leads to the no-such-table exception. What changes were proposed in this PR? After the SI table is renamed, update the main table's tblproperties with the new SI information. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4149
Configuration menu - View commit details
-
Copy full SHA for f1da9e8 - Browse repository at this point
Copy the full SHA f1da9e8View commit details
Commits on Jun 18, 2021
-
[CARBONDATA-4208] Wrong Exception received for complex child long str…
…ing columns Why is this PR needed? When we create a table with complex columns whose child columns have the long string data type, a 'column not found in table' exception is thrown. It should instead throw an exception saying that complex child columns do not support the long string data type. What changes were proposed in this PR? Added a case: if a complex child column has the long string data type then throw the correct exception. Exception: MalformedCarbonCommandException. Exception message: Complex child column cannot be set as LONG_STRING_COLUMNS. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4150
Configuration menu - View commit details
-
Copy full SHA for 65fad98 - Browse repository at this point
Copy the full SHA 65fad98View commit details -
[CARBONDATA-4212] Fix case sensitive issue with Update query having A…
…lias Table name Why is this PR needed? An update query having an alias table name fails with an 'Unsupported complex types' error, even if the table does not have any complex columns. What changes were proposed in this PR? Check the columnName irrespective of case. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4152
Configuration menu - View commit details
-
Copy full SHA for 95ab745 - Browse repository at this point
Copy the full SHA 95ab745View commit details -
[CARBONDATA-4213] Fix update/delete issue in index server
Why is this PR needed? During update/delete, the segment file in the segment would come as an empty string, due to which the segment file could not be read. What changes were proposed in this PR? 1. Changed the empty string to NULL. 2. Added an empty segment file condition while creating the SegmentFileStore. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4153
Configuration menu - View commit details
-
Copy full SHA for fdd00ab - Browse repository at this point
Copy the full SHA fdd00abView commit details
Commits on Jun 19, 2021
-
[CARBONDATA-4211] Fix - from xx Insert into select fails if an SQL st…
…atement contains multiple inserts Why is this PR needed? When multiple inserts with a single query are used, it fails from SparkPlan with: java.lang.ClassCastException: GenericInternalRow cannot be cast to UnsafeRow. For every successful insert/load we return the Segment ID as a row. For multiple inserts we also return a row containing the Segment ID, but while processing it in spark a ClassCastException is thrown. What changes were proposed in this PR? When a multiple-insert query is given, it has a Union node in the plan. Based on its presence, made changes to use the flag isMultipleInserts to call the class UnionCommandExec, and implemented a custom sideEffectResult which converts GenericInternalRow to UnsafeRow and returns it. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4151
Configuration menu - View commit details
-
Copy full SHA for d8f7df9 - Browse repository at this point
Copy the full SHA d8f7df9View commit details
Commits on Jun 22, 2021
-
[CARBONDATA-4217] Fix rename SI table, other applications didn't get …
…reflected issue Why is this PR needed? After one application renames an SI table, other applications do not see the change, which leads to queries on the SI column failing. What changes were proposed in this PR? After updating the index info of the parent table, persist the schema info so that other applications can refresh the table metadata in time. This closes #4155
Configuration menu - View commit details
-
Copy full SHA for d5cb011 - Browse repository at this point
Copy the full SHA d5cb011View commit details
Commits on Jun 23, 2021
-
[CARBONDATA-4214] inserting NULL value when timestamp value received …
…from FROM_UNIXTIME(0) Why is this PR needed? Null is filled in when a timestamp value received from FROM_UNIXTIME(0) is zero, because the spark original insert rdd value [internalRow] is zero in this case. If the original column value [internalRow] is zero, the insert flow adds NULL and returns NULL to spark, so a query on the same column receives a NULL value instead of the timestamp value. Problem code: if (internalRow.getLong(index) == 0) { internalRow.setNullAt(index) } What changes were proposed in this PR? Removed the null-filling check for the zero value case; the internalRow timestamp value is set only if the internalRow value is non null/empty. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4154
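A minimal sketch of the behavior change, operating on Spark's InternalRow directly; the helper name, its parameters, and the ordinal handling are simplified assumptions, not the actual CarbonData code:

```scala
import org.apache.spark.sql.catalyst.InternalRow

// FROM_UNIXTIME(0) legitimately produces an epoch value of 0,
// so a zero value must not be treated as a null marker.
def fillTimestampIfPresent(internalRow: InternalRow, index: Int, parsedMicros: Option[Long]): Unit = {
  // Old (buggy) behavior: if (internalRow.getLong(index) == 0) internalRow.setNullAt(index)
  // New behavior: set the value only when the parsed timestamp is actually present.
  parsedMicros match {
    case Some(micros) => internalRow.setLong(index, micros)
    case None         => internalRow.setNullAt(index)
  }
}
```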
Configuration menu - View commit details
-
Copy full SHA for 18665cc - Browse repository at this point
Copy the full SHA 18665ccView commit details -
[CARBONDATA-4190] Integrate Carbondata with Spark 3.1.1 version
Why is this PR needed? To integrate Carbondata with Spark 3.1.1. What changes were proposed in this PR? Refactored code to support Spark 3.1.1 along with the Spark 2.3 and 2.4 versions. Changes: 1. Compile-related changes: a new Spark package in MV, Streaming and spark-integration, and API-wise changes as per spark changes. 2. Spark has moved to the Proleptic Gregorian calendar, due to which timestamp-related changes in carbondata are also required. 3. Show segment by select command refactor. 4. A few Lucene test cases ignored due to the deadlock in spark DAGScheduler, which does not allow them to work. 5. Alter rename: parser enabled in Carbon and check for carbon. 6. doExecuteColumnar() changes in CarbonDataSourceScan.scala. 7. char/varchar changes from the spark side. 8. Rule name changed in MV. 9. In the univocity parser, the CSVParser version changed. 10. New configs added in SparkTestQueryExecutor to keep some behaviour the same as 2.3 and 2.4. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4141
Configuration menu - View commit details
-
Copy full SHA for 8ceb4fd - Browse repository at this point
Copy the full SHA 8ceb4fdView commit details -
[CARBONDATA-4225] Fix Update performance issues when auto merge compa…
…ction is enabled Why is this PR needed? 1. When auto-compaction is enabled, during update we try to do compaction after insert. Auto-compaction throws an exception after multiple retries, because Carbon does not allow concurrent compaction and update. 2. dataframe.rdd.isEmpty launches a Job. This is called twice in the code and the result is not reused. What changes were proposed in this PR? 1. Avoid attempting auto-compaction during update. 2. Reuse the dataframe.rdd.isEmpty result and avoid launching an extra Job. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4156
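A generic sketch of the second point (names are illustrative, not the actual CarbonData code): evaluating rdd.isEmpty once and reusing the boolean avoids scheduling a second Spark job.

```scala
import org.apache.spark.sql.DataFrame

// Each call to rdd.isEmpty triggers a Spark job, so evaluate it once and reuse the result.
def writeIfNonEmpty(df: DataFrame): Unit = {
  val isEmpty = df.rdd.isEmpty // single job

  if (!isEmpty) {
    // ... perform the update/insert work ...
  }

  if (!isEmpty) {
    // ... follow-up steps that previously re-evaluated df.rdd.isEmpty ...
  }
}
```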
Configuration menu - View commit details
-
Copy full SHA for d4ddd07 - Browse repository at this point
Copy the full SHA d4ddd07View commit details
Commits on Jun 26, 2021
-
[HOTFIX] Correct CI build status
Because the Apache Jenkins CI address changed, correct the CI build status.
Configuration menu - View commit details
-
Copy full SHA for 899b7ae - Browse repository at this point
Copy the full SHA 899b7aeView commit details -
Configuration menu - View commit details
-
Copy full SHA for 5e2adad - Browse repository at this point
Copy the full SHA 5e2adadView commit details
Commits on Jun 29, 2021
-
[CARBONDATA-4230] table properties not updated with lower-case and table
comment is not working in carbon spark3.1 Why is this PR needed? 1. Table properties are stored case-sensitively, and when we query a table property in lower case the property cannot be retrieved, hence the table create command fails. This was introduced with the spark 3.1 integration changes. 2. The table comment is displayed as byte code in a spark 3.1 cluster; CommentSpecContext changed in 3.1. What changes were proposed in this PR? 1. Convert to lower case and store in the table properties. 2. Get the string value from the commentSpec and set it as the table comment. Does this PR introduce any user interface change? No Is any new testcase added? No, a test case is already present but did not fail in the local UT setup, as the create flow is different in the local UT env and a real cluster setup. This closes #4163
Configuration menu - View commit details
-
Copy full SHA for 65462ff - Browse repository at this point
Copy the full SHA 65462ffView commit details
Commits on Jul 5, 2021
-
Configuration menu - View commit details
-
Copy full SHA for aefa977 - Browse repository at this point
Copy the full SHA aefa977View commit details -
Configuration menu - View commit details
-
Copy full SHA for 718490e - Browse repository at this point
Copy the full SHA 718490eView commit details -
[HOTFIX]Revert wrong pom changes commit during prepare release process.
Why is this PR needed? Due to a release prepared from the wrong branch, wrong pom changes are present. What changes were proposed in this PR? Revert the pom changes. This closes #4167
Configuration menu - View commit details
-
Copy full SHA for c7a3d6d - Browse repository at this point
Copy the full SHA c7a3d6dView commit details
Commits on Jul 7, 2021
-
[CARBONDATA-4232] Add missing doc change for secondary index.
Why is this PR needed? Documentation changes were not handled in PR 4116 What changes were proposed in this PR? Added missing documentation. This closes #4164
Configuration menu - View commit details
-
Copy full SHA for 88fdf60 - Browse repository at this point
Copy the full SHA 88fdf60View commit details
Commits on Jul 14, 2021
-
[CARBONDATA-4210] Handle 3.1 parsing failures related to alter comple…
…x types Why is this PR needed? For 2.3 and 2.4, parsing of alter commands is done by spark, which is not the case for 3.1. What changes were proposed in this PR? Carbon is now responsible for the parsing here. Test cases previously ignored due to this issue are now enabled. This closes #4162
Configuration menu - View commit details
-
Copy full SHA for 02e7723 - Browse repository at this point
Copy the full SHA 02e7723View commit details
Commits on Jul 27, 2021
-
[CARBONDATA-4204][CARBONDATA-4231] Fix add segment error message,
index server failed testcases and dataload fail error on update Why is this PR needed? 1. When the path is empty in Carbon add segments, a StringIndexOutOfBoundsException is thrown. 2. Index server UT failures fix. 3. Update fails with a dataload fail error if the bad records action is set to force with spark 3.1. What changes were proposed in this PR? 1. Added a check to see if the path is empty and then throw a valid error message. 2. Used checkAnswer instead of assert in test cases so that the order of rows returned is the same with or without the index server. Excluded 2 test cases where explain with query statistics is used, as we are not setting any pruning info from the index server. 3. On the update command, dataframe.persist is called and, with the latest spark 3.1 changes, spark returns a cloned SparkSession from the cacheManager with all specified configurations disabled. As it now uses a different sparkSession for 3.1 which is not initialized in CarbonEnv, CarbonEnv.init is called, where a new CarbonSessionInfo is created with no sessionParams, so the properties that were set are not accessible. When a new carbonSessionInfo object is created, made changes to set the existing sessionParams from currentThreadSessionInfo. This closes #4157
Configuration menu - View commit details
-
Copy full SHA for c9a5231 - Browse repository at this point
Copy the full SHA c9a5231View commit details -
[CARBONDATA-4250] Ignoring presto random test cases
Why is this PR needed? Presto test cases fail randomly and take more time in CI verification for other PRs. What changes were proposed in this PR? The presto random test cases are ignored for now and will be fixed under the other JIRAs raised. 1. JIRA [CARBONDATA-4250] raised for ignoring presto test cases for now, as these random failures cause PR CI failures. 2. JIRA [CARBONDATA-4249] raised for fixing presto random tests in a concurrent scenario; more details on the issue reproduction and the problem snippet are on that JIRA. 3. [CARBONDATA-4254] raised to fix 'Test alter add for structs enabling local dictionary' and 'CarbonIndexFileMergeTestCaseWithSI.Verify command of index merge'. This closes #4176
Configuration menu - View commit details
-
Copy full SHA for 0337c32 - Browse repository at this point
Copy the full SHA 0337c32View commit details
Commits on Jul 28, 2021
-
[CARBONDATA-4251][CARBONDATA-4253] Optimize Clean Files Performance
Why is this PR needed? 1) When executing the clean files command, it cleans up all the carbonindex and carbonmergeindex files that once existed, even though the carbonindex files have already been merged into carbonmergeindex files and deleted. When tens of thousands of carbonindex files once existed after the completion of compaction, the clean files command can take several hours to clean index files which no longer exist. We only need to clean up the existing files, whether carbonmergeindex or carbonindex files. 2) The rename command lists the partitions of the table, but the partition information is not actually used. If the table has hundreds of thousands of partitions, the performance of rename table degrades a lot. What changes were proposed in this PR? 1) There is a variable indexOrMergeFiles, which holds all existing index files. The CLEAN FILES command now deletes the existing files instead of deleting all files in 'indexFilesMap', which is actually all '.carbonindex' files that once existed. Cleaning 'indexOrMergeFiles' improves CLEAN FILES performance a lot. 2) The rename command no longer lists the partitions of the table, since the partition information is not actually used. This closes #4183
Configuration menu - View commit details
-
Copy full SHA for 9aaeba5 - Browse repository at this point
Copy the full SHA 9aaeba5View commit details
Commits on Jul 29, 2021
-
[CARBONDATA-4248] Fixed upper case column name in explain command
Why is this PR needed? The explain command with an upper case column name fails with a 'key not found' exception. What changes were proposed in this PR? Changed the column name to lower case before the conversion of the spark data type to the carbon data type. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4175
Configuration menu - View commit details
-
Copy full SHA for f2698fe - Browse repository at this point
Copy the full SHA f2698feView commit details -
[CARBONDATA-4247][CARBONDATA-4241] Fix Wrong timestamp value query re…
…sults for data before 1900 years with Spark 3.1 Why is this PR needed? 1. Spark 3.1 stores the timestamp value as Julian micros and rebases the timestamp value from JulianToGregorianMicros during query. Since carbon parses and formats the timestamp value with SimpleDateFormatter, the query gives incorrect results when the value is rebased with JulianToGregorianMicros by spark. 2. CARBONDATA-4241: global sort load and compaction fail on a table having a timestamp column. What changes were proposed in this PR? 1. Use java Instant to parse new timestamp values. For old stores queried with Spark 3.1, rebase the timestamp value from Julian to Gregorian micros. 2. If the timestamp value is of type Instant, then convert the value to a java timestamp. Does this PR introduce any user interface change? No Is any new testcase added? No (existing testcases are sufficient) This closes #4177
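For context, a small sketch of the rebasing utility that Spark 3.x exposes for pre-Gregorian-switch timestamps; whether CarbonData calls exactly this entry point is an assumption, and the sample values are placeholders:

```scala
import java.time.Instant
import org.apache.spark.sql.catalyst.util.RebaseDateTime

// New values: parse with java.time and convert to epoch microseconds (Proleptic Gregorian).
val instant: Instant = Instant.parse("1880-01-01T00:00:00Z")
val newValueMicros: Long = instant.getEpochSecond * 1000000L + instant.getNano / 1000L

// Old-store values: microseconds written against the hybrid Julian calendar (placeholder value)
// are rebased to the Proleptic Gregorian calendar that Spark 3.1 expects.
val julianMicros: Long = -2840140800000000L // hypothetical value read from an old segment
val rebasedMicros: Long = RebaseDateTime.rebaseJulianToGregorianMicros(julianMicros)
println((newValueMicros, rebasedMicros))
```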
Configuration menu - View commit details
-
Copy full SHA for feb0521 - Browse repository at this point
Copy the full SHA feb0521View commit details -
[CARBONDATA-4242]Improve cdc performance and introduce new APIs for U…
…PSERT, DELETE, INSERT and UPDATE Why is this PR needed? 1. In the existing solution, when we perform a join of the source and target datasets for tagging records to delete, update and insert, we scan all the data of the target table and then perform the join with the source dataset. But it can happen that the source data is small and its range covers only some 100s of carbondata files out of the 1000s of files in the target table. So pruning is the main bottleneck here; scanning all records and involving them in the join results in a lot of shuffle and reduces performance. 2. Source data caching was not there; caching the source data helps its multiple scans, and since the input source data is small, we can persist the dataset. 3. When performing the join, we used to first get the Row object and operate on it, then cast each datatype to the spark datatype and convert to an InternalRow object for further processing of the joined data. This adds extra deserializeToObject and map nodes in the DAG and increases time. 4. Initially, during the tagging of records (the join operation), we prepared a new projection of the required columns, which involves preparing an internal row object as explained in point 3 and then applying an eval function on each row to prepare the projection; this applies the same expression eval on the joined data, which is repeated work and increases time. 5. In the join operation we used all the columns of the source dataset and the required columns of the target table, such as the join key column and other columns like tupleID, status_on_mergeds etc. When there are many columns in the table, the execution time increases due to a lot of data shuffling. 6. The current merge APIs are a bit complex, generalized and confusing to the user for simple upsert, delete and insert operations. What changes were proposed in this PR? 1. Add pruning logic before the join operations: compare the incoming row with an interval-based tree data structure which contains the carbondata file path and min and max values, to identify the carbondata file where the incoming row can be present, so that in some use-case scenarios this helps to scan fewer files rather than blindly scanning all the carbondata files in the target table. 2. Cache the incoming source dataset (srcDS.cache()), so that the cached data is used in all the operations and speed is improved; uncache() after the merge operation. 3. Instead of operating on the Row object and then converting to InternalRow, directly operate on the InternalRow object to avoid the data type conversions. 4. Instead of evaluating the expression again based on the required projection columns on matching conditions and making a new projection, directly identify the indexes required for the output row and access those indices on the incoming internal row object after step 3; evaluation is avoided and array access by index gives O(1) performance. 5. During the join, or the tagging of records, do not include all the column data; include just the join key columns and identify the tupleIDs to delete and the rows to insert. This avoids a lot of shuffle and improves performance significantly. 6. Introduce new APIs for UPSERT, UPDATE, DELETE and INSERT and make the user-exposed APIs simple: the user just needs to give the key column for the join, the source dataset and the operation type. These new APIs make use of all the improvements mentioned above and avoid the unnecessary operations of the existing merge APIs. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4148
Configuration menu - View commit details
-
Copy full SHA for 1e2fc4c - Browse repository at this point
Copy the full SHA 1e2fc4cView commit details
Commits on Jul 30, 2021
-
[CARBONDATA-4255] Prohibit Create/Drop Database when databaselocation…
… is inconsistent Why is this PR needed? When carbon.storelocation and spark.sql.warehouse.dir are configured to different values, the database location may be inconsistent. When the DROP DATABASE command is executed, both locations (carbon dblocation and hive dblocation) may be cleared, which may confuse the users. What changes were proposed in this PR? Drop database is prohibited when the database location is inconsistent. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4186
Configuration menu - View commit details
-
Copy full SHA for 3c81c7a - Browse repository at this point
Copy the full SHA 3c81c7aView commit details
Commits on Aug 1, 2021
-
Configuration menu - View commit details
-
Copy full SHA for aceaa44 - Browse repository at this point
Copy the full SHA aceaa44View commit details
Commits on Aug 3, 2021
-
Configuration menu - View commit details
-
Copy full SHA for 62354e3 - Browse repository at this point
Copy the full SHA 62354e3View commit details
Commits on Aug 5, 2021
-
Configuration menu - View commit details
-
Copy full SHA for 7b0d8f6 - Browse repository at this point
Copy the full SHA 7b0d8f6View commit details
Commits on Aug 7, 2021
-
Configuration menu - View commit details
-
Copy full SHA for 11dab76 - Browse repository at this point
Copy the full SHA 11dab76View commit details -
[CARBONDATA-4268][Doc][summer-2021] Add new dev mailing list (website…
…) link and update the Nabble address This closes #4195
Configuration menu - View commit details
-
Copy full SHA for a5bb652 - Browse repository at this point
Copy the full SHA a5bb652View commit details -
[Doc][summer-2021] Add TOC and format how-to-contribute-to-apache-car…
…bondata.md Because GitHub Flavored Markdown does not support automatic TOC generation in Markdown files, use anchors to implement a TOC of headings.
Configuration menu - View commit details
-
Copy full SHA for e8f8c02 - Browse repository at this point
Copy the full SHA e8f8c02View commit details -
[CARBONDATA-4266][Doc][summer-2021] Add TOC and format how-to-contrib…
…ute-to-apache-carbondata.md This closes #4192
Configuration menu - View commit details
-
Copy full SHA for d4abe76 - Browse repository at this point
Copy the full SHA d4abe76View commit details
Commits on Aug 8, 2021
-
Modify minor errors and correct some misunderstandings in quick-start-guide.md
Configuration menu - View commit details
-
Copy full SHA for 926b67b - Browse repository at this point
Copy the full SHA 926b67bView commit details -
[CARBONDATA-4267][Doc][summer-2021]Update and modify some content in …
…quick-start-guide.md This closes #4197
Configuration menu - View commit details
-
Copy full SHA for fac48be - Browse repository at this point
Copy the full SHA fac48beView commit details
Commits on Aug 11, 2021
-
[CARBONDATA-4256] Fixed parsing failure on SI creation for complex co…
…lumn Why is this PR needed? Currently, SI creation on a complex column that includes a child column with a dot (.) fails with a parse exception. What changes were proposed in this PR? Handled parsing for create index on a complex column. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4187
Configuration menu - View commit details
-
Copy full SHA for bdd4a8c - Browse repository at this point
Copy the full SHA bdd4a8cView commit details -
[CARBONDATA-4091] support prestosql 333 integration with carbon
Why is this PR needed? Currently carbondata is integrated with presto-sql 316, which is 1.5 years old. Many good features and optimizations have since come into presto, like dynamic filtering, Rubix data cache and some performance improvements. It is always good to use the latest version; the latest version is presto-sql 348, but jumping from 316 to 348 would be too many changes. So, to utilize these new features and based on customer demand, we upgrade presto-sql to version 333; it will be upgraded again to a more recent version in a few months. Note: this is a plain integration to support all existing features of presto 316; deep integration to support new features like dynamic filtering and Rubix cache will be handled in another PR. What changes were proposed in this PR? 1. Adapt to the new hive adapter changes, like some constructor changes; made a carbonDataConnector to support CarbonDataHandleResolver. 2. Java 11 removed the ConstructorAccessor class, so the unsafe class is used for reflection (presto 333 requires java 11 at runtime). 3. POM changes to support presto 333. Note: a JAVA 11 environment is needed for running presto 333 with carbon, and the jvm property "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" also needs to be added. This closes #4034
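As an illustration of point 2, a generic sketch of the common JDK 11 pattern of obtaining sun.misc.Unsafe via reflection and allocating an instance without a constructor; this is not the actual CarbonData reflection utility, and the sample class is hypothetical:

```scala
import sun.misc.Unsafe

// Obtain the Unsafe singleton reflectively (sun.misc.Unsafe lives in the jdk.unsupported module).
val theUnsafeField = classOf[Unsafe].getDeclaredField("theUnsafe")
theUnsafeField.setAccessible(true)
val unsafe = theUnsafeField.get(null).asInstanceOf[Unsafe]

// allocateInstance creates an object without invoking any constructor,
// which is what ConstructorAccessor-based tricks were used for before JDK 11.
class NoDefaultCtor(val value: Int)
val instance = unsafe.allocateInstance(classOf[NoDefaultCtor]).asInstanceOf[NoDefaultCtor]
println(instance.value) // 0: fields are left at their defaults
```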
Configuration menu - View commit details
-
Copy full SHA for 1ccf295 - Browse repository at this point
Copy the full SHA 1ccf295View commit details
Commits on Aug 16, 2021
-
[CARBONDATA-4269] Update url and description for new prestosql-guide.md
Why is this PR needed? PrestoSQL has now changed its name to Trino. Because Facebook established the Presto Foundation at The Linux Foundation®, PrestoSQL had to change its name. More information can be seen here: https://trino.io/blog/2020/12/27/announcing-trino.html What changes were proposed in this PR? 1. Change the url to prestosql 333. 2. Added a description indicating that PrestoSQL has been renamed to Trino. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4202
Configuration menu - View commit details
-
Copy full SHA for 5804060 - Browse repository at this point
Copy the full SHA 5804060View commit details
Commits on Aug 19, 2021
-
Configuration menu - View commit details
-
Copy full SHA for 0e59ddb - Browse repository at this point
Copy the full SHA 0e59ddbView commit details
Commits on Aug 22, 2021
-
[CARBONDATA-4272]carbondata test case not including the load command …
…with overwrite This closes #4207
Configuration menu - View commit details
-
Copy full SHA for 9f9ea1f - Browse repository at this point
Copy the full SHA 9f9ea1fView commit details
Commits on Aug 24, 2021
-
[CARBONDATA-4119][CARBONDATA-4238][CARBONDATA-4237][CARBONDATA-4236] …
…Support geo insert without geoId and document changes Why is this PR needed? 1. To support insert without geoId (like load) on a geo table. 2. [CARBONDATA-4119]: User input for the GeoID column is not validated. 3. [CARBONDATA-4238]: Documentation issue in ddl-of-carbondata.md#add-columns. 4. [CARBONDATA-4237]: Documentation issues in streaming-guide.md, file-structure-of-carbondata.md and sdk-guide.md. 5. [CARBONDATA-4236]: Documentation issues in configuration-parameters.md. 6. The import of the processing class in streaming-guide.md is wrong. What changes were proposed in this PR? 1. Made changes to support insert on a geo table with an auto-generated geoId. 2. [CARBONDATA-4119]: Added documentation about insert with a custom geoId; changes in docs/spatial-index-guide.md. 3. Other documentation changes added. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4205
Configuration menu - View commit details
-
Copy full SHA for 8de65a2 - Browse repository at this point
Copy the full SHA 8de65a2View commit details
Commits on Aug 26, 2021
-
[CARBONDATA-4164][CARBONDATA-4198][CARBONDATA-4199][CARBONDATA-4234] …
…Support alter add map, multilevel complex columns and rename/change datatype. Why is this PR needed? Support alter add map, multilevel complex columns, and Change datatype for complex type. What changes were proposed in this PR? 1. Support adding of single-level and multi-level map columns 2. Support adding of multi-level complex columns(array/struct) 3. Support renaming of map columns including nested levels 4. Alter change datatype at nested levels (array/map/struct) Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4180
Configuration menu - View commit details
-
Copy full SHA for f52aa20 - Browse repository at this point
Copy the full SHA f52aa20View commit details
Commits on Aug 31, 2021
-
[CARBONDATA-4274] Fix create partition table error with spark 3.1
Why is this PR needed? With spark 3.1, we can create a partition table by giving partition columns from the schema, like the below example: create table partitionTable(c1 int, c2 int, v1 string, v2 string) stored as carbondata partitioned by (v2,c2). When the table is created by a SparkSession with CarbonExtension, the catalog table is created with the specified partitions. But in a cluster / with carbon session, when we create a partition table with the above syntax it creates a normal table with no partitions. What changes were proposed in this PR? partitionByStructFields is empty when we directly give partition column names, so a partition table was not being created. Made changes to identify the partition column names and get the struct field and datatype info from the table columns. This closes #4208
Configuration menu - View commit details
-
Copy full SHA for ca659b5 - Browse repository at this point
Copy the full SHA ca659b5View commit details
Commits on Sep 1, 2021
-
[CARBONDATA-4271] Support DPP for carbon
Why is this PR needed? This PR enables Dynamic Partition Pruning (DPP) for carbon. What changes were proposed in this PR? CarbonDatasourceHadoopRelation has to extend HadoopFsRelation, because spark has added a check to use DPP only for relations matching HadoopFsRelation. Apply the dynamic filter, get the runtimePartitions and set them on CarbonScanRDD for pruning. This closes #4199
Configuration menu - View commit details
-
Copy full SHA for bdc9484 - Browse repository at this point
Copy the full SHA bdc9484View commit details -
[CARBONDATA-4273] Fix Cannot create external table with partitions
Why is this PR needed? Creating a partition table with location fails with an unsupported message. What changes were proposed in this PR? This scenario works in cluster mode. The same check can be applied in local mode as well, so that a partition table can be created with a location. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4211
Configuration menu - View commit details
-
Copy full SHA for 42f6982 - Browse repository at this point
Copy the full SHA 42f6982View commit details -
[CARBONDATA-4278] Avoid refetching all indexes to get segment properties
Why is this PR needed? When a block index [BlockIndex] is available, there is no need to prepare indexes [List[BlockIndex]] from the available segments and partition locations, which might delay query performance. What changes were proposed in this PR? Directly get the segment properties if a block index [BlockIndex] is available: if (segmentIndices.get(0) instanceof BlockIndex) { segmentProperties = segmentPropertiesFetcher.getSegmentPropertiesFromIndex(segmentIndices.get(0)); } else { segmentProperties = segmentPropertiesFetcher.getSegmentProperties(segment, partitionLocations); } getSegmentPropertiesFromIndex directly retrieves the block index segment properties. Does this PR introduce any user interface change? No Is any new testcase added? No. Index-related test cases are already present which cover the added code. This closes #4209
Configuration menu - View commit details
-
Copy full SHA for 226228f - Browse repository at this point
Copy the full SHA 226228fView commit details
Commits on Sep 8, 2021
-
[CARBONDATA-4282] Fix issues with table having complex columns relate…
…d to long string, SI, local dictionary Why is this PR needed? 1. Insert/load fails after alter add complex column if the table contains long string columns. 2. Create index on an array of a complex column (map/struct) throws a null pointer exception instead of a correct error message. 3. Alter table property local dictionary include/exclude with a newly added map column is failing. What changes were proposed in this PR? 1. The datatypes array and the data row are in a different order, leading to a ClassCastException. Made changes to add newly added complex columns after the long string columns and other dimensions in carbonTableSchemaCommon.scala. 2. For complex columns, SI creation is allowed only on an array of primitive types. Check if the child column is of complex type and throw an exception. Changes made in SICreationCommand.scala. 3. In AlterTableUtil.scala, while validating local dictionary columns, array and struct types are handled but the map type is missed. Added a check for complex types. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4214
Configuration menu - View commit details
-
Copy full SHA for 4d8bc9e - Browse repository at this point
Copy the full SHA 4d8bc9eView commit details
Commits on Sep 16, 2021
-
[CARBONDATA-4277] geo instance compatibility fix
Why is this PR needed? The CustomIndex interface extends Serializable, and for stores of different versions, if the serialization id doesn't match, it throws java.io.InvalidClassException during load/update/query operations. What changes were proposed in this PR? As the instance is stored in the table properties, made changes to initialize and update the instance while refreshing the table. Also added a static serialId for the CustomIndex interface. Does this PR introduce any user interface change? No Is any new testcase added? No, tested in cluster This closes #4216
Configuration menu - View commit details
-
Copy full SHA for 7199357 - Browse repository at this point
Copy the full SHA 7199357View commit details -
[CARBONDATA-4284] Load/insert after alter add column on partition tab…
…le with complex column fails Why is this PR needed? Insert after alter add column on a partition table with a complex column fails with a BufferUnderflowException. The order of columns in the TableSchema is different after alter add column. Ex: if the partition column is of dimension type, when the table is created the schema column order is dimension columns (including the partition column) + complex column. After alter add, the order of columns in the schema is changed by moving the partition column to the last position: complex column + partition column. Due to this change in order, in fillDimensionAndMeasureDetails the indexing is wrong, as it expects the complex column to always be last, which causes the BufferUnderflowException while flattening the complex row. What changes were proposed in this PR? After alter add, removed the changes that move the partition column to the end. This closes #4215
Configuration menu - View commit details
-
Copy full SHA for 3b29bcb - Browse repository at this point
Copy the full SHA 3b29bcbView commit details -
[CARBONDATA-4286] Fixed measure comparator
Why is this PR needed? A select query on a table with an AND filter condition returns an empty result while valid data is present in the table. Root cause: currently, when building the min-max index at block level, we use the unsafe byte comparator for both dimension and measure columns, which returns incorrect results for measure columns. What changes were proposed in this PR? We should use different comparators for dimension and measure columns, as we already do at the time of writing the min-max index at blocklet level. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4217
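A small, generic illustration of why a single byte-wise comparator is wrong for measure min/max values (this is not the CarbonData comparator code): lexicographic comparison of two's-complement bytes orders negative numbers after positive ones.

```scala
import java.nio.ByteBuffer

def toBytes(v: Long): Array[Byte] = ByteBuffer.allocate(java.lang.Long.BYTES).putLong(v).array()

// Unsigned lexicographic comparison of byte arrays, as a byte comparator would do.
def unsignedLexCompare(a: Array[Byte], b: Array[Byte]): Int =
  a.zip(b).map { case (x, y) => java.lang.Integer.compare(x & 0xFF, y & 0xFF) }
    .find(_ != 0).getOrElse(a.length - b.length)

println(unsignedLexCompare(toBytes(-1L), toBytes(5L)) > 0) // true: byte order claims -1 > 5
println(java.lang.Long.compare(-1L, 5L) < 0)               // true: numeric order says -1 < 5
```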
Configuration menu - View commit details
-
Copy full SHA for 2d1907b - Browse repository at this point
Copy the full SHA 2d1907bView commit details
Commits on Sep 20, 2021
-
[CARBONDATA-4285] Fix alter add complex columns with global sort comp…
…action failure Why is this PR needed? Alter add complex columns with global sort compaction fails due to an ArrayIndexOutOfBounds exception: currently the default complex delimiter list in global sort compaction is created with a size of 3, while the map case needs an extra complex delimiter for handling the key-value pair. Bad record handling: when we add complex columns after inserting data, the complex columns have null data for previously loaded segments. This null value is treated as a bad record and compaction fails. What changes were proposed in this PR? In the global sort compaction flow, create the default complex delimiters with size 4, as is already done in the load flow. Bad record handling is pruned for the compaction case: there is no need to check bad records during compaction as they were already checked while loading, and in the compaction case we re-insert data from previously loaded segments. This closes #4218
Configuration menu - View commit details
-
Copy full SHA for 22342f8 - Browse repository at this point
Copy the full SHA 22342f8View commit details -
[CARBONDATA-4288][CARBONDATA-4289] Fix various issues with Index Serv…
…er caching mechanism. Why is this PR needed? There are 2 issues in the Index Server flow: 1. In the case of a main table with an SI table, with pre-priming disabled and the index server enabled, a new load to the main table and SI table puts the cache for the main table in the index server. The cache is also added again when a select query is fired. This happens because during the load to the SI table, getSplits is called on the main table segment, which is in the Insert In Progress state. The index server considers this segment a legacy segment because its index size = 0 and does not put its entry in the tableToExecutor mapping. In the getSplits method, isRefreshneeded is false the first time getSplits is called. During the select query, in the getSplits method, isRefreshNeeded is true and the previously loaded entry is removed from the driver, but since there is no entry for that table in the tableToExecutor mapping, the previous cache value becomes dead cache and stays in the index server forever. The newly loaded cache goes to a new executor and 2 copies of the cache for the same segment are maintained. 2. Concurrent select queries to the index server show wrong cache values in the index server. What changes were proposed in this PR? The following changes are proposed to the index server code: 1. Remove the cache object from the index server in case the segment is INSERT IN PROGRESS, and in the case of a legacy segment add the value to the tableToExecutor mapping so that the cache is also removed from the executor side. 2. Concurrent queries were able to add duplicate cache values to other executors; changed the logic of the assign-executors method so that concurrent queries are not able to add the cache for the same segment in other executors. This closes #4219
Configuration menu - View commit details
-
Copy full SHA for ce860d0 - Browse repository at this point
Copy the full SHA ce860d0View commit details
Commits on Oct 7, 2021
-
[CARBONDATA-4243] Fixed si with column meta cache on same column
Why is this PR needed? Currently, the select query fails when the table contains SI and column_meta_cache on the same columns with the to_date() UDF. This happens because pushdownfilters is null in CarbonDataSourceScanHelper, which causes a null pointer exception. What changes were proposed in this PR? Passed Seq.empty in place of the null value for pushdownfilters in CarbonDataSourceScan.doCanonicalize. This closes #4225
Configuration menu - View commit details
-
Copy full SHA for 9944936 - Browse repository at this point
Copy the full SHA 9944936View commit details -
[CARBONDATA-4228] [CARBONDATA-4203] Fixed update/delete after alter a…
…dd segment Why is this PR needed? Deleted records reappear, or updated records show old values, in select queries. This is because, after horizontal compaction, the delete delta file for an external segment is written to the default path, which is Fact\part0\segment_x\, while for an external segment the delete delta file should be written to the path where the segment is present. What changes were proposed in this PR? After a delete/update operation on the segment, horizontal compaction is triggered. Now, after horizontal compaction for external segments, the delete delta file is written to the segment path instead of the default path. This closes #4220
Configuration menu - View commit details
-
Copy full SHA for bca62cd - Browse repository at this point
Copy the full SHA bca62cdView commit details -
[CARBONDATA-4293] Make Table created without external keyword as Tran…
…sactional table Why is this PR needed? Currently, when you create a table with location (without the external keyword) in a cluster, the corresponding table is created as a transactional table. If the external keyword is present, it is created as a non-transactional table. This scenario is not handled in local mode. What changes were proposed in this PR? Made changes to check whether the external keyword is present or not; if it is not present, make the corresponding table a transactional table. This closes #4221
Configuration menu - View commit details
-
Copy full SHA for 5a710f9 - Browse repository at this point
Copy the full SHA 5a710f9View commit details
Commits on Oct 8, 2021
-
[CARBONDATA-4215] Fix query issue after add segment other formats wit…
…h vector read disabled Why is this PR needed? If carbon.enable.vector.reader is disabled and parquet/orc segments are added to carbon table. Then on query, it fails with java.lang.ClassCastException: org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to org.apache.spark.sql.catalyst.InternalRow. When vector reader property is disabled, while scanning ColumnarBatchScan supportBatch would be overridden to false but external file format like ParuetFileFormat supportBatch is not overriden and it takes default as true. What changes were proposed in this PR? Made changes to override supportBatch of external file formats based on carbon.enable.vector.reader property. This closes #4226
Configuration menu - View commit details
-
Copy full SHA for 8b3d78b - Browse repository at this point
Copy the full SHA 8b3d78bView commit details
Commits on Oct 12, 2021
-
[CARBONDATA-4292] Spatial index creation using spark dataframe
Why is this PR needed? To support spatial index creation using a spark dataframe. What changes were proposed in this PR? Added spatial properties in carbonOptions and edited the existing testcases. Does this PR introduce any user interface change? Yes Is any new testcase added? Yes This closes #4222
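A hedged sketch of what dataframe-based creation might look like; the option keys below mirror the documented SPATIAL_INDEX table properties, but their exact spelling and availability as dataframe options are assumptions, as are the table and column names:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("spatial-df-sketch").getOrCreate()
val df = spark.range(10).selectExpr("id", "id * 10 AS longitude", "id * 5 AS latitude")

// Option keys are assumptions modeled on the SPATIAL_INDEX table properties.
df.write
  .format("carbondata")
  .option("tableName", "geo_df_table")
  .option("SPATIAL_INDEX", "mygeohash")
  .option("SPATIAL_INDEX.mygeohash.type", "geohash")
  .option("SPATIAL_INDEX.mygeohash.sourcecolumns", "longitude, latitude")
  .option("SPATIAL_INDEX.mygeohash.gridSize", "50")
  .mode(SaveMode.Overwrite)
  .save()
```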
Configuration menu - View commit details
-
Copy full SHA for b8d9a97 - Browse repository at this point
Copy the full SHA b8d9a97View commit details
Commits on Oct 21, 2021
-
[CARBONDATA-4298][CARBONDATA-4281] Empty bad record support for compl…
…ex type Why is this PR needed? 1. The IS_EMPTY_DATA_BAD_RECORD property is not supported for complex types. 2. To update the documentation that COLUMN_META_CACHE and RANGE_COLUMN don't support complex datatypes. What changes were proposed in this PR? 1. Made changes to pass down the IS_EMPTY_DATA_BAD_RECORD property and throw an exception. Store an empty complex type instead of storing a null value, which matches the hive table result. 2. Updated the document and added a testcase. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4228
(commit 305851e)
Commits on Oct 23, 2021
-
[CARBONDATA-4306] Fix Query Performance issue for Spark 3.1
Why is this PR needed? Currently, with Spark 3.1, some rules are applied many times, resulting in performance degradation. What changes were proposed in this PR? Changed the rule apply strategy from Fixed to Once, and made CarbonOptimizer directly extend SparkOptimizer to avoid applying the same rules many times. This closes #4229
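A conceptual illustration of the difference (plain Scala, not Catalyst code): a fixed-point batch keeps re-running its rules until the plan stops changing, while a Once batch runs them a single time, which is why switching strategies cuts the optimizer work.

```scala
// Toy model of rule application strategies.
def fixedPoint[A](plan: A, rule: A => A, maxIterations: Int = 100): (A, Int) = {
  var current = plan
  var passes = 0
  var changed = true
  while (changed && passes < maxIterations) {
    val next = rule(current)
    changed = next != current
    current = next
    passes += 1
  }
  (current, passes)
}

def once[A](plan: A, rule: A => A): (A, Int) = (rule(plan), 1)

// A toy "rule" that rewrites one node per pass.
val rule: List[String] => List[String] = nodes =>
  nodes.span(_ != "unoptimized") match {
    case (done, _ :: tail) => done ++ ("optimized" :: tail)
    case (done, Nil)       => done
  }

println(fixedPoint(List.fill(5)("unoptimized"), rule)._2) // 6 passes (5 changes + 1 no-op)
println(once(List.fill(5)("unoptimized"), rule)._2)       // 1 pass
```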
(commit 8953cde)
Commits on Oct 26, 2021
-
[CARBONDATA-4303] Columns mismatch when insert into table with static…
… partition Why is this PR needed? When inserting into a table with a static partition, the source projection should not contain the static partition columns, while the target table has all columns. The column number comparison between the source and target is: source column number = target column number - static partition column number. What changes were proposed in this PR? Before doing the column number comparison, remove the static partition columns from the target table. This closes #4233
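A small sketch of the comparison (illustrative names, not the actual parser code): for an insert with a static partition, the source projection is expected to supply every target column except the static partition columns.

```scala
// Sketch: remove static partition columns from the target before comparing counts.
def expectedSourceColumns(targetColumns: Seq[String], staticPartitions: Set[String]): Seq[String] =
  targetColumns.filterNot(c => staticPartitions.contains(c.toLowerCase))

val target = Seq("id", "name", "p")       // all columns of the target table
val static = Set("p")                     // e.g. INSERT INTO t PARTITION (p='2021') ...
val source = Seq("id", "name")            // projection of the SELECT

// source column number = target column number - static partition column number
assert(source.size == expectedSourceColumns(target, static).size)
```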
(commit 9dbd2a5)
Commits on Oct 28, 2021
-
[CARBONDATA-4240]: Added missing properties on the configurations page
Why is this PR needed? A few user-facing properties were missing from the configurations page. What changes were proposed in this PR? Added the missing properties to the documentation. Does this PR introduce any user interface change? No Is any new testcase added? No This Closes #4210
(commit 7d94691)
-
[CARBONDATA-4194] Fixed presto read after update/delete from spark
Why is this PR needed? After update/delete with spark on a table which contains an array/struct column, reading from presto throws a class cast exception. This is because after an update/delete the page contains a vector of type ColumnarVectorWrapperDirectWithDeleteDelta, which is typecast to CarbonColumnVectorImpl, and this typecast throws the exception. After fixing this (added a check for instanceOf) it started throwing IllegalArgumentException, because: 1. When local dictionary is enabled, CarbondataPageSource.load calls ComplexTypeStreamReader.putComplexObject before setting the correct number of rows (it doesn't subtract deleted rows), and it throws IllegalArgumentException while building blocks for child elements. 2. The position count is wrong in the case of struct; the number of deleted rows should be subtracted in LocalDictDimensionDataChunkStore.fillVector. This is not required in the case of array, because the data length of the array already takes care of deleted rows in ColumnVectorInfo.getUpdatedPageSizeForChildVector. What changes were proposed in this PR? First fixed the class cast exception by putting an instanceOf condition in the if block. Then subtracted the deleted row count before calling ComplexTypeStreamReader.putComplexObject in DirectCompressCodec.decodeAndFillVector. Also handled deleted rows in the case of struct in LocalDictDimensionDataChunkStore.fillVector. Does this PR introduce any user interface change? No Is any new testcase added? No This Closes #4224
(commit 07b41a5)
Commits on Nov 15, 2021
-
[CARBONDATA-4296]: schema evolution, enforcement and deduplication ut…
…ilities added Why is this PR needed? This PR adds schema enforcement, schema evolution and deduplication capabilities for carbondata streamer tool specifically. For the existing IUD scenarios, some work needs to be done to handle it completely, for example - 1. passing default values and storing them in table properties. Changes proposed for the phase 2 - 1. Handling delete use cases with upsert operation/command itself. Right now we consider update as delete + insert. With the new streamer tool, it is possible that user sets upsert as the operation type and incoming stream has delete records as well. What changes were proposed in this PR? Configs and utility methods are added for the following use cases - 1. Schema enforcement 2. Schema evolution - add column, delete column, data type change scenario 3. Deduplicate the incoming dataset against incoming dataset itself. This is useful in scenarios where incoming stream of data has multiple updates for the same record and we want to pick the latest. 4. Deduplicate the incoming dataset against existing target dataset. This is useful when operation type is set as INSERT and user does not want to insert duplicate records. This closes #4227
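As an illustration of case 3 (deduplicating the incoming batch against itself), here is a minimal Spark sketch, assuming the batch carries a key column and an ordering column such as an update timestamp from the CDC source; the column names and helper are illustrative, not the streamer tool's actual API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object DedupSketch {
  // Keep only the latest record per key within the incoming batch.
  def dedupIncoming(batch: DataFrame, keyCol: String, orderCol: String): DataFrame = {
    val latestFirst = Window.partitionBy(col(keyCol)).orderBy(col(orderCol).desc)
    batch.withColumn("_rn", row_number().over(latestFirst))
      .filter(col("_rn") === 1)
      .drop("_rn")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("dedup-sketch").getOrCreate()
    import spark.implicits._
    val incoming = Seq((1, "a", 10L), (1, "a-updated", 20L), (2, "b", 5L)).toDF("id", "name", "ts")
    dedupIncoming(incoming, "id", "ts").show() // keeps (1, a-updated, 20) and (2, b, 5)
    spark.stop()
  }
}
```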
(commit 3be05d2)
Commits on Nov 25, 2021
-
Supplementary information for add segment syntax .
1. Add segment option (partition). 2. Link segment-management-on-carbondata.md to addsegment-guide.md.
(commit 81c2e29)
Commits on Nov 26, 2021
-
[CARBONDATA-4305] Support Carbondata Streamer tool for incremental fe…
…tch and merge from kafka and DFS Sources Why is this PR needed? In the current Carbondata CDC solution, if a user wants to integrate it with a streaming source they need to write a separate spark application to capture changes, which is an overhead. We should be able to incrementally capture the data changes from primary databases and incrementally ingest them into the data lake so that the overall latency decreases. The former is taken care of by log-based CDC systems like Maxwell and Debezium. Here is a solution for the second aspect using Apache Carbondata. What changes were proposed in this PR? The Carbondata streamer tool is a spark streaming application which enables users to incrementally ingest data into their data lakes from various sources, like Kafka (a standard pipeline would be MySQL => Debezium => (Kafka + Schema registry) => Carbondata streamer tool) and DFS. The tool comes with out-of-the-box support for almost all types of schema evolution use cases. With the streamer tool, only add column support is provided for now, with drop column and other schema change capabilities planned in the upcoming days. Please refer to the design document for more details about the usage and working of the tool. This closes #4235
(commit 18840af)
Commits on Nov 29, 2021
-
Add FAQ How to manage mix file format in carbondata table.
1. Add segment example. 2. Link faq.md to addsegment-guide.md.
(commit 598d1ce)
(commit 885a21c)
Commits on Dec 2, 2021
-
(commit 69ab06c)
Commits on Dec 4, 2021
-
(commit 7af81ad)
Commits on Dec 7, 2021
-
Revert "remove useless numerical value , revert some typo issues"
This reverts commit 7af81ad.
(commit 42d59be)
(commit 580f7f6)
-
Revert "FAQ: carbon rename to carbondata"
This reverts commit 885a21c.
(commit 341f1bf)
-
Revert "Add FAQ How to manage mix file format in carbondata table."
This reverts commit 598d1ce.
(commit ce5747d)
-
Revert "Supplementary information for add segment syntax ."
This reverts commit 81c2e29.
(commit f544e59)
(commit c29fee2)
Commits on Dec 18, 2021
-
Update docs/addsegment-guide.md
Thanks! Co-authored-by: Indhumathi27 <[email protected]>
(commit 379f5ad)
-
Update docs/addsegment-guide.md
Thanks! Co-authored-by: Indhumathi27 <[email protected]>
(commit a1b6d99)
Commits on Dec 20, 2021
-
(commits fc3914f, 861fc67, 01f8e1a, 053d080, c0211fc, 0ced3c8, f266a73)
Commits on Dec 22, 2021
-
[CARBONDATA-4316]Fix horizontal compaction failure for partition tables
Why is this PR needed? Horizontal compaction fails for partition tables, leaving many delete delta files for a single block and leading to slower query performance. This happens because during horizontal compaction the delta file path prepared for the partition table is wrong, which fails to identify the path and fails the operation. What changes were proposed in this PR? If it is a partition table, read the segment file and identify the partition where the block is present to prepare a proper partition path. This closes #4240
(commit d629dc0)
-
[CARBONDATA-4317] Fix TPCDS performance issues
Why is this PR needed? The following issues have degraded TPCDS query performance: 1. If a dynamic filter is not present in the partitionFilters set, that filter is skipped instead of being pushed down to spark. 2. In some cases, nodes like Exchange / Shuffle are not reused, because the CarbonDataSourceScan plans are not matched. 3. Accessing the metadata on the canonicalized plan throws an NPE. What changes were proposed in this PR? 1. Check if the dynamic filter is present in the partitionFilters set; if not, push down the filter. 2. Match the plans by canonicalizing them and normalising the expressions. 3. Move the variables used in metadata() to avoid the NPE while comparing plans. This closes #4241
(commit 0f1d2a4)
Commits on Dec 28, 2021
-
[CARBONDATA-4319] Fixed clean files not deleting stale delete delta…
… files after horizontal compaction Why is this PR needed? After horizontal compaction was performed on partition and non-partition tables, the clean files operation was not deleting the stale delete delta files. The code had been removed as part of the clean files refactoring done previously. What changes were proposed in this PR? Clean files with the force option now handles removal of these stale delta files as well as the stale tableupdatestatus file for both partition and non-partition tables. This closes #4245
(commit a072e7a)
-
[CARBONDATA-4308]: added docs for streamer tool configs
Why is this PR needed? Documentation for the CDC streamer tool is missing. What changes were proposed in this PR? Added the documentation for the CDC streamer tool, containing the configs, images, and example commands to try out. Does this PR introduce any user interface change? No Is any new testcase added? No This closes #4243
(commit 970f11d)
Commits on Dec 29, 2021
-
[CARBONDATA-4318]Improve load overwrite performance for partition tables
Why is this PR needed? With the increase in the number of overwrite loads for a partition table, the time taken for each load keeps increasing over time. This is because: 1. Whenever a load overwrite for a partition table is fired, we need to overwrite or drop the partitions that overlap with the partitions currently being loaded. Since carbondata stores the partition information in the segment files, it reads all the previous segment files to identify and drop the overwritten partitions, which decreases performance. 2. After the partition load is completed, a cleanSegments method is called which again reads the segment file and table status file to identify Marked for Delete segments to clean. But since force clean is false and the timeout is more than a day by default, it's not necessary to call this method; clean files should handle this part. What changes were proposed in this PR? 1. We already have the information about the current partitions, so first identify whether there are any partitions to overwrite; only if there are do we read segment files to call dropPartition, otherwise we don't read the segment files unnecessarily. It also contains other refactoring to avoid reading the table status file. 2. No need to call clean segments after every load; clean files will take care of deleting the expired ones. This closes #4242
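A sketch of the short-circuit in point 1 (illustrative types, not the actual load command code): the old segment files only need to be read when the partitions being loaded actually overlap with existing partitions.

```scala
// Sketch: decide whether old segment files must be read at all.
def partitionsToOverwrite(currentLoad: Set[String], existing: Set[String]): Set[String] =
  currentLoad.intersect(existing)

val overlapping = partitionsToOverwrite(Set("dt=2021-12-29"), Set("dt=2021-12-28"))
if (overlapping.isEmpty)
  println("no overlap: skip reading previous segment files")                        // cheap common case
else
  println(s"overlap on $overlapping: read segment files and drop these partitions") // only when needed
```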
(commit 308906e)
Commits on Jan 13, 2022
-
[CARBONDATA-4320] Fix clean files removing wrong delta files
Why is this PR needed? When there are multiple delete delta files in a partition of a partition table, some delta files were being ignored and deleted, thus changing the query results. What changes were proposed in this PR? Fixed the logic which decides which delta files to delete. Now the deltaStartTime is compared with the deltaEndTime so that all the delta files are considered during clean files. Does this PR introduce any user interface change? No Is any new testcase added? Yes, one test case has been added. This closes #4246
(commit 05aff87)
Commits on Feb 14, 2022
-
[CARBONDATA-4322] Apply local sort task level property for insert
Why is this PR needed? Currently, when carbon.partition.data.on.tasklevel is enabled with local sort, the number of tasks launched for load is based on node locality. But for the insert command, the local sort task level property is not applied, which causes the number of tasks launched to be based on the input files. What changes were proposed in this PR? Included changes to apply the carbon.partition.data.on.tasklevel property for the insert command as well. Used DataLoadCoalescedRDD to coalesce the partitions and a DataLoadCoalescedUnwrapRDD to unwrap partitions from DataLoadPartitionWrap and iterate. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4248
(commit 59f23c0)
Commits on Mar 4, 2022
-
[CARBONDATA-4325] Update Data frame supported options in document and…
… fix partition table creation with df spatial property Why is this PR needed? 1. Only specific properties are supported using dataframe options. Need to update the documentation. 2. Create partition table fails with Spatial index property for carbon table created with dataframe in spark-shell. What changes were proposed in this PR? 1. Added data frame supported properties in the documentation. 2. Using spark-shell, the table gets created with carbon session and catalogTable.properties is empty here. Getting the properties from catalogTable.storage.properties to access the properties set. Does this PR introduce any user interface change? No Is any new testcase added? No, tested in cluster. This closes #4250
(commit c840b5f)
-
[CARBONDATA-4326] MV not hitting with multiple sessions issue fix
Why is this PR needed? MV created in beeline not hitting in sql/shell and vice versa if both beeline and sql/shell are running in parallel. Currently, If the view catalog for a particular session is already initialized then the schemas are not reloaded each time. So when mv is created in another session and queried from the currently open session, mv is not hit. What changes were proposed in this PR? 1.Reload mv catalog every time to getSchemas from the path. Register the schema if not present in the catalog and deregister the schema if it's dropped. 2. When create SI is triggered, no need to try rewriting the plan and check for mv schemas. So, returning plan if DeserializeToObject is present. Does this PR introduce any user interface change? No Is any new testcase added? No, tested in cluster This closes #4251
(commit 19343a7)
Commits on Mar 7, 2022
-
(commits 9b74951, e25d5b6)
-
[CARBONDATA-4306] Fix Query Performance issue for Spark 3.1
Why is this PR needed? Some non-partition filters, which cannot be handled by carbon, are not pushed down to spark. What changes were proposed in this PR? If the partition filters are non-empty and the filter column is not a partition column, then push the filter down to spark. This closes #4252
(commit a838531)
Commits on Mar 18, 2022
-
[CARBONDATA-4327] Update documentation related to partition
Why is this PR needed? Drop partition with data is not supported and a few of the links are not working. What changes were proposed in this PR? Removed unsupported syntax and duplicate headings, and updated the header with proper linkage. This closes #4254
(commit 41831ce)
Commits on Mar 29, 2022
-
[CARBONDATA-4328] Load parquet table with options error message fix
Why is this PR needed? If a parquet table is created and a load statement with options is triggered, it fails with NoSuchTableException: Table ${tableIdentifier.table} does not exist. What changes were proposed in this PR? As parquet table load is not handled, added a check to filter out non-carbon tables in the parser so that the spark parser can handle the statement. This closes #4253
(commit d6ce946)
Commits on Apr 1, 2022
-
[CARBONDATA-4329] Fix multiple issues with External table
Why is this PR needed? Issue 1: When we create an external table on a transactional table location, a schema file will already be present; while creating the external table, which is also transactional, the schema file is overwritten. Issue 2: If an external table is created on a location where the source table already exists, dropping the external table deletes the table data, and queries on the source table fail. What changes were proposed in this PR? Avoid writing the schema file if the table type is external and transactional. Don't drop the external table location data if table_type is external. This closes #4255
(commit 46b62cf)
Commits on Apr 28, 2022
-
[CARBONDATA-4330] Incremental Dataload of Average aggregate in MV
Why is this PR needed? Currently, whenever MV is created with average aggregate, a full refresh is done meaning it reloads the whole MV for any newly added segments. This will slow down the loading. With incremental data load, only the segments that are newly added can be loaded to the MV. What changes were proposed in this PR? If avg is present, rewrite the query with the sum and count of the columns to create MV and use them to derive avg. Refer: https://docs.google.com/document/d/1kPEMCX50FLZcmyzm6kcIQtUH9KXWDIqh-Hco7NkTp80/edit Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4257
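The rewrite can be illustrated with a tiny sketch (plain Scala, not the MV implementation): the MV stores sum and count per group, each new segment contributes its own partial sum and count, and avg is derived from the merged values, which is what makes the refresh incremental.

```scala
// Sketch: avg kept as (sum, count) so partial results can be merged.
case class AvgState(sum: Double, count: Long) {
  def merge(other: AvgState): AvgState = AvgState(sum + other.sum, count + other.count)
  def avg: Double = if (count == 0) 0.0 else sum / count
}

val materialized = AvgState(sum = 90.0, count = 9)  // segments already loaded into the MV
val newSegment   = AvgState(sum = 10.0, count = 1)  // incremental load only

println(materialized.merge(newSegment).avg)          // 10.0, same result as a full recompute
```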
(commit 45acd67)
Commits on May 27, 2022
-
[CARBONDATA-4336] Table Status Versioning
Why is this PR needed? Currently, carbondata stores the records of a transaction (load/insert/IUD/add/drop segment) in a metadata file named 'tablestatus' present in the Metadata directory. If the tablestatus file is lost, the metadata for the transactions cannot be recovered directly, as there is no previous version file available for tablestatus. Hence, if we support versioning for tablestatus files, it will be easy to recover the current version tablestatus metadata from previous version tablestatus files. Please refer to Table Status Versioning & Recovery Tool for more info. What changes were proposed in this PR? -> On each transaction commit, commit the latest load metadata details to a new version file -> Update the latest tablestatus version timestamp in the table properties [CarbonTable cache] and in the hive metastore -> Added a table status version tool which can recover the latest transaction details based on old version files Does this PR introduce any user interface change? Yes Is any new testcase added? Yes This closes #4261
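A minimal sketch of the versioning idea (the file-name pattern here is an assumption for illustration, not necessarily the exact format used): each commit writes a new version file, and the reader or recovery tool picks the highest version still present.

```scala
// Sketch: one status file per committed transaction, latest one wins.
def newStatusFileName(now: Long = System.currentTimeMillis()): String = s"tablestatus_$now"

def latestStatusFile(files: Seq[String]): Option[String] =
  files.filter(_.startsWith("tablestatus_"))
    .sortBy(_.stripPrefix("tablestatus_").toLong)
    .lastOption

val metadataDir = Seq("tablestatus_1653600000000", "tablestatus_1653610000000", "schema")
println(latestStatusFile(metadataDir)) // Some(tablestatus_1653610000000)
```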
(commit 57e76ee)
Commits on Jun 2, 2022
-
[CARBONDATA-4335] Disable MV by default
Why is this PR needed? Currently, materialized view (MV) is enabled by default. In concurrent scenarios with MV enabled by default, each session goes through the list of databases even though MV is not used, which increases query time. What changes were proposed in this PR? Disable MV by default, as users rarely use it. If required, the user can enable and use it. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4264
(commit 33408be)
Commits on Jun 22, 2022
-
[CARBONDATA-4341] Drop Index Fails after TABLE RENAME
Why is this PR needed? Drop index fails after TABLE RENAME. What changes were proposed in this PR? After a table rename, the parentTableName property of its SI tables is updated with the latest name and the index metadata gets updated. The table is dropped from the metadata cache so that it is reloaded and gets the updated property when fetched next time. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4279
(commit 4b8846d)
Commits on Jun 23, 2022
-
[CARBONDATA-4339]Fix NullPointerException in load overwrite on partit…
…ion table Why is this PR needed? After delete segment and clean files with the force option true, the load overwrite operation throws a NullPointerException. This is because when clean files with force is done, all remaining marked-for-delete segments except the 0th and last segments are moved to the tablestatus.history file, irrespective of the status of the 0th and last segments. During an overwrite load, the overwritten partition is dropped. Since all the segments are physically deleted by clean files, and the load model's load metadata details list still contains the 0th segment which is marked for delete, the operation fails. What changes were proposed in this PR? When the valid segments are collected, filter using the segment's status to avoid the failure. This closes #4280
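A sketch of the proposed filtering (illustrative types, not the actual load metadata classes): only segments whose status marks them as usable are collected, so a leftover Marked for Delete entry never reaches the drop-partition step.

```scala
// Sketch: keep only usable segments when collecting valid ones for the overwrite.
sealed trait SegmentStatus
case object Success extends SegmentStatus
case object MarkedForDelete extends SegmentStatus
case class LoadDetail(segmentId: String, status: SegmentStatus)

def validSegments(details: Seq[LoadDetail]): Seq[LoadDetail] =
  details.filter(_.status == Success)

println(validSegments(Seq(LoadDetail("0", MarkedForDelete), LoadDetail("1", Success))))
// List(LoadDetail(1,Success)) -> the marked-for-delete 0th segment is ignored
```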
(commit 93b0af2)
Commits on Jun 27, 2022
-
[CARBONDATA-4344] Create MV fails with "LOCAL_DICTIONARY_INCLUDE/LOCA…
…L _DICTIONARY_EXCLUDE column: does not exist in table. Please check the DDL" error Why is this PR needed? Create MV fails with "LOCAL_DICTIONARY_INCLUDE/LOCAL _DICTIONARY_EXCLUDE column: does not exist in table. Please check the DDL" error. Error occurs only in this scenario: Create Table --> Load --> Alter Add Columns --> Drop table --> Refresh Table --> Create MV and not in direct scenario like: Create Table --> Load --> Alter Add Columns --> Create MV What changes were proposed in this PR? 1. After add column command, LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE properties are added to the table even if the columns are empty. So, when MV is created next as LOCAL_DICTIONARY_EXCLUDE column is defined it tries to access its columns and fails. --> Added empty check before adding properties to the table to resolve this. 2. In a direct scenario after add column, the schema gets updated in catalog table but the table properties are not updated. Made changes to update table properties to catalog table. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4282
(commit 858afc7)
Commits on Jul 1, 2022
-
[CARBONDATA-4345] update/delete operations failed when other format s…
…egment deleted from carbon table Why is this PR needed? Update/delete operations fail when other format segments are deleted from a carbon table. Steps to reproduce: 1. Create a carbon table and load the data 2. Create parquet/orc tables and load the data 3. Add parquet/orc format segments to the carbon table with the alter add segment command 4. Perform update/delete operations on the carbon table; they will fail as the table contains mixed format segments. This is expected behaviour. 5. Delete the other format segments which were added in step 3 6. Try to perform update/delete operations on the carbon table; they should not fail. For update/delete operations we check whether other format segments are present in the table path; if found, carbondata throws an exception saying mixed format segments exist, even though the other format segments were deleted from the table. What changes were proposed in this PR? When checking whether other format segments are present in the carbon table, only SUCCESS/PARTIAL_SUCCESS segments should be checked. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4285
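A sketch of the proposed check (illustrative types and status strings, not the actual CarbonData classes): the table counts as mixed format for update/delete purposes only if a non-carbon segment is still in a visible state, so segments that were already deleted no longer block the operation.

```scala
// Sketch: consider only SUCCESS / PARTIAL_SUCCESS segments for the mixed-format check.
case class SegmentEntry(id: String, format: String, status: String)

def hasVisibleOtherFormatSegment(segments: Seq[SegmentEntry]): Boolean =
  segments.exists(s =>
    s.format != "carbondata" &&
      (s.status == "Success" || s.status == "Partial Success"))

val segs = Seq(
  SegmentEntry("0", "carbondata", "Success"),
  SegmentEntry("1", "parquet", "Marked for Delete")) // other-format segment already deleted
println(hasVisibleOtherFormatSegment(segs)) // false, so update/delete is allowed
```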
(commit b8511b6)
Commits on Jul 11, 2022
-
[CARBONDATA-4342] Fix Desc Columns shows New Column added, even thoug…
…h Alter ADD column query failed Why is this PR needed? 1. When the spark.carbon.hive.schema.store property is enabled, alter operations fail with a ClassCastException. 2. When an alter add/drop/rename column operation failed due to the issue mentioned above, the revert schema operation was not reverting back to the old schema. What changes were proposed in this PR? 1. Use org.apache.spark.sql.hive.CarbonSessionCatalogUtil#getClient to get the HiveClient to avoid the ClassCastException 2. Revert the schema in the spark catalog table also, in case of failure Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4277
(commit 8691cb7)
Commits on Jul 19, 2022
-
[CARBONDATA-4338] Moving dropped partition data to trash
Why is this PR needed? When a drop partition operation is performed, carbondata modifies only the table status file and does not delete the actual partition folder, which contains data and index files. To comply with hive behaviour, carbondata should also delete the dropped partition folder in storage [hdfs/obs/etc.]. Before deleting, carbondata will keep a copy in the Trash folder; the user can restore it by checking the partition name and timestamp. What changes were proposed in this PR? Moved the dropped partition folder files to the trash folder. Does this PR introduce any user interface change? No Is any new testcase added? Yes This closes #4276
(commit 04b1756)
Commits on Apr 8, 2023
-
(commits b690cf2, 8e9ffd5, 8b8345a, 2f0241d, c780221)
-
Merge pull request #4300 from xubo245/issue-4298
[ISSUE-4298] Fixed mail list issue
(commit 4af8af4)
-
Merge pull request #4301 from xubo245/issue-4299
[ISSUE-4299] Fixed compile issue with spark 2.3
(commit 92f4dff)
Commits on Apr 9, 2023
-
(commit 3cc0367)
-
Merge pull request #4307 from xubo245/ISSUE-4305-magicNumber
[ISSUE-4305] Optimize the magic number
(commit cba9a8a)
(commits f92ae07, 9d43c78)
Commits on Apr 10, 2023
-
[ISSUE-4306] Fix the error of SDKS3SchemaReadExample (#4312)
Fix the issue when reading the schema from S3
(commit 01dd526)
(commit 44c2bca)
Commits on Apr 13, 2023
-
(commit b941983)
Commits on Apr 24, 2023
-
[ISSUE-4305] Optimize the usage of static method (#4309)
A static method shouldn't be called through an object; it should be called through the class.
(commit 2439589)
Commits on Jun 8, 2023
-
(commit f31edd6)
Commits on Jun 26, 2023
-
Add new example: Using CarbonData for visualization in a notebook (#4318)
* Add new example: Using CarbonData for visualization in a notebook * Update the example: Using CarbonData in a notebook
(commit 208afe9)
-
[ISSUE-4305] Optimize the constants and variable style (#4311)
Constants should be named like UPPER_NAME; variables should be lowerCamelCase.
(commit 6e031c0)
Commits on Jul 8, 2023
-
(commit 8264b3b)
Commits on Aug 20, 2023
-
(commits cd180c9, 95b50e8)
Commits on Oct 1, 2023
-
(commit 13a2c97)
-
[ISSUE-4329] optimize some code smells in presto module (#4330)
A static method shouldn't be called through an object; it should be called through the class.
(commit beb426c)
(commits 9f604fc, 95a6407, 4462461, af9c6c3, d499699, bcb30a5)
Commits on Oct 10, 2023
-
(commits 30c1aa8, 84cfd20, 504a5ae, 66cb3a3, 13ac2ef, 0e0523e, 0cae3d1, fd66031, 38fdb16)
Commits on Oct 17, 2023
-
(commits ebe4101, 4618808)
Commits on Oct 19, 2023
-
(commits 448564a, f18846c, 39dd8ce)
Commits on Nov 5, 2023
-
(commit dd74408)
-
#117: The longitude has six decimal places and the latitude has five digits. Why is it the same length after conversion?
(commit 1e327f2)
Commits on Nov 6, 2023
-
Co-authored-by: QiangCai <[email protected]>
(commit 57de4a3)
Commits on Nov 8, 2023
-
[ISSUE-4338] Fix checkstyle issue in sdk module (#4339)
Co-authored-by: QiangCai <[email protected]>
(commit 4a1b36f)
Commits on Nov 11, 2023
-
[CARBONDATA-4333][Doc] Update the declaration of supported String dat…
…a types (#4263) Why is this PR needed? CHAR and VARCHAR as String data types are no longer supported in Carbon; they should be deleted from the doc's description. What changes were proposed in this PR? CHAR and VARCHAR no longer appear as two String data types in the doc. Does this PR introduce any user interface change? No Is any new testcase added? No Co-authored-by: tangchuan <[email protected]>
(commit 7abc7cd)
(commits 53d3370, 64ecd77, 48f5976)
Commits on Nov 19, 2023
-
[ISSUE-4342] Fix test case errors (#4343)
Co-authored-by: QiangCai <[email protected]>
(commit 7195869)
(commits d326118, a6e9e37)
Commits on Dec 2, 2023
-
(commit bcc7137)
Commits on Dec 9, 2023
-
Bump pyarrow from 0.11.1 to 14.0.1 in /python (#4341)
Bumps [pyarrow](https://github.com/apache/arrow) from 0.11.1 to 14.0.1. - [Commits](apache/arrow@apache-arrow-0.11.1...go/v14.0.1) --- updated-dependencies: - dependency-name: pyarrow dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(commit dee20b8)
Commits on Mar 16, 2024
-
(commit c1b0e9c)
Commits on Mar 22, 2024
-
Minor refactor the build/README.md (#4349)
* Minor refactor the build docs * Fix review comments * Update build/README.md
(commit 74e6e93)
Commits on Apr 6, 2024
-
(commit 71abab0)
Commits on Jun 30, 2024
-
Co-authored-by: jacky <[email protected]>
(commit 5ff36b6)
-
[CARBONDATA-4349] Upgrade thrift version (#4355)
* upgrade thrift version * change to use 0.20.0 --------- Co-authored-by: jacky <[email protected]>
(commit f370d20)
Commits on Jul 6, 2024
-
Bump org.apache.commons:commons-compress in /integration/presto (#4345)
Bumps org.apache.commons:commons-compress from 1.4.1 to 1.26.0. --- updated-dependencies: - dependency-name: org.apache.commons:commons-compress dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(commit 8f9ce4e)
(commits 2c78847, 29607c3)
Commits on Oct 5, 2024
-
(commit e0ac69a)