feat(csharp/src/Drivers): introduce drivers for Apache systems built on Thrift #1710

Merged: 98 commits, Apr 17, 2024
Commits
b479cd0
add Apache drivers; re-organize Flight SQL
Aug 22, 2023
0116e21
working on getobjects
Sep 21, 2023
e14559f
merge
Nov 7, 2023
f4899bf
Merge ssh://github.com/davidhcoe/arrow-adbc into dev/introdrivers
Jan 25, 2024
0cc6ee6
Merge branch 'apache:main' into dev/introdrivers
davidhcoe Jan 25, 2024
2757e5c
Merge branch 'dev/introdrivers' of ssh://github.com/davidhcoe/arrow-a…
Jan 25, 2024
e5af9f6
include Apache drivers
Jan 25, 2024
ac9187f
Merge ssh://github.com/davidhcoe/arrow-adbc into dev/introdrivers
Jan 27, 2024
fb7035a
update after latest merge
Jan 27, 2024
45fd0c7
Merge ssh://github.com/davidhcoe/arrow-adbc into dev/introdrivers
Feb 27, 2024
9084a13
update to latest
Feb 27, 2024
666d012
updating tests
Mar 2, 2024
6652330
Added implementation for table schema
vikrantpuppala Mar 14, 2024
df4fdec
use columns for data
vikrantpuppala Mar 14, 2024
56ac604
Adding getObjects impl
gopalldb Mar 14, 2024
964d279
Merge pull request #1 from gopalldb/vp-hack
vikrantpuppala Mar 14, 2024
96448f7
Merge conflicts
gopalldb Mar 14, 2024
d2fb852
merge conflicts
gopalldb Mar 14, 2024
3a3a224
Merge pull request #2 from gopalldb/list-objects
gopalldb Mar 14, 2024
b927c31
Adding back vikrant's changes
gopalldb Mar 14, 2024
20e81e5
Merge pull request #3 from gopalldb/merge2
gopalldb Mar 14, 2024
8f0803f
Fixing compile error
gopalldb Mar 14, 2024
d0e8158
Merge pull request #4 from gopalldb/merge2
gopalldb Mar 14, 2024
4313cc2
make unit test working
jadewang-db Mar 14, 2024
5701c04
Add databricks.md for testing from Yunbo
yunbodeng-db Mar 14, 2024
ea46162
Merge branch 'dev/apache-drivers' of github.com:gopalldb/arrow-adbc i…
yunbodeng-db Mar 14, 2024
b34ff8a
Merge branch 'apache:main' into dev/apache-drivers
davidhcoe Mar 15, 2024
f7ca693
Revert "Merge branch 'dev/apache-drivers' of github.com:gopalldb/arro…
jadewang-db Mar 15, 2024
1949057
Revert "Revert "Merge branch 'dev/apache-drivers' of github.com:gopal…
jadewang-db Mar 15, 2024
84b39b6
initial skeleton for cloud fetch
gopalldb Mar 15, 2024
590b920
Merge pull request #5 from gopalldb/cloud
gopalldb Mar 15, 2024
6d566b3
add fetch results and batch traversing for cloud fetch
vikrantpuppala Mar 15, 2024
9579fdb
cleanup
vikrantpuppala Mar 15, 2024
4051089
cleanup
vikrantpuppala Mar 15, 2024
6b46336
init httpclient
vikrantpuppala Mar 15, 2024
08743eb
Merge pull request #6 from gopalldb/vp-hack-1503
vikrantpuppala Mar 15, 2024
af278ce
Merge pull request #4 from gopalldb/apach-driver-jade
davidhcoe Mar 15, 2024
d5bdb76
First working integration
jadewang-db Mar 16, 2024
609238a
ci: work around ASAN issue (#1618)
lidavidm Mar 15, 2024
9ae24cf
Update SparkConnection.cs
jadewang-db Mar 18, 2024
5038c9c
Make changes for cloud fetch
vikrantpuppala Mar 18, 2024
0551db7
Merge pull request #7 from gopalldb/vp-hack-1803
vikrantpuppala Mar 18, 2024
6212c7e
Merge branch 'dev/apache-drivers' of ssh://github.com/gopalldb/arrow-…
Mar 18, 2024
58f0246
Merge branch 'gopalldb-dev/apache-drivers' into dev/apache-drivers
Mar 18, 2024
3aa6320
Merge branch 'apache:main' into dev/apache-drivers
davidhcoe Mar 18, 2024
752638d
Merge branch 'apache:main' into dev/apache-drivers
davidhcoe Mar 18, 2024
0b1bbdf
Merge ssh://github.com/davidhcoe/arrow-adbc into dev/apache-drivers
Mar 18, 2024
df422aa
Implements ExecuteUpdate.
birschick-bq Mar 20, 2024
9373f97
Merge branch 'main' into dev/birschick-bq/execute-update
birschick-bq Mar 20, 2024
36eaba2
Correct line endings
birschick-bq Mar 20, 2024
c419b85
Correct line endings and add licenses
birschick-bq Mar 20, 2024
d74a392
Merge branch 'main' into dev/apache-drivers
birschick-bq Mar 20, 2024
4de2e1b
Merge branch 'dev/apache-drivers' into dev/birschick-bq/execute-update
birschick-bq Mar 20, 2024
d34c5ba
Correct affected rows behaviour. Add DELETE statement.
birschick-bq Mar 20, 2024
ad70304
Added more data types and UPDATE statement.
birschick-bq Mar 20, 2024
9e2f5a0
Removed empty lines.
birschick-bq Mar 20, 2024
295694c
Revert frameworks to new472;net6.0.
birschick-bq Mar 21, 2024
2e4dbd1
Fixed trailing spaces and removed unused file.
birschick-bq Mar 21, 2024
515c4c2
Fixed line ending.
birschick-bq Mar 21, 2024
7fda295
Resolve warning with `#nullable` annotation warning.
birschick-bq Mar 21, 2024
20f6059
Remove '#nullable' annotation.
birschick-bq Mar 21, 2024
dce233a
Removed unnecessary modification.
birschick-bq Mar 21, 2024
865cdb8
Some style reformatting.
birschick-bq Mar 21, 2024
6dca25c
Merge pull request #7 from davidhcoe/dev/birschick-bq/execute-update
davidhcoe Mar 21, 2024
995e1d0
Merge branch 'dev/apache-drivers' of ssh://github.com/davidhcoe/arrow…
Mar 21, 2024
c0101ba
Correct SparkConnection.GetTableSchema to use native column type iden…
birschick-bq Apr 3, 2024
d5fb190
Some formatting/style improvements.
birschick-bq Apr 3, 2024
9101a16
Merge pull request #8 from davidhcoe/dev/birschick-bq/get-table-schema
davidhcoe Apr 4, 2024
10ca62f
Merge branch 'dev/apache-drivers' of ssh://github.com/davidhcoe/arrow…
Apr 4, 2024
23e824f
Merge ssh://github.com/davidhcoe/arrow-adbc into dev/apache-drivers
Apr 4, 2024
c511876
prep for check in
Apr 4, 2024
843896b
test(csharp/test): add tests for numeric values using the Spark ADBC …
birschick-bq Apr 6, 2024
8dd0749
Merge pull request #9 from davidhcoe/dev/birschick-bq/value-tests
davidhcoe Apr 8, 2024
6009b4c
Merge branch 'apache:main' into dev/apache-drivers
davidhcoe Apr 8, 2024
436b189
add to .gitignore
Apr 8, 2024
698665a
Merge branch 'dev/apache-drivers' of ssh://github.com/davidhcoe/arrow…
Apr 8, 2024
302fc98
fix gitignore
Apr 8, 2024
79d5cd9
fix gitignore
Apr 8, 2024
9d11fdc
fixes from pre-commit
Apr 8, 2024
60de1c0
Merge ssh://github.com/davidhcoe/arrow-adbc into dev/apache-drivers
Apr 9, 2024
cc985a3
attempting to fix PR check in issues
Apr 10, 2024
82e5dab
fix line endings on notice, license
Apr 10, 2024
19e8c8f
WIP: Complex types are not working as expected, yet.
birschick-bq Apr 10, 2024
cbc524d
Merge branch 'dev/apache-drivers' into dev/birschick-bq/timestamp-val…
birschick-bq Apr 10, 2024
f2cdd8e
Corrected line endings.
birschick-bq Apr 10, 2024
15b7dae
Corrected line endings.
birschick-bq Apr 10, 2024
1618429
Added tests for string/character values
birschick-bq Apr 11, 2024
8eda11f
* Corrected handling of null and double/float values
birschick-bq Apr 11, 2024
c6c5407
corrected line endings.
birschick-bq Apr 11, 2024
f3d97f6
Set option to return string for complex types
birschick-bq Apr 12, 2024
f39a597
Merge pull request #10 from davidhcoe/dev/birschick-bq/timestamp-valu…
davidhcoe Apr 14, 2024
2eb1142
Merge ssh://github.com/davidhcoe/arrow-adbc into dev/apache-drivers
Apr 16, 2024
a118de1
PR feedback
Apr 16, 2024
7fa0a16
add more details to readme
Apr 16, 2024
0e7aa50
feat(csharp/src/Drivers/Apache): code review improvements (#12)
birschick-bq Apr 16, 2024
58d2d23
Document unsupported Impala driver.
birschick-bq Apr 16, 2024
8373c96
Added comment for supporting only little-endian platforms.
birschick-bq Apr 16, 2024
226343a
Remove implementation of HiveServer2Connection.GetObjects.
birschick-bq Apr 16, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -65,6 +65,7 @@ python/doc/
# Egg metadata
*.egg-info

.vs/
.vscode
.idea/
.pytest_cache/
14 changes: 14 additions & 0 deletions csharp/Apache.Arrow.Adbc.sln
@@ -28,6 +28,10 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Apache.Arrow.Adbc.Drivers.I
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Apache.Arrow.Adbc.Tests.Drivers.Interop.Snowflake", "test\Drivers\Interop\Snowflake\Apache.Arrow.Adbc.Tests.Drivers.Interop.Snowflake.csproj", "{8BE1EECC-3ACF-41B2-AF7D-1A67196FF6C7}"
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Apache.Arrow.Adbc.Drivers.Apache", "src\Drivers\Apache\Apache.Arrow.Adbc.Drivers.Apache.csproj", "{6C0D8BE1-4A23-4C2F-88B1-D2FBEA0B1903}"
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Apache.Arrow.Adbc.Tests.Drivers.Apache", "test\Drivers\Apache\Apache.Arrow.Adbc.Tests.Drivers.Apache.csproj", "{714F0BD2-3A92-4D1A-8FAC-D0C0599BE3E3}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
@@ -70,6 +74,14 @@ Global
{8BE1EECC-3ACF-41B2-AF7D-1A67196FF6C7}.Debug|Any CPU.Build.0 = Debug|Any CPU
{8BE1EECC-3ACF-41B2-AF7D-1A67196FF6C7}.Release|Any CPU.ActiveCfg = Release|Any CPU
{8BE1EECC-3ACF-41B2-AF7D-1A67196FF6C7}.Release|Any CPU.Build.0 = Release|Any CPU
{6C0D8BE1-4A23-4C2F-88B1-D2FBEA0B1903}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{6C0D8BE1-4A23-4C2F-88B1-D2FBEA0B1903}.Debug|Any CPU.Build.0 = Debug|Any CPU
{6C0D8BE1-4A23-4C2F-88B1-D2FBEA0B1903}.Release|Any CPU.ActiveCfg = Release|Any CPU
{6C0D8BE1-4A23-4C2F-88B1-D2FBEA0B1903}.Release|Any CPU.Build.0 = Release|Any CPU
{714F0BD2-3A92-4D1A-8FAC-D0C0599BE3E3}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{714F0BD2-3A92-4D1A-8FAC-D0C0599BE3E3}.Debug|Any CPU.Build.0 = Debug|Any CPU
{714F0BD2-3A92-4D1A-8FAC-D0C0599BE3E3}.Release|Any CPU.ActiveCfg = Release|Any CPU
{714F0BD2-3A92-4D1A-8FAC-D0C0599BE3E3}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
@@ -84,6 +96,8 @@ Global
{EA43BB7C-BC00-4701-BDF4-367880C2495C} = {C7290227-E925-47E7-8B6B-A8B171645D58}
{30024B6F-7BC1-4574-BE5A-924FBD6EAF83} = {FEB257A0-4FD3-495E-9A47-9E1649755445}
{8BE1EECC-3ACF-41B2-AF7D-1A67196FF6C7} = {C7290227-E925-47E7-8B6B-A8B171645D58}
{6C0D8BE1-4A23-4C2F-88B1-D2FBEA0B1903} = {FEB257A0-4FD3-495E-9A47-9E1649755445}
{714F0BD2-3A92-4D1A-8FAC-D0C0599BE3E3} = {C7290227-E925-47E7-8B6B-A8B171645D58}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {4795CF16-0FDB-4BE0-9768-5CF31564DC03}
6 changes: 3 additions & 3 deletions csharp/src/Client/SchemaConverter.cs
@@ -72,13 +72,13 @@ public static DataTable ConvertArrowSchema(Schema schema, AdbcStatement adbcStat
 {
     if (f.Metadata.TryGetValue("precision", out string precisionValue))
     {
-        if(!string.IsNullOrEmpty(precisionValue))
+        if (!string.IsNullOrEmpty(precisionValue))
             row[SchemaTableColumn.NumericPrecision] = Convert.ToInt32(precisionValue);
     }

-    if(f.Metadata.TryGetValue("scale", out string scaleValue))
+    if (f.Metadata.TryGetValue("scale", out string scaleValue))
     {
-        if(!string.IsNullOrEmpty(scaleValue))
+        if (!string.IsNullOrEmpty(scaleValue))
             row[SchemaTableColumn.NumericScale] = Convert.ToInt32(scaleValue);
     }
 }
@@ -0,0 +1,15 @@
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFrameworks>net472;net6.0</TargetFrameworks>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="ApacheThrift" Version="0.19.0" />
  </ItemGroup>

  <ItemGroup>
    <ProjectReference Include="..\..\Apache.Arrow.Adbc\Apache.Arrow.Adbc.csproj" />
  </ItemGroup>

</Project>
170 changes: 170 additions & 0 deletions csharp/src/Drivers/Apache/Hive2/HiveServer2Connection.cs
@@ -0,0 +1,170 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Apache.Arrow.Ipc;
using Apache.Hive.Service.Rpc.Thrift;
using Thrift.Protocol;
using Thrift.Transport;

namespace Apache.Arrow.Adbc.Drivers.Apache.Hive2
{
    public abstract class HiveServer2Connection : AdbcConnection
    {
        const string userAgent = "AdbcExperimental/0.0";

        protected TOperationHandle operationHandle;
        protected IReadOnlyDictionary<string, string> properties;
        internal TTransport transport;
        internal TCLIService.Client client;
        internal TSessionHandle sessionHandle;

        internal HiveServer2Connection() : this(null)
        {

        }

        internal HiveServer2Connection(IReadOnlyDictionary<string, string> properties)
        {
            this.properties = properties;
        }

        public void Open()
        {
            TProtocol protocol = CreateProtocol();
            this.transport = protocol.Transport;
            this.client = new TCLIService.Client(protocol);

            var s0 = this.client.OpenSession(CreateSessionRequest()).Result;
            this.sessionHandle = s0.SessionHandle;
        }

        protected abstract TProtocol CreateProtocol();
        protected abstract TOpenSessionReq CreateSessionRequest();

        public override IArrowArrayStream GetObjects(GetObjectsDepth depth, string catalogPattern, string dbSchemaPattern, string tableNamePattern, List<string> tableTypes, string columnNamePattern)
Contributor

This appears to be quite incomplete. Consider removing it from this PR and submitting separately once it's complete and tested.

Contributor Author

We can't remove it. HiveServer2 is the base class that Spark and Impala build on, but I will add details to the readme.

Contributor

I mean the body, or at least the parts of the body that aren't implemented. It could always throw a NotImplementedException when, e.g., depth != GetObjectsDepth.All.

This comment was marked as outdated.

Contributor
@birschick-bq Apr 16, 2024

Actually, I removed the implementation here. It was not working and not useful as base functionality.
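
For reference, the guard suggested earlier in this thread (throw for any depth that is not implemented yet) could look roughly like the sketch below. This is not code from the PR: the `GetObjectsDepth` enum is a local stand-in whose member names are assumed, and `EnsureSupportedDepth` is a hypothetical helper name, so the example compiles on its own.

```csharp
using System;

// Local stand-in for the ADBC GetObjectsDepth enum (member names are an assumption).
public enum GetObjectsDepth { All, Catalogs, DbSchemas, Tables }

public static class GetObjectsGuard
{
    // Hypothetical helper: reject every depth the driver has not implemented yet,
    // so callers get an explicit error instead of a partial result.
    public static void EnsureSupportedDepth(GetObjectsDepth depth)
    {
        if (depth != GetObjectsDepth.All)
        {
            throw new NotImplementedException(
                $"GetObjects is not implemented for depth {depth}.");
        }
    }
}
```

A partially implemented `GetObjects` could call such a guard first and only then run the code path that actually works.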

        {
            throw new NotImplementedException();
        }

        public override IArrowArrayStream GetInfo(List<int> codes)
        {
            throw new NotImplementedException();
        }

        public override IArrowArrayStream GetTableTypes()
        {
            throw new NotImplementedException();
        }

        protected void PollForResponse()
        {
            TGetOperationStatusResp statusResponse = null;
            do
            {
                if (statusResponse != null) { Thread.Sleep(500); }
Contributor

This is a good reminder for me that we really need a more async-friendly API, ideally a cross-process one.

Member

Do you mean at the ADBC API definition level? CC @zeroshade, who has been thinking about this. It's something I would like to tackle, but there are questions about compatibility, what happens to the sync API afterwards, and whether we also want to try to 'fix' other things at the same time.

Contributor

I put some thoughts into #811, which is the only existing issue we have that sort of tracks async.

Contributor
@birschick-bq Apr 16, 2024

@CurtHagenlocher - I have some private WIP to use async as much as possible in this C# implementation. It covers end-to-end async support for ExecuteQueryAsync/ExecuteUpdateAsync. I've put it to the side to work on the GetObjects implementation.
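
For reference, a non-blocking variant of the polling loop discussed in this thread might look like the sketch below. It is not code from this PR or from the WIP mentioned above: `OperationState` is a local stand-in for `TOperationState`, and the `getStatus` delegate is a hypothetical placeholder for the `GetOperationStatus` call, so the example compiles on its own.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Local stand-in for TOperationState; only the states this sketch needs.
public enum OperationState { Pending, Running, Finished }

public static class PollingSketch
{
    // getStatus is a hypothetical delegate standing in for the Thrift status call.
    public static async Task<OperationState> PollForResponseAsync(
        Func<CancellationToken, Task<OperationState>> getStatus,
        TimeSpan delay,
        CancellationToken cancellationToken = default)
    {
        while (true)
        {
            OperationState state = await getStatus(cancellationToken).ConfigureAwait(false);

            // Any state other than pending/running ends the polling loop.
            if (state != OperationState.Pending && state != OperationState.Running)
            {
                return state;
            }

            // Wait without blocking a thread; cancellation interrupts the wait.
            await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
        }
    }
}
```

Compared with `Thread.Sleep(500)`, awaiting `Task.Delay` frees the calling thread between polls and lets a `CancellationToken` stop the wait early.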

                TGetOperationStatusReq request = new TGetOperationStatusReq(this.operationHandle);
                statusResponse = this.client.GetOperationStatus(request).Result;
            } while (statusResponse.OperationState == TOperationState.PENDING_STATE || statusResponse.OperationState == TOperationState.RUNNING_STATE);
        }


        public override void Dispose()
        {
            if (this.client != null)
            {
                TCloseSessionReq r6 = new TCloseSessionReq(this.sessionHandle);
                this.client.CloseSession(r6).Wait();

                this.transport.Close();
                this.client.Dispose();
                this.transport = null;
                this.client = null;
            }
        }

        protected Schema GetSchema()
        {
            TGetResultSetMetadataReq request = new TGetResultSetMetadataReq(this.operationHandle);
            TGetResultSetMetadataResp response = this.client.GetResultSetMetadata(request).Result;
            return SchemaParser.GetArrowSchema(response.Schema);
        }

        sealed class GetObjectsReader : IArrowArrayStream
        {
            HiveServer2Connection connection;
            Schema schema;
            List<TSparkArrowBatch> batches;
            int index;
            IArrowReader reader;

            public GetObjectsReader(HiveServer2Connection connection, Schema schema)
            {
                this.connection = connection;
                this.schema = schema;
            }

            public Schema Schema { get { return schema; } }

            public async ValueTask<RecordBatch> ReadNextRecordBatchAsync(CancellationToken cancellationToken = default)
            {
                while (true)
                {
                    if (this.reader != null)
                    {
                        RecordBatch next = await this.reader.ReadNextRecordBatchAsync(cancellationToken);
                        if (next != null)
                        {
                            return next;
                        }
                        this.reader = null;
                    }

                    if (this.batches != null && this.index < this.batches.Count)
                    {
                        this.reader = new ArrowStreamReader(new ChunkStream(this.schema, this.batches[this.index++].Batch));
                        continue;
                    }

                    this.batches = null;
                    this.index = 0;

                    if (this.connection == null)
                    {
                        return null;
                    }

                    TFetchResultsReq request = new TFetchResultsReq(this.connection.operationHandle, TFetchOrientation.FETCH_NEXT, 50000);
                    TFetchResultsResp response = await this.connection.client.FetchResults(request, cancellationToken);
                    this.batches = response.Results.ArrowBatches;

                    if (!response.HasMoreRows)
                    {
                        this.connection = null;
                    }
                }
            }

            public void Dispose()
            {
            }
        }
    }
}
69 changes: 69 additions & 0 deletions csharp/src/Drivers/Apache/Hive2/HiveServer2Exception.cs
@@ -0,0 +1,69 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

using System;

namespace Apache.Arrow.Adbc.Drivers.Apache.Hive2
{
    public class HiveServer2Exception : AdbcException
    {
        private string _sqlState;
        private int _nativeError;

        public HiveServer2Exception()
        {
        }

        public HiveServer2Exception(string message) : base(message)
        {
        }

        public HiveServer2Exception(string message, AdbcStatusCode statusCode) : base(message, statusCode)
        {
        }

        public HiveServer2Exception(string message, Exception innerException) : base(message, innerException)
        {
        }

        public HiveServer2Exception(string message, AdbcStatusCode statusCode, Exception innerException) : base(message, statusCode, innerException)
        {
        }

        public override string SqlState
        {
            get { return _sqlState; }
        }

        public override int NativeError
        {
            get { return _nativeError; }
        }

        internal HiveServer2Exception SetSqlState(string sqlState)
        {
            _sqlState = sqlState;
            return this;
        }

        internal HiveServer2Exception SetNativeError(int nativeError)
        {
            _nativeError = nativeError;
            return this;
        }
    }
}