Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(csharp/src/Apache.Arrow.Adbc): correct StandardSchemas.ColumnSchema data types #1731

Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 7 additions & 7 deletions .pre-commit-config.yaml
birschick-bq marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@

repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v2.3.0
rev: v4.6.0
hooks:
- id: check-xml
- id: check-yaml
Expand All @@ -40,7 +40,7 @@ repos:
- id: trailing-whitespace
exclude: "^r/.*?/_snaps/.*?.md$"
- repo: https://github.com/pre-commit/mirrors-clang-format
rev: "v14.0.6"
rev: "v18.1.3"
hooks:
- id: clang-format
types_or: [c, c++]
Expand All @@ -59,30 +59,30 @@ repos:
- "--linelength=90"
- "--verbose=2"
- repo: https://github.com/golangci/golangci-lint
rev: v1.57.1
rev: v1.57.2
hooks:
- id: golangci-lint
entry: bash -c 'cd go/adbc && golangci-lint run --fix --timeout 5m'
types_or: [go]
- repo: https://github.com/macisamuele/language-formatters-pre-commit-hooks
rev: v2.12.0
rev: v2.13.0
hooks:
- id: pretty-format-golang
- id: pretty-format-java
args: [--autofix]
types_or: [java]
- repo: https://github.com/psf/black
rev: 22.3.0
rev: 24.4.0
hooks:
- id: black
types_or: [pyi, python]
- repo: https://github.com/PyCQA/flake8
rev: 4.0.1
rev: 7.0.0
hooks:
- id: flake8
types_or: [python]
- repo: https://github.com/PyCQA/isort
rev: 5.12.0
rev: 5.13.2
hooks:
- id: isort
types_or: [python]
Expand Down
156 changes: 156 additions & 0 deletions csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,156 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

using System;
using System.Collections.Generic;
using Apache.Arrow.Types;

namespace Apache.Arrow.Adbc.Extensions
{
public static class ListArrayExtensions
{
/// <summary>
/// Creates a <see cref="ListArray"/> from a list of <see cref="IArrowArray"/> data for the given datatype <see cref="IArrowArray"/>.
/// </summary>
/// <param name="arrayList">The list of data.</param>
/// <param name="dataType">The data type of the contained data.</param>
/// <returns>A <see cref="ListArray"/> of the data.</returns>
public static ListArray CreateNestedListArray(this IReadOnlyList<IArrowArray?> arrayList, IArrowType dataType)

Check warning on line 32 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# ubuntu-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 32 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# ubuntu-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 32 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# windows-2019

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 32 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# windows-2019

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 32 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# macos-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 32 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# macos-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The summary seems underspecified for a public function. I can't tell from the text what it does or what I need to pass to it to make it work.

Is this functionality which should be in the core Arrow library?

Copy link
Contributor Author

@birschick-bq birschick-bq Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I believe this should likely be included in the Arrow library. Not sure of it proper placement in the namespace, yet.

{
ArrowBuffer.Builder<int> valueOffsetsBufferBuilder = new ArrowBuffer.Builder<int>();
ArrowBuffer.BitmapBuilder validityBufferBuilder = new ArrowBuffer.BitmapBuilder();
List<ArrayData> arrayDataList = new List<ArrayData>(arrayList.Count);
int length = 0;
int nullCount = 0;

foreach (IArrowArray? array in arrayList)

Check warning on line 40 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# ubuntu-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 40 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# ubuntu-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 40 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# windows-2019

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 40 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# windows-2019

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 40 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# macos-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 40 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# macos-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.
{
if (array == null)
{
valueOffsetsBufferBuilder.Append(length);
validityBufferBuilder.Append(false);
nullCount++;
}
else
{
valueOffsetsBufferBuilder.Append(length);
validityBufferBuilder.Append(true);
arrayDataList.Add(array.Data);
length += array.Length;
}
}

ArrowBuffer validityBuffer = nullCount > 0
? validityBufferBuilder.Build() : ArrowBuffer.Empty;

ArrayData? data = ArrayDataConcatenator.Concatenate(arrayDataList);

Check warning on line 60 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# ubuntu-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 60 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# windows-2019

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 60 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# macos-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

if (data == null)
{
EmptyArrayCreationVisitor visitor = new EmptyArrayCreationVisitor();
dataType.Accept(visitor);
data = visitor.Result;
}

IArrowArray value = ArrowArrayFactory.BuildArray(data);

valueOffsetsBufferBuilder.Append(length);

return new ListArray(new ListType(dataType), arrayList.Count,
valueOffsetsBufferBuilder.Build(), value,
validityBuffer, nullCount, 0);
}

private class EmptyArrayCreationVisitor :
IArrowTypeVisitor<BooleanType>,
IArrowTypeVisitor<FixedWidthType>,
IArrowTypeVisitor<BinaryType>,
IArrowTypeVisitor<StringType>,
IArrowTypeVisitor<ListType>,
IArrowTypeVisitor<FixedSizeListType>,
IArrowTypeVisitor<StructType>,
IArrowTypeVisitor<MapType>
{
public ArrayData? Result { get; private set; }

Check warning on line 88 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# ubuntu-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 88 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# ubuntu-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 88 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# windows-2019

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 88 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# windows-2019

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 88 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# macos-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 88 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# macos-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

public void Visit(BooleanType type)
{
Result = new BooleanArray.Builder().Build().Data;
}

public void Visit(FixedWidthType type)
{
Result = new ArrayData(type, 0, 0, 0, new[] { ArrowBuffer.Empty, ArrowBuffer.Empty });
}

public void Visit(BinaryType type)
{
Result = new BinaryArray.Builder().Build().Data;
}

public void Visit(StringType type)
{
Result = new StringArray.Builder().Build().Data;
}

public void Visit(ListType type)
{
type.ValueDataType.Accept(this);
ArrayData? child = Result;

Check warning on line 113 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# ubuntu-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 113 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# windows-2019

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 113 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# macos-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Result = new ArrayData(type, 0, 0, 0, new[] { ArrowBuffer.Empty, MakeInt0Buffer() }, new[] { child });
}

public void Visit(FixedSizeListType type)
{
type.ValueDataType.Accept(this);
ArrayData? child = Result;

Check warning on line 121 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# ubuntu-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 121 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# windows-2019

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 121 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# macos-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Result = new ArrayData(type, 0, 0, 0, new[] { ArrowBuffer.Empty }, new[] { child });
}

public void Visit(StructType type)
{
ArrayData?[] children = new ArrayData[type.Fields.Count];

Check warning on line 128 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# ubuntu-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 128 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# windows-2019

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.

Check warning on line 128 in csharp/src/Apache.Arrow.Adbc/Extensions/ListArrayExtensions.cs

View workflow job for this annotation

GitHub Actions / C# macos-latest

The annotation for nullable reference types should only be used in code within a '#nullable' annotations context.
for (int i = 0; i < type.Fields.Count; i++)
{
type.Fields[i].DataType.Accept(this);
children[i] = Result;
}

Result = new ArrayData(type, 0, 0, 0, new[] { ArrowBuffer.Empty }, children);
}

public void Visit(MapType type)
{
Result = new MapArray.Builder(type).Build().Data;
}

public void Visit(IArrowType type)
{
throw new NotImplementedException($"EmptyArrayCreationVisitor for {type.Name} is not supported yet.");
}

private static ArrowBuffer MakeInt0Buffer()
{
ArrowBuffer.Builder<int> builder = new ArrowBuffer.Builder<int>();
builder.Append(0);
return builder.Build();
}
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

using System;
using System.Collections.Generic;
using System.Linq;
using Apache.Arrow.Types;

namespace Apache.Arrow.Adbc.Extensions
{
public static class StandardSchemaExtensions
{

/// <summary>
/// Validates a data array that its column number and types match a given schema.
/// </summary>
/// <param name="schema">The schema to validate against.</param>
/// <param name="data">The data array to validate.</param>
/// <exception cref="ArgumentException">Throws an exception if the number of columns or type data types in the data array do not match the schema fields.</exception>
public static IReadOnlyList<IArrowArray> Validate(this Schema schema, IReadOnlyList<IArrowArray> data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does a very superficial comparison of types. For instance, it doesn't check the precision and scale of decimals, length of a fixed-size list, the units of dates, times or intervals, whether or not a union is dense or sparse, the name or nullability of a field, etc. That seems mostly okay for an internal method being used privately against a relatively-known schema, but problematic for a public function that's being exported for anyone to use.

How many of the gaps in the above list are features (for this particular use case) rather than flaws? Should this be standard functionality in the core Arrow library? It seems to overlap with what the (internal) class ArrayDataTypeComparer does.

Copy link
Contributor Author

@birschick-bq birschick-bq Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be happy to use the existing class, if it were exposed. I could also make this better and more rigorous, if you believe it might be used outside of its current context.

{
Validate(schema.FieldsList, data);
return data;
}

/// <summary>
/// Validates a data array that its column number and types match given schema fields.
/// </summary>
/// <param name="schemaFields">The schema fields to validate against.</param>
/// <param name="data">The data array to validate.</param>
/// <exception cref="ArgumentException">Throws an exception if the number of columns or type data types in the data array do not match the schema fields.</exception>
public static IReadOnlyList<IArrowArray> Validate(this IReadOnlyList<Field> schemaFields, IReadOnlyList<IArrowArray> data)
{
if (schemaFields.Count != data.Count)
{
throw new ArgumentException($"Expected number of columns {schemaFields.Count} not equal to actual length {data.Count}", nameof(data));
}
for (int i = 0; i < schemaFields.Count; i++)
{
Field field = schemaFields[i];
ArrayData dataField = data[i].Data;
if (field.DataType.TypeId != dataField.DataType.TypeId)
{
throw new ArgumentException($"Expecting data type {field.DataType} but found {data[i].Data.DataType} on field with name {field.Name}.", nameof(data));
}
if (field.DataType.TypeId == ArrowTypeId.Struct)
{
StructType structType = (StructType)field.DataType;
Validate(structType.Fields, dataField.Children.Select(e => new ContainerArray(e)).ToList());
}
else if (field.DataType.TypeId == ArrowTypeId.List)
{
ListType listType = (ListType)field.DataType;
if (listType.Fields.Count > 0)
{
Validate(listType.Fields, dataField.Children.Select(e => new ContainerArray(e)).ToList());
}
}
}

return data;
}

private class ContainerArray : Array
{
public ContainerArray(ArrayData data) : base(data)
{
}
}
}
}
5 changes: 2 additions & 3 deletions csharp/src/Apache.Arrow.Adbc/StandardSchemas.cs
Original file line number Diff line number Diff line change
Expand Up @@ -128,8 +128,8 @@ public static class StandardSchemas
new Field("xdbc_scope_catalog", StringType.Default, true),
new Field("xdbc_scope_schema", StringType.Default, true),
new Field("xdbc_scope_table", StringType.Default, true),
new Field("xdbc_is_autoincrement", StringType.Default, true),
new Field("xdbc_is_generatedcolumn", StringType.Default, true)
new Field("xdbc_is_autoincrement", BooleanType.Default, true),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected data types.

new Field("xdbc_is_generatedcolumn", BooleanType.Default, true)
};

public static readonly List<Field> TableSchema = new List<Field>() {
Expand Down Expand Up @@ -178,5 +178,4 @@ public static class StandardSchemas
metadata: null
);
}

}
Loading
Loading