-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(csharp/src/Apache.Arrow.Adbc): correct StandardSchemas.ColumnSchema data types #1731
fix(csharp/src/Apache.Arrow.Adbc): correct StandardSchemas.ColumnSchema data types #1731
Conversation
@@ -128,8 +128,8 @@ public static class StandardSchemas | |||
new Field("xdbc_scope_catalog", StringType.Default, true), | |||
new Field("xdbc_scope_schema", StringType.Default, true), | |||
new Field("xdbc_scope_table", StringType.Default, true), | |||
new Field("xdbc_is_autoincrement", StringType.Default, true), | |||
new Field("xdbc_is_generatedcolumn", StringType.Default, true) | |||
new Field("xdbc_is_autoincrement", BooleanType.Default, true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected data types.
Oh hmm, the regex I'm using to validate the title is too strict - periods should probably be allowed. You can ignore that check for now, please use the correct path. |
…/inconsistent-columnschema-types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the whole, I'm not excited about exposing two public and non-ADBC-specific functions from this assembly -- particularly without a stronger specification for them. Can we make them internal and then use InternalsVisibleTo? This is already being used in the C# Arrow implementation, so the build processes should support it.
It would also be nice to eliminate the large number of nullability warnings by either removing the nullability annotations or by putting the code into a #nullable
block.
/// <param name="schema">The schema to validate against.</param> | ||
/// <param name="data">The data array to validate.</param> | ||
/// <exception cref="ArgumentException">Throws an exception if the number of columns or type data types in the data array do not match the schema fields.</exception> | ||
public static IReadOnlyList<IArrowArray> Validate(this Schema schema, IReadOnlyList<IArrowArray> data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does a very superficial comparison of types. For instance, it doesn't check the precision and scale of decimals, length of a fixed-size list, the units of dates, times or intervals, whether or not a union is dense or sparse, the name or nullability of a field, etc. That seems mostly okay for an internal method being used privately against a relatively-known schema, but problematic for a public function that's being exported for anyone to use.
How many of the gaps in the above list are features (for this particular use case) rather than flaws? Should this be standard functionality in the core Arrow library? It seems to overlap with what the (internal) class ArrayDataTypeComparer does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy to use the existing class, if it were exposed. I could also make this better and more rigorous, if you believe it might be used outside of its current context.
/// <param name="arrayList">The list of data.</param> | ||
/// <param name="dataType">The data type of the contained data.</param> | ||
/// <returns>A <see cref="ListArray"/> of the data.</returns> | ||
public static ListArray CreateNestedListArray(this IReadOnlyList<IArrowArray?> arrayList, IArrowType dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The summary seems underspecified for a public function. I can't tell from the text what it does or what I need to pass to it to make it work.
Is this functionality which should be in the core Arrow library?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I believe this should likely be included in the Arrow library. Not sure of it proper placement in the namespace, yet.
…ecific external projects.
…/inconsistent-columnschema-types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the classed in the new extensions to be internal
and exposed them in the AssemblyInfo.cs
properties.
|
||
namespace Apache.Arrow.Adbc.Extensions | ||
{ | ||
internal static class ListArrayExtensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This extensions is marked as internal
.
|
||
namespace Apache.Arrow.Adbc.Extensions | ||
{ | ||
internal static class StandardSchemaExtensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This extension is marked as internal.
|
||
using System.Runtime.CompilerServices; | ||
|
||
[assembly: InternalsVisibleTo("Apache.Arrow.Adbc.Drivers.Apache, PublicKey=0024000004800000940000000602000000240000525341310004000001000100e504183f6d470d6b67b6d19212be3e1f598f70c246a120194bc38130101d0c1853e4a0f2232cb12e37a7a90e707aabd38511dac4f25fcb0d691b2aa265900bf42de7f70468fc997551a40e1e0679b605aa2088a4a69e07c117e988f5b1738c570ee66997fba02485e7856a49eca5fd0706d09899b8312577cbb9034599fc92d4")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expose internal interfaces to three libraries, assuming the current signing key.
/// <param name="schema">The schema to validate against.</param> | ||
/// <param name="data">The data array to validate.</param> | ||
/// <exception cref="ArgumentException">Throws an exception if the number of columns or type data types in the data array do not match the schema fields.</exception> | ||
public static IReadOnlyList<IArrowArray> Validate(this Schema schema, IReadOnlyList<IArrowArray> data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy to use the existing class, if it were exposed. I could also make this better and more rigorous, if you believe it might be used outside of its current context.
/// <param name="arrayList">The list of data.</param> | ||
/// <param name="dataType">The data type of the contained data.</param> | ||
/// <returns>A <see cref="ListArray"/> of the data.</returns> | ||
public static ListArray CreateNestedListArray(this IReadOnlyList<IArrowArray?> arrayList, IArrowType dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I believe this should likely be included in the Arrow library. Not sure of it proper placement in the namespace, yet.
RE: nullability annotations I believe we need a separate ticket for this. It will require hundreds of small changes to mark current non-nullable types as nullable. I see you have already created a new issue for this. |
…ma data types (apache#1731) 1. Corrects the two fields in `StandardSchemas.ColumnSchema` 2. Adds unit tests for the `StandardSchemas` 3. Adds extension methods to validate data using `Schema` or `IReadOnlyList<Field>`. 4. Add extension method to `CreateNestedListArray` Resolves apache#1729
StandardSchemas.ColumnSchema
StandardSchemas
Schema
orIReadOnlyList<Field>
.CreateNestedListArray
Resolves #1729