Support array types #191

plcplc · 2023-11-28T14:39:34Z

What

We now recognize array types in several places:

Database introspection detects columns and comparison operators of array type
Columns of array type can be queried
Arguments of array type can be input
Arrays work in tables and native queries alike

Notable limitations, due in part to the modest expressiveness of the ndc-spec in its current state:

There is no way to filter array fields
There is no way to project fields from the elements of an array field
We only support monomorphic comparison operators

How

The introspection query has had its types sub-query split into scalar_types and array_types subqueries. We benefit from the fact that Postgres array types do not record their dimensionality (i.e., array dimensionality is a dynamic aspect of an array value), so we don't have to recurse to construct the JSON representation of array types.

Most of the rust-language changes of this PR follow mechanically from changing metadata::ScalarType to the new metadata::Type, which also models array types.

However, since our field types now include arrays we need to observe that certain parts (e.g. ComparisonOperators) only accept base scalar types (e.g. in version1.rs, filtering.rs).
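
To make the distinction concrete, here is a minimal sketch of a type that models arrays alongside scalars, plus a check of the kind a comparison-operator lookup might perform. The names `ScalarType`, `Type` and `expect_scalar` are illustrative assumptions and are not taken from the actual `metadata` module.

```rust
/// Hypothetical stand-in for the name of a base scalar type, e.g. "int4".
#[derive(Debug, Clone, PartialEq)]
pub struct ScalarType(pub String);

/// Sketch of a column or argument type that can also describe arrays.
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    ScalarType(ScalarType),
    ArrayType(Box<Type>),
}

/// Return the underlying scalar if `ty` is a base scalar type.
/// Array types are rejected, mirroring the restriction that comparison
/// operators only accept base scalar types.
pub fn expect_scalar(ty: &Type) -> Result<&ScalarType, String> {
    match ty {
        Type::ScalarType(scalar) => Ok(scalar),
        Type::ArrayType(_) => {
            Err("comparison operators are not supported on array types".to_string())
        }
    }
}
```

Calling `expect_scalar` on a scalar type returns the scalar, while an array type yields an error that the caller can surface to the user.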

In defining the new type for column types we also take care to maintain backwards compatibility, which means deserialization gets a bit more verbose than just the automatically derived implementations we used to get by with (see metadata/database.rs).
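
As a rough illustration of what backwards-compatible ingestion could look like, the sketch below accepts both a bare string (the older shape) and a wrapped object. It uses a tiny hand-written `Json` enum instead of serde so that it stays self-contained. The `scalarType` key matches the wrapper used elsewhere in this PR; the `arrayType` key and the function name `parse_type` are assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Minimal JSON value, just enough for this sketch (not serde_json).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    String(String),
    Object(BTreeMap<String, Json>),
}

/// Hypothetical internal representation of a column type.
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    ScalarType(String),
    ArrayType(Box<Type>),
}

/// Parse a column type, accepting the legacy and the wrapped shape.
pub fn parse_type(value: &Json) -> Result<Type, String> {
    match value {
        // Legacy shape: a bare scalar type name such as "int4".
        Json::String(name) => Ok(Type::ScalarType(name.clone())),
        // Newer shape: an object with exactly one key naming the variant.
        Json::Object(fields) if fields.len() == 1 => {
            match fields.iter().next().map(|(k, v)| (k.as_str(), v)) {
                Some(("scalarType", Json::String(name))) => {
                    Ok(Type::ScalarType(name.clone()))
                }
                // "arrayType" is an assumed key name for this sketch.
                Some(("arrayType", inner)) => {
                    Ok(Type::ArrayType(Box::new(parse_type(inner)?)))
                }
                _ => Err("unrecognised type representation".to_string()),
            }
        }
        _ => Err("expected a string or a single-key object".to_string()),
    }
}
```

With this shape, a legacy value `"int4"` and a wrapped value `{"scalarType": "int4"}` both parse to the same internal type.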

Since we also support array-typed arguments we have to be able to output SQL array constructors (see values.rs, sql/convert.rs).
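
For readers unfamiliar with the syntax, the following sketch renders a nested value as a Postgres `ARRAY[...]` constructor. It is a simplified illustration only: the `Value` enum and `to_sql` function are hypothetical, the real translation in `values.rs` and `sql/convert.rs` works on its own SQL AST, and inlining literals as strings is done here purely for readability.

```rust
/// Hypothetical argument value; the real connector has its own value types.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Text(String),
    Array(Vec<Value>),
}

/// Render a value as a SQL expression, using ARRAY[...] for arrays.
pub fn to_sql(value: &Value) -> String {
    match value {
        Value::Int(n) => n.to_string(),
        // Double any single quote so the string literal stays well-formed.
        Value::Text(s) => format!("'{}'", s.replace('\'', "''")),
        Value::Array(items) => {
            // Render each element recursively and join them with commas.
            let rendered: Vec<String> = items.iter().map(to_sql).collect();
            format!("ARRAY[{}]", rendered.join(", "))
        }
    }
}
```

One caveat worth keeping in mind: Postgres cannot infer the element type of a bare empty `ARRAY[]`, so a real implementation would need to attach an explicit cast in that case.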

We introduce two new tests:

One test which just verifies that we can output an array column from a native query
Another which verifies that we can take an array-valued argument to a native query.

Update

Altering the representation of column types in the configuration constitutes a breaking change. Even if we are still able to ingest the historic configuration formats, these are not described by the published JSON schema, and we would only ever output the newest format.

To address the issue of versioning, the original contents of this PR are being split in two:

The first part (this PR) contains the logic for dealing with array types, and includes a new level of indirection between the api transport types and those used internally.
A second follow-up PR will introduce a version 2 of the configuration format, update introspection to include array types, and add tests of the same.

An interesting consequence of introducing the distinction between api transport types and internal business types is that the various tables.json files of the query-engine translation tests contain json-serialized data of business types.

It is worth highlighting that this means we can no longer simply copy a subsection of a deployment configuration into a tables.json file like we used to, as the business types do not necessarily correspond to any api-version.

While this may seem confusing it also has some benefits:

Test files have no version drift. They are always representative of the current internal business data model
The test remains coupled with the query-engine crate rather than the connector crate.

Updating test files can become a chore too. There is no tool that can upgrade them automatically like we do for deployments, and it would probably not be worth the effort to try and make one.

What I ended up doing in the particular case of this PR was a small shell script:

for f in $(fd -t f 'tables\.json')
do
  jq 'walk(if type == "object" and has("type") then .type |= {scalarType: .} else . end)' < "$f" \
  | sponge "$f"
done

The jq script matches any object that has a type key and wraps its value with a {"scalarType": ..} object. Happily this covered the full breadth of my change.
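
For those who do not read jq fluently, here is roughly the same bottom-up rewrite sketched in Rust. It uses a small hand-written `Json` enum rather than a JSON library, and the function name `wrap_types` is made up for this example; it is not part of the repository.

```rust
use std::collections::BTreeMap;

/// Minimal JSON value, just enough for this sketch (not serde_json).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    String(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// Visit children first, then wrap the value of any "type" key
/// in a {"scalarType": ...} object, as the jq filter does.
pub fn wrap_types(value: Json) -> Json {
    match value {
        Json::Array(items) => Json::Array(items.into_iter().map(wrap_types).collect()),
        Json::Object(fields) => {
            // Rewrite all nested values before looking at this object.
            let mut fields: BTreeMap<String, Json> = fields
                .into_iter()
                .map(|(key, inner)| (key, wrap_types(inner)))
                .collect();
            if let Some(old) = fields.remove("type") {
                let mut wrapper = BTreeMap::new();
                wrapper.insert("scalarType".to_string(), old);
                fields.insert("type".to_string(), Json::Object(wrapper));
            }
            Json::Object(fields)
        }
        other => other,
    }
}
```

Applied to `{"name": "id", "type": "int4"}` this yields `{"name": "id", "type": {"scalarType": "int4"}}`, and objects without a type key pass through unchanged.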

sponge is a tool I just discovered which buffers all the stdin it receives and writes everything into the file it's given. Essentially this endows our command with the ability to make in-place edits with shell pipe filters.

danieljharvey · 2023-12-01T15:00:44Z

crates/connectors/ndc-postgres/src/configuration.rs

        RuntimeConfiguration {
-            metadata: &self.config.metadata,
+            metadata: metadata_to_current(&self.config.metadata),


We do this mapping on every request, don't we? I wonder if in future we could do it once, whenever RawConfiguration becomes Configuration (one of the Connector trait functions run by the config server, IIRC), and keep RuntimeConfiguration inside Configuration.

Not in this PR, mind. Probably worth benchmarking it too, to check that a speculative speed-up becomes a real one.

plcplc · 2023-12-01

I'm not sure this makes sense.

IIUC, what the user will consider to be "their configuration metadata" is the JSON-serialized Configuration, not just the RawConfiguration, meaning that the whole thing needs to be able to exist in different versions.

It's still a bit nebulous to me though, so I could be wrong.

danieljharvey · 2023-12-01

All I mean is that currently we take the whole Configuration and derive RuntimeConfiguration from it on each request. Surely we could just do this once?

It's not really relevant to this PR, mind, just something I noticed because you made a change near the code that does this.

danieljharvey

This all seems sensible. Left a couple of questions, but they could probably both be answered in a couple of follow-up tidy ups or optimisation passes at some later point.

### What

This PR adds a new version (`"2"`) of the deployment configuration data format. This version of the configuration is capable of expressing array types in collections and arguments.

Since this is the first time a new version is introduced there are a lot of changes the only purpose of which is to distinguish between versions.

Only the infrastructure-related shell of the connector is aware of different versions of deployment configurations existing. The core of the connector only works with a single internal version.

This PR is also the one to introduce tests of array types. In hindsight this ought to have been possible in the previous PR that introduced the internal types and transformations (#191).

Note that there is not yet any automated way to upgrade a configuration to a newer version, but this will be introduced shortly.

This PR also adds a changelog entry.

### How

The file `version2.rs` is a duplicate of `version1.rs`, which has been adapted to use the new data types (incidentally these are just the ones of the internal model).

`configuration.sql` now exists as `version1.sql` and `version2.sql` respectively, since these have different capabilities. This is because `version2.sql` introduces the ability to introspect array types.

`configuration.rs` now exposes `RawConfiguration` and `Configuration` types which are enums of all the supported versions (currently `1` and `"2"`).

One big wart on the implementation is that serde and schemars are unable to derive trait implementations for these types correctly, since they only support strings as enum tags, and we used a number literal for version 1. Once we drop support of version 1 completely we can remove the manually implemented instances.

The various `Connector` trait implementations now explicitly work on the internal representation of a configuration, `RuntimeConfiguration`.

---------

Co-authored-by: Daniel Harvey <[email protected]>

plcplc added 8 commits November 27, 2023 09:39

first take on arrays

fe030cb

tests for other postgres variants

c6d1c3e

leftover files

fbaf255

Support for arrays as input

8ff2372

tests for other variants

39dec81

simplify values.rs

5e004bd

commentary

cb4e2ef

Merge remote-tracking branch 'origin/main' into plc/issues/NDAT-950

3acc07d

plcplc requested review from soupi, i-am-tom and danieljharvey November 28, 2023 14:40

check

bcf1f4c

plcplc enabled auto-merge November 28, 2023 15:40

soupi reviewed Nov 29, 2023

crates/query-engine/metadata/src/metadata/database.rs (outdated, resolved)

plcplc disabled auto-merge November 29, 2023 11:32

soupi reviewed Nov 29, 2023

crates/query-engine/translation/src/translation/query/values.rs (outdated, resolved)

soupi reviewed Nov 29, 2023

crates/connectors/ndc-postgres/src/configuration/version1.rs (outdated, resolved)

Docstrings, and making functions private

72e01c7

plcplc force-pushed the plc/issues/NDAT-950 branch from 5c8cfb9 to 1fa5019 Compare November 30, 2023 20:42

plcplc added 2 commits November 30, 2023 21:43

Revert exposed changes in preparation for controlled version bump

7f5d41b

Separate api configuration types from internal configuration types

e258b81

plcplc force-pushed the plc/issues/NDAT-950 branch from 1fa5019 to e258b81 Compare November 30, 2023 20:43

plcplc added 2 commits November 30, 2023 21:44

Current deployment was missing a snapshot

8f49779

Merge remote-tracking branch 'origin/main' into plc/issues/NDAT-950

5194597

plcplc enabled auto-merge December 1, 2023 12:09

danieljharvey reviewed Dec 1, 2023

crates/connectors/ndc-postgres/src/configuration/version1.rs (resolved)

danieljharvey approved these changes Dec 1, 2023

plcplc added this pull request to the merge queue Dec 1, 2023

Merged via the queue into main with commit addc37f Dec 1, 2023
26 checks passed

plcplc deleted the plc/issues/NDAT-950 branch December 1, 2023 15:22

plcplc mentioned this pull request Dec 11, 2023

Introduce configuration version 2 #208

Merged
