
test: Data driven tests POC #967

Closed · wants to merge 9 commits

Conversation

WillAyd (Contributor) commented Aug 5, 2023

ref #787

Needs some more work, but I wanted to make sure this was headed in the right direction.

WillAyd (Contributor, Author) commented Aug 5, 2023

I generated the IPC files like this:

```python
import pyarrow as pa

# Single date32 column; values are days since the Unix epoch
schema = pa.schema([pa.field("col", pa.date32())])

with pa.OSFile("test.arrow", "wb") as sink:
    with pa.ipc.new_file(sink, schema) as writer:
        batch = pa.record_batch([pa.array([0, 20000], type=pa.date32())], schema)
        writer.write(batch)
```

Another thought I had (maybe here, maybe in another PR): we could use a high-level language like Python to create the expected values and have the test suite execute it on the fly, rather than having developers commit binary files to the repo.

WillAyd (Contributor, Author) commented Aug 5, 2023

The current test failure stems from the continuation byte check in `ArrowIpcDecoderCheckHeader`. I am pretty new to the IPC stuff, so I'm not yet sure whether that is an issue with how I generated the files, with my expectations, or in nanoarrow.

lidavidm (Member) commented Aug 7, 2023

Thanks for taking this on! I'll go through it when I get a chance.

> Another thought I had (maybe here, maybe in another PR) was that we could choose a high level language like Python for creating the expected values and just have the test suite execute that on the fly, rather than having developers commit binary files to the repo

I think this is reasonable. Not everyone will like Python, but not everyone will like managing a submodule or having binary files in the repo either. (I think all my colleagues are out, or else I'd take a straw poll here...)

WillAyd (Contributor, Author) commented Aug 7, 2023

From some more debugging today: the file I generated from Python begins with the hex bytes `4152 524f 5731 0000` ("ARROW1"), but the nanoarrow IPC methods expect it to begin with a `0xFFFFFFFF` continuation marker. Maybe a mismatch in version expectations?

lidavidm (Member) commented Aug 7, 2023

Ah, that's stream vs. file format; you'll want `RecordBatchStreamWriter`, I think.


```cpp
struct ArrowArray ipc_array;
EXPECT_EQ(stream.get_next(&stream, &ipc_array), NANOARROW_OK);
// TODO: we need to compare array values - maybe from <arrow/compare.h>?
```
WillAyd (Contributor, Author):

I think this is another outstanding question: do we want to require Arrow for the test suite? nanoarrow already does this.

lidavidm (Member):

I suppose we can just grab it from Conda, but those who aren't using Conda already complain about how many dependencies we need, especially during verification...

WillAyd (Contributor, Author):

Yeah, that makes sense. I guess for now we could implement it directly in the validation module to avoid the dependency.

@WillAyd WillAyd closed this Apr 2, 2024
@WillAyd WillAyd deleted the data-driven-tests branch June 28, 2024 14:56