-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: Data driven tests POC #967
Conversation
I generated the IPC files like this: import pyarrow as pa
from datetime import date
schema = pa.schema([pa.field("col", pa.date32())])
with pa.OSFile("test.arrow", "wb") as sink:
with pa.ipc.new_file(sink, schema) as writer:
batch = pa.record_batch([pa.array([0, 20000], type=pa.date32())], schema)
writer.write(batch) Another thought I had (maybe here, maybe in another PR) was that we could choose a high level language like Python for creating the expected values and just have the test suite execute that on the fly, rather than having developers commit binary files to the repo |
Current test failure stems from the continuation byte check in ArrowIpcDecoderCheckHeader . I am pretty new to the IPC stuff so am not yet sure if that is an issue with how I generated the files, my expectations, or maybe in nanoarrow |
Thanks for taking this on! I'll go through it when I get a chance.
I think this is reasonable. Not everyone will like Python, but not everyone will like managing a submodule or having binary files in the repo either. (I think all my colleagues are out, or else I'd take a straw poll here...) |
From some more debugging today, the file I generated from python begins with the hex bytes |
Ah, that's stream vs file format; you'll want RecordBatchStreamWriter I think |
|
||
struct ArrowArray ipc_array; | ||
EXPECT_EQ(stream.get_next(&stream, &ipc_array), NANOARROW_OK); | ||
// TODO: we need to compare array values - maybe from <arrow/compare.h>? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is another outstanding question - do we want to require Arrow for the test suite? nanoarrow does this already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we can just grab it from Conda, but those who aren't using Conda already complain about how many dependencies we need, especially during verification...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea that makes sense. I guess for now could implement directly in the validation module to avoid the dependency
ref #787
Needs some more work but wanted to make sure this was headed in the right direction.