
test: Data driven tests POC #967

Closed · wants to merge 9 commits

Conversation

WillAyd (Contributor) commented Aug 5, 2023

ref #787

Needs some more work, but I wanted to make sure this was headed in the right direction.

WillAyd (Contributor, Author) commented Aug 5, 2023

I generated the IPC files like this:

```python
import pyarrow as pa

# Single date32 column; values are days since the Unix epoch
schema = pa.schema([pa.field("col", pa.date32())])

with pa.OSFile("test.arrow", "wb") as sink:
    with pa.ipc.new_file(sink, schema) as writer:
        batch = pa.record_batch([pa.array([0, 20000], type=pa.date32())], schema)
        writer.write(batch)
```

Another thought I had (maybe here, maybe in another PR): we could use a high-level language like Python to create the expected values and have the test suite execute it on the fly, rather than having developers commit binary files to the repo.

WillAyd (Contributor, Author) commented Aug 5, 2023

The current test failure stems from the continuation byte check in `ArrowIpcDecoderCheckHeader`. I am pretty new to the IPC stuff, so I'm not yet sure whether that is an issue with how I generated the files, with my expectations, or in nanoarrow.

lidavidm (Member) commented Aug 7, 2023

Thanks for taking this on! I'll go through it when I get a chance.

> Another thought I had (maybe here, maybe in another PR) was that we could choose a high level language like Python for creating the expected values and just have the test suite execute that on the fly, rather than having developers commit binary files to the repo

I think this is reasonable. Not everyone will like Python, but not everyone will like managing a submodule or having binary files in the repo either. (I think all my colleagues are out, or else I'd take a straw poll here...)

WillAyd (Contributor, Author) commented Aug 7, 2023

From some more debugging today: the file I generated from Python begins with the hex bytes `4152 524f 5731 0000` ("ARROW1"), but the nanoarrow IPC methods expect it to begin with a `0xFFFFFFFF` continuation marker. Maybe a mismatch in version expectations?

lidavidm (Member) commented Aug 7, 2023

Ah, that's stream vs. file format; you'll want `RecordBatchStreamWriter`, I think.


```cpp
struct ArrowArray ipc_array;
EXPECT_EQ(stream.get_next(&stream, &ipc_array), NANOARROW_OK);
// TODO: we need to compare array values - maybe from <arrow/compare.h>?
```
WillAyd (Contributor, Author):

I think this is another outstanding question: do we want to require Arrow for the test suite? nanoarrow already does this.

lidavidm (Member):

I suppose we can just grab it from Conda, but those who aren't using Conda already complain about how many dependencies we need, especially during verification...

WillAyd (Contributor, Author):

Yeah, that makes sense. I guess for now we could implement it directly in the validation module to avoid the dependency.

@WillAyd WillAyd closed this Apr 2, 2024
@WillAyd WillAyd deleted the data-driven-tests branch June 28, 2024 14:56