feat(rust/drivers): adbc driver for datafusion #2267

Merged: 13 commits, Oct 27, 2024

@tokoko tokoko commented Oct 21, 2024

Fixes #2263

An initial driver implementation for datafusion:

  • bunch of todos everywhere
  • options are not supported
  • get_objects ignores filters
  • get_objects takes depth into account, but only when returning results, not for info retrieval
  • minimal testing

P.S. I'm not much of a Rust person, so I'd appreciate any pointers.

@github-actions github-actions bot added this to the ADBC Libraries 15 milestone Oct 21, 2024
@tokoko tokoko changed the title feat(rust/driver/datafusion): adbc driver for datafusion feat(rust/drivers/datafusion): adbc driver for datafusion Oct 21, 2024
@tokoko tokoko changed the title feat(rust/drivers/datafusion): adbc driver for datafusion feat(rust/drivers): adbc driver for datafusion Oct 21, 2024

@mbrobbel mbrobbel left a comment

Very cool! I'll split my review over multiple chunks.

The failing test is known to be flaky.

rust/Cargo.toml Outdated
@@ -34,6 +34,7 @@ categories = ["database"]

[workspace.dependencies]
adbc_core = { path = "./core" }
arrow = { version = "53.1.0", default-features = false }
Contributor

Instead of adding arrow we can add the arrow subcrates to reduce the size of our dependency tree.
It looks like we need to add:

Contributor Author

Thanks, that makes sense. I didn't realize most of these functions could be imported from multiple crates. Added both as a dev-dependency.

rust/drivers/datafusion/README.md Outdated
rust/drivers/datafusion/src/lib.rs Outdated
rust/drivers/datafusion/tests/test_datafusion.rs Outdated
@mbrobbel

To fix the failing CI job we need to install protoc (required by the substrait crate):

You could add

- name: Install Protoc
  uses: arduino/setup-protoc@v3
  with:
    repo-token: ${{ secrets.GITHUB_TOKEN }}

as a step after

- name: Use stable Rust
  id: rust
  run: |
    rustup toolchain install stable --no-self-update
    rustup default stable

tokoko commented Oct 23, 2024

@mbrobbel Thanks, got everything green. I had to temporarily add #![allow(refining_impl_trait)] to lib.rs; otherwise I couldn't get the unimplemented methods that return Result<impl RecordBatchReader + Send> to pass the clippy checks.
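For readers unfamiliar with the lint, here is a minimal sketch of the situation that `refining_impl_trait` reacts to. The `Connection` trait and `Conn` type are hypothetical stand-ins built on the standard library only; they are not the adbc_core API. The item-level `#[allow]` shown here is an assumed, narrower alternative to the crate-level attribute mentioned above, not what the PR did.

```rust
/// Hypothetical stand-in for a driver trait whose method returns an opaque
/// `impl Trait` type (return-position impl Trait in traits, Rust 1.75+).
pub trait Connection {
    fn table_types(&self) -> Result<impl Iterator<Item = String> + Send, String>;
}

/// Hypothetical driver type with a stubbed method.
pub struct Conn;

// Naming a concrete return type in the impl "refines" the trait signature,
// which the `refining_impl_trait` lint group warns about by default.
// A stub that never builds an `Ok` value gives the compiler nothing to infer
// an opaque type from, so a concrete type is the easy way to make it compile.
#[allow(refining_impl_trait)]
impl Connection for Conn {
    fn table_types(&self) -> Result<std::vec::IntoIter<String>, String> {
        Err("table_types is not implemented yet".to_string())
    }
}
```

Calling `Conn.table_types()` returns the `Err` value. Once a method has a real body, its return type can go back to `impl Iterator<Item = String> + Send` and the `allow` can be dropped.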


@mbrobbel mbrobbel left a comment

I think this looks good as an initial commit for this driver. The todo!()s and unwraps can be handled in follow-up PRs.

rust/Cargo.toml Outdated
rust/drivers/datafusion/Cargo.toml Outdated
- name: Install Protoc
  uses: arduino/setup-protoc@v3
  with:
    repo-token: ${{ secrets.GITHUB_TOKEN }}
Member

...Why does installing protoc need the repo token?

Member

Hmm, supposedly to query GitHub for the latest version...

Is there some other source for protoc that doesn't rely on a third party action?

Contributor Author

  • We can just do curl + unzip, but that requires some logic to choose the right binary in order to work cross-platform.
  • Another option is to install grpcio-tools from PyPI and then alias protoc to python -m grpc_tools.protoc, but that obviously requires Python to be installed.
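The "logic to choose the right binary" in the first option could look roughly like the sketch below. It is an illustration only, not the workflow that was eventually merged. The function names are hypothetical, and the archive naming and download URL pattern are assumptions based on how recent protobuf releases appear to be published on GitHub.

```rust
/// Maps an (os, arch) pair, as reported by `std::env::consts::OS` and
/// `std::env::consts::ARCH`, to the platform suffix assumed to be used in
/// protoc release archive names. Returns `None` for unlisted platforms.
fn protoc_platform(os: &str, arch: &str) -> Option<&'static str> {
    match (os, arch) {
        ("linux", "x86_64") => Some("linux-x86_64"),
        ("linux", "aarch64") => Some("linux-aarch_64"),
        ("macos", "x86_64") => Some("osx-x86_64"),
        ("macos", "aarch64") => Some("osx-aarch_64"),
        ("windows", "x86_64") => Some("win64"),
        _ => None,
    }
}

/// Builds the download URL for a pinned protoc version (hypothetical helper).
/// The URL layout is an assumption about the protobuf release page.
fn protoc_url(version: &str, os: &str, arch: &str) -> Option<String> {
    let platform = protoc_platform(os, arch)?;
    Some(format!(
        "https://github.com/protocolbuffers/protobuf/releases/download/v{version}/protoc-{version}-{platform}.zip"
    ))
}
```

For the current machine this would be called as `protoc_url("28.3", std::env::consts::OS, std::env::consts::ARCH)`, where "28.3" is only an example version. Pinning an exact version avoids the "query for the latest release" step that the action needs a token for.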

Member

I suppose system protoc would be too old? (And Conda too heavy?)

Contributor Author

pixi should be fine, I think. I can add a pixi.toml in the rust directory; IIRC we could even install Rust from there instead of using rustup. I'm not sure protoc alone justifies adopting pixi, but it would be the simplest alternative.

Contributor Author

(1) The substrait crate depends on protoc.
(2) I guess, but if we still need to write some logic to install the system protoc (checking whether to use apt or brew depending on the OS), we might as well download the binaries instead.

Member

Or we can use the action minus the token? (Even if it's from Arduino, I'm a bit hesitant about passing even a read-only token to an external action.)

Contributor Author

Sure, that works as well. I will drop the token then; if we see that the step becomes flaky as a result, we can always make changes afterwards.

Contributor Author

@tokoko tokoko Oct 25, 2024

Looks like this is a no-go 😆. Judging from the action code, even if I pin an exact version, the action still tries to fetch all versions, presumably to validate it. It seems there's no way to make it work without a token.

Contributor Author

@lidavidm Finally got it working with pre-compiled binaries


@eitsupi eitsupi left a comment

How about these?

.github/workflows/rust.yml Outdated
.github/workflows/rust.yml Outdated
.github/workflows/rust.yml Outdated
.github/workflows/rust.yml Outdated
@lidavidm

Let's merge this as a start. Thanks @tokoko @eitsupi!

@lidavidm lidavidm merged commit e8af6a8 into apache:main Oct 27, 2024
29 checks passed
@tokoko tokoko deleted the datafusion branch October 27, 2024 07:32
Successfully merging this pull request may close these issues.

ADBC driver for DataFusion
4 participants