-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
python: Use meson-python instead of setuptools #644
Conversation
python/src/nanoarrow/meson.build
Outdated
sources: [cyf], | ||
cython_args: cython_args, | ||
include_directories: nanoarrow_src_dir, | ||
dependencies: [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little too broad to use every dependency for every Cython file; can refine so that only IPC files need the IPC dep, Device files need the device dep, etc...
7245021
to
7c0e44a
Compare
@@ -103,7 +105,7 @@ if get_option('ipc') | |||
) | |||
nanoarrow_ipc_dep = declare_dependency(include_directories: [incdir], | |||
link_with: nanoarrow_ipc_lib, | |||
dependencies: [nanoarrow_dep, flatcc_dep]) | |||
dependencies: [nanoarrow_dep]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was a mistake to include the flatcc_dep
transitively here; I think its only required to build the nanoarrow_ipc
lib itself, but shouldn't be required by libraries that want to use nanoarrow_ipc
(?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unclear on the difference but yes, the flatcc headers are deliberately not included in nanoarrow's public headers such that a library using nanoarrow does not need flatcc/*
files in the include path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea this removal here essentially removes the flatcc headers from being included by downstream targets that use the nanoarrow_ipc_dep; this is akin to changing the flatcc target inclusion from PUBLIC to PRIVATE in CMake
I know you're still working here, but just making sure you know about https://github.com/apache/arrow-nanoarrow/blob/main/ci/scripts/test-python-wheels.sh Will this work with CUDA? I imagine it's possible to add I do think we have to support building from sdist since I believe that's how the conda-forge setup currently works. (In general I think meson is way better than setuptools provided it has the same level of Python version and platform support...thank you for working on this!) |
# Replaced by version-bumping scripts at release time | ||
version = "0.6.0.dev0" | ||
[wrap-file] | ||
directory = arrow-nanoarrow-b78c0395a6fe74e776daf07933583f06567fb99c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just pointing to a single commit here that includes some of the changes made to the top level meson.build
configuration. In a follow up ideally point to main or even a wrapdb release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maintaining the commit-level integration is the real trick of this particular Python package. It is probably difficult, but solving that is essential if this going to be the approach we use (i.e., if we are going to update the Python build system, we can't do it in such a way that it removes features or increases complexity).
Does the approach you use in the examples work?
arrow-nanoarrow/examples/meson-minimal/subprojects/nanoarrow.wrap
Lines 18 to 24 in 4168119
[wrap-git] | |
# url should point to the project root with the top-level meson.build file | |
url = ../../.. | |
revision = head | |
[provide] | |
nanoarrow = nanoarrow_dep |
...and maybe if building a source distribution you could copy arrow-nanoarrow/src
into python/vendor/arrow-nanoarrow/arc
and update the subprojects file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit level integration is managed through the symlink to the subprojects folder. A wrap file doesn't work like that - instead it just copies the library at a point in time to subprojects, and you would have to manually run meson wrap update
constantly to get that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To try and bring that all together...I wanted to have a subprojects folder that looked like:
python/
subprojects/
arrow-nanoarrow/ -- this is a symlink to the project root
arrow-nanoarrow.wrap
Ideally, if meson finds the arrow-nanoarrow
subproject it should not need the wrap file at all, and just happily use the symlink for a live integration. When it comes to sdists where the symlink will not exist, the wrap file should be responsible for resolving and unpacking a particular arrow-nanoarrow
distribution to the subprojects folder (probably from the wrap db, but in this PR I have it pointing to this branch to get the config updates)
Due to the "bug" I linked elsewhere in this PR, Meson confusingly still downloads the contents of arrow-nanoarrow.wrap
even when the arrow-nanoarrow
symlink exists, so to work around that the local directory structure instead looks like:
python/
subprojects/
arrow-nanoarrow-dev/ -- this is a symlink to the project root
arrow-nanoarrow.wrap
And the configuration looks like:
nanoarrow_proj = subproject('arrow-nanoarrow-dev', required: false)
if not nanoarrow_proj.found()
nanoarrow_proj = subproject('arrow-nanoarrow')
endif
When developing locally, the arrow-nanoarrow-dev symlink will always be prioritized. Of course, when we create source distributions we end up removing symlinks, so the wrap file comes into play in those instances.
Hope that makes sense but let me know if anything is unclear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I get it, but I would like the sdist to have a full copy of the nanoarrow C library rather than download something. I can experiment a bit to see if I can get that to work (as well as fire up my Windows and CUDA test machines locally to ensure those both work too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK. In that case we can probably get rid of the wrap file then; just need to figure out the right way/hook to install a copy into subprojects when generating the sdist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe in bootstrap.py
? ADBC sets an environment variable when building a source distribution and we probably could too:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meson-python and meson both actually have a dist option to include subprojects, which I think is where our best bet is
https://meson-python.readthedocs.io/en/stable/how-to-guides/meson-args.html
Of course, it doesn't seem to work right now (I think the symlink behavior tricks it into failure) but let me research that a bit more. Would rather let the build system take care of this if it can
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meson.add_dist_script
may be useful here? It will run when building an sdist, and you can invoke a Python script to copy exactly the files/dirs you want under subprojects/
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK I think that's a great idea. The --include-subprojects
flag I was looking at seems to just copy the symlink, but not deeply copy the subproject. So a custom script is probably the safest bet
0943ede
to
c0d3bc5
Compare
Definitely not an area of expertise for me, but I don't think Meson provides that many abstractions over CUDA at the moment
Yep should still be a part of this - when cloning the git repo if the arrow-nanoarrow subproject is there, meson should reference that. For source distributions, they should include the |
c0d3bc5
to
30914ec
Compare
.github/workflows/python-wheels.yaml
Outdated
if: ${{ matrix.config.label == "windows" }} | ||
with: | ||
python-version: "3.12" | ||
architecture: x64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like setup-python still installs a 32 bit Python by default on the windows runner, while cibuildwheel is expected 64 bit on that host
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still get 32-bit wheels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe so...I think cibuildwheel controls that, and since we don't override anything in the pyproject.toml I think it should still generate the same types of wheels as specified here:
https://cibuildwheel.pypa.io/en/stable/options/#build-skip
Admittedly my knowledge of Windows and how it builds for different architectures is limited, so definitely something to verify when wheels actually get build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might have to update the python-wheels.yaml
workflow to fire on python/meson.build
instead of python/setup.py
to get it to run on this PR (in which case we can inspect the results and find out!)
Putting back in draft mode for now as there might be an upstream bug in meson we need to resolve mesonbuild/meson#13746 ...or my understanding is wrong. Either way, I think the way things stand now are a very close approximation of what this solution could look like |
Another outlet of this (or something in addition to this) could be something in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few thoughts! If we can solve the commit-level integration issue I think we should merge just after the release 🙂
# Replaced by version-bumping scripts at release time | ||
version = "0.6.0.dev0" | ||
[wrap-file] | ||
directory = arrow-nanoarrow-b78c0395a6fe74e776daf07933583f06567fb99c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maintaining the commit-level integration is the real trick of this particular Python package. It is probably difficult, but solving that is essential if this going to be the approach we use (i.e., if we are going to update the Python build system, we can't do it in such a way that it removes features or increases complexity).
Does the approach you use in the examples work?
arrow-nanoarrow/examples/meson-minimal/subprojects/nanoarrow.wrap
Lines 18 to 24 in 4168119
[wrap-git] | |
# url should point to the project root with the top-level meson.build file | |
url = ../../.. | |
revision = head | |
[provide] | |
nanoarrow = nanoarrow_dep |
...and maybe if building a source distribution you could copy arrow-nanoarrow/src
into python/vendor/arrow-nanoarrow/arc
and update the subprojects file?
.github/workflows/python-wheels.yaml
Outdated
if: ${{ matrix.config.label == "windows" }} | ||
with: | ||
python-version: "3.12" | ||
architecture: x64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still get 32-bit wheels?
I don't really have a stake in this, but just to mention it: for pyarrow we are considering to switch from setuptools to scikit-build-core, and for maintenance overhead there might be some value in using the same build backend for the various python packages in the project (but in the end, we should also use the best tool in each case, and here there is less existing cmake compared to pyarrow, so using meson-python might be the simpler option) |
Given the dependency-freeness, probably either would work, and probably it would not be too hard to transition to one or the other if we need to for some reason! (We definitely need something better than setuptools, though, since the day is coming soon when we need zstd and lz4 to support IPC compression). |
I spent some time working with this to see if there was any easy way out of the "meson python doesn't make it easy to build a Python package in a subdirectory" problem, and I wonder if a slightly lower barrier to entry would be to keep the existing |
Yea I think that or what @rgommers suggested (run a script during Meson's dist hook to vendor what we need) are the two options. I'll take another look at this next week |
9bae7f3
to
ae750f0
Compare
python/generate_dist.py
Outdated
target_src_dir = subproj_dir / "src" | ||
shutil.copytree(src_dir / "src", target_src_dir) | ||
|
||
# this files are only needed by bootstrap.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The amount of scripting here is unfortunate...would still be nice to replace bootstrap.py and/or bundle.py, but not sure I want to tackle that in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this PR I think you will have the most success just renaming bootstrap.py
to generate_dist.py
and including a meson lib definition for nanoarrow in python/meson.build
(i.e., don't try to build nanoarrow C using a subproject).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "including a meson lib definition?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running bootstrap.py
will get you python/vendor/nanoarrow.c
and friends, and you should be able to put the appropriate nanoarrow_c_lib = library(...)
in python/meson.build
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK - so you want to run bootstrap.py continuously, not just when the distribution is being created?
The current form uses the subproject symlink for live integration with the C project and only vendors when the distribution is created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can get all the tests passing and all the wheels building with that go for it! I think you will have more success tackling those two battles separately but the time is yours 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I understand your point of view looking at this some more. The bootstrapping mechanism is pretty tightly engrained as is, and would be an effort in and of itself to replace.
Still looking at the right balance here - might take me a bit
2b35a72
to
254202c
Compare
python/src/nanoarrow/meson.build
Outdated
py_sources = [ | ||
'__init__.py', | ||
'array.py', | ||
'array_stream.py', | ||
'c_array.py', | ||
'c_array_stream.py', | ||
'c_buffer.py', | ||
'c_schema.py', | ||
'device.py', | ||
'ipc.py', | ||
'iterator.py', | ||
'_repr_utils.py', | ||
'schema.py', | ||
'visitor.py', | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to make this a glob like "*.py"
so that a new contributor doesn't have to remember to update this list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. Here's some documentation in Meson as to why:
https://mesonbuild.com/FAQ.html#why-cant-i-specify-target-files-with-a-wildcard
python/generate_dist.py
Outdated
target_src_dir = subproj_dir / "src" | ||
shutil.copytree(src_dir / "src", target_src_dir) | ||
|
||
# this files are only needed by bootstrap.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this PR I think you will have the most success just renaming bootstrap.py
to generate_dist.py
and including a meson lib definition for nanoarrow in python/meson.build
(i.e., don't try to build nanoarrow C using a subproject).
python/src/nanoarrow/__init__.py
Outdated
@@ -75,7 +77,8 @@ | |||
from nanoarrow.array import array, Array | |||
from nanoarrow.array_stream import ArrayStream | |||
from nanoarrow.visitor import nulls_as_sentinel, nulls_forbid, nulls_separate | |||
from nanoarrow._version import __version__ # noqa: F401 | |||
|
|||
__version__ = importlib.metadata.version("nanoarrow") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche is there any issues with versioning in this way? Is there any cost to importing importlib.metadata
by default?
(I am only skeptical because I haven't seen a Python package do this before)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I accidentally removed the comment, but the downside to this approach versus a dynamic version generator like miniver is that the git hash is not included as part of the project metadata.
In this case the project definition from pyproject.toml just imports whatever the build-backend provides, which is currently 0.7.0-SNAPSHOT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'll have to be 0.7.0.dev[0-9]+
to work with the nightly wheels!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as whether or not this approach is viable, here is the Python packaging documentation:
https://packaging.python.org/en/latest/discussions/single-source-version/#single-source-version
The date on that is pretty recent and I believe importlib.metadata was provisional up until 3.10, so a lot of established packages probably had to come up with their own solution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any issues with versioning in this way? Is there any cost to importing
importlib.metadata
by default?
I would personally advise to not use importlib.metadata.version
for creating the __version__
attribute, because it will add some avoidable overhead (without testing I don't know how big this would be, but in general we want to keep the import as lean as possible, and this seems unnecessary overhead to scan installed packages, when the __version__
attribute can (in the built packages, i.e. injected during build-time) simply be a static string).
In addition, when using editable installs, this might also not always work as expected (the metadata version does not update automatically).
As far as whether or not this approach is viable, here is the Python packaging documentation:
https://packaging.python.org/en/latest/discussions/single-source-version/#single-source-version
Note that that page does not actually mention to use importlib.metadata.version
to populate __version__
, just that there is a desire that those return the same thing.
There has been a long discussion on this the last months at https://discuss.python.org/t/please-make-package-version-go-away/58501, that resulted in the updated page linked above (it's long, don't read it all ;), but I do see that it does actually have some discussion on the overhead, estimated there to be 20-40ms (although it might also depend on how big your environment is))
python/src/nanoarrow/meson.build
Outdated
# Try to resolve the symlink to a subproject first and fall back | ||
# to the wrap entry if that is unsuccessful. Ideally Meson would | ||
# take care of this for us, but there is a bug upstream | ||
# https://github.com/mesonbuild/meson/issues/13746#issuecomment-2392510954 | ||
nanoarrow_proj = subproject('arrow-nanoarrow') | ||
nanoarrow_dep = nanoarrow_proj.get_variable('nanoarrow_dep') | ||
nanoarrow_ipc_dep = nanoarrow_proj.get_variable('nanoarrow_ipc_dep') | ||
nanoarrow_device_dep = nanoarrow_proj.get_variable('nanoarrow_device_dep') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I think you want to vendor the files using bootstrap.py
or whatever facility meson provides for that and not attempt to get the nanoarrow-as-a-subproject bit working (orthogonal battle to using meson-python vs. setuptools).
@@ -38,7 +38,11 @@ Changelog = "https://github.com/apache/arrow-nanoarrow/blob/main/CHANGELOG.md" | |||
|
|||
[build-system] | |||
requires = [ | |||
"setuptools >= 61.0.0", | |||
"meson-python", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need a version constraint here?
@@ -103,7 +105,7 @@ if get_option('ipc') | |||
) | |||
nanoarrow_ipc_dep = declare_dependency(include_directories: [incdir], | |||
link_with: nanoarrow_ipc_lib, | |||
dependencies: [nanoarrow_dep, flatcc_dep]) | |||
dependencies: [nanoarrow_dep]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unclear on the difference but yes, the flatcc headers are deliberately not included in nanoarrow's public headers such that a library using nanoarrow does not need flatcc/*
files in the include path.
e7c117b
to
c88e104
Compare
c031772
to
b90d55c
Compare
b90d55c
to
75298de
Compare
@rgommers maybe you have an idea on the Windows failure? Tried looking through upstream discussions but its a rather broad topic: https://github.com/apache/arrow-nanoarrow/actions/runs/11559159131/job/32173258300?pr=644#step:6:184
AFAIU the container which cibuildwheel installs a 32 bit Python but the compiler being used is 64 bit? |
Yes indeed. The tl;dr is that if you want to build wheels for 32-bit Python on 64-bit Windows, Meson doesn't automatically configure MSVC in 32-bit mode for you (rationale: there could be non-Python library or executable targets, and why would those be 32-bit because somewhere else in To fix it, explicitly configure 32-bit MSVC in your |
61d4deb
to
b59fce4
Compare
b59fce4
to
1c50f70
Compare
16f8652
to
0fcf0c3
Compare
ffd415b
to
b13064e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Truly heroic effort getting all of this to pass!
The only issue I see is that flatcc sources aren't vendored/not contained in the sdist. I think that is OK for now (solvable if somebody asks for it) and probably only directly affects somebody trying to do Python development on an airplane 🙂 .
The latest commit should add that functionality now, relying on Meson's own functionality to bundle subprojects as part of dist creation. It's a bit wonky that flatcc is bundled that way and the rest of the files are created by our custom script reading from a CMake file...but I'm not sure that can be resolved in this PR. Definitely would like to tackle that in follow up(s) |
4ccf15e
to
d3a7490
Compare
Thanks @paleolimbot for the review and @rgommers for all of the guidance! |
This PR originally started as a nanobind POC to see if we could convert from Cython over to that, but I scaled that back just to focus on the possiblity of porting the build system to meson (you can see some nanobind components in the history, if you cared to look)
The build system port may have some merits on its own. It is more tightly integrated with the build system of the C code, and has the advantage of being built in parallel.
Technically it links to the main project by symlinking it as a subproject in thesubprojects
directory; if we cared to support source Python builds we could replace that symlink with a wrap file on distribution, or could even detect within the configuration if the symlink resolves. In the case it does not, we can fallback to a github releaseIf the user wanted to develop locally, they could simply symlink the project root into the subprojects folder of Python (the git repo already does this). For source distributions, the wrap system can provide a fallback project to download and reference
If we choose to try and move off of Cython incrementally and use something like nanobind or even C extensions, I think Meson can handle those more capably than the setup.py script