Data files license #433

Open
avalentino opened this issue Jun 26, 2024 · 13 comments
Comments

@avalentino
Contributor

I'm in the process of packaging pyogrio for Debian; I hope you are fine with it.
To meet the Debian packaging standards I need to report the license for all files included in the package.
I would appreciate it a lot if you could clarify the license of the data files included in pyogrio/tests/fixtures, and in particular the license of:

  • poly_not_enough_points.shp.zip
  • sample.osm.pbf
  • test_fgdb.gdb.zip
  • test_mixed_surface.gpkg

The pyogrio/tests/fixtures/README.md seems to clarify the origin of some of the data files, but the license is not clear to me.
Can I safely assume that the data files are provided under the same license as the source code (MIT)?

@martinfleis
Member

> Can I safely assume that the data files are provided under the same license as the source code (MIT)?

I don't think so. At least the OSM sample retains ODbL. I am not sure about the rest.

@jorisvandenbossche
Member

jorisvandenbossche commented Jun 26, 2024

  • poly_not_enough_points.shp.zip

This was recently added in #422. @theroggy did you create this file manually? (would be good to add a note about that in the README then as well)

  • test_fgdb.gdb.zip

This is downloaded from https://trac.osgeo.org/gdal/raw-attachment/wiki/FileGDB/. @rouault do you know if this wiki falls under the general GDAL license?

  • test_mixed_surface.gpkg

This is extracted from one of the datasets from https://www.usgs.gov/national-hydrography/access-national-hydrography-products. I don't directly find anything on that page about the license of those datasets (maybe the USGS has a general license it uses for all available datasets? but not familiar with it)
EDIT: https://www.usgs.gov/faqs/what-are-terms-uselicensing-map-services-and-data-national-map says "public domain"

@rouault
Contributor

rouault commented Jun 26, 2024

  • test_fgdb.gdb.zip

Maybe @jmckenna remembers the provenance of this file? Otherwise you could potentially switch to one of the GDAL autotest suite samples: https://github.com/OSGeo/gdal/tree/master/autotest/ogr/data/filegdb

@theroggy
Member

> • poly_not_enough_points.shp.zip
>
> This was recently added in #422. @theroggy did you create this file manually? (would be good to add a note about that in the README then as well)

No, I exported the polygon that triggers the problem from the file provided in this issue: geopandas/geopandas#3336

So, not sure about licensing :-(... @EwoutH can you shed some light?

@EwoutH
Contributor

EwoutH commented Jun 27, 2024

I should have mentioned this: that file doesn't have a proper open-source license, so I don't think it can be included.

I got it under a project license, see https://mrdh.nl/verkeersmodel.

Now that I think of it, sharing it for debugging was already stretching it (but probably OK).

@avalentino
Contributor Author

Thanks a lot to everybody for the help.
To summarize the discussion, please find below an excerpt of the debian/copyright file that I'm preparing:

Files: *
Copyright: 2020-2021, Brendan C. Ward and pyogrio contributors
License: Expat

Files: pyogrio/arrow_bridge.h
Copyright: 2020-2021, Brendan C. Ward and pyogrio contributors
License: Apache-2.0

Files: pyogrio/tests/fixtures/naturalearth_lowres/*
       pyogrio/tests/fixtures/test_mixed_surface.gpkg
Copyright: disclaimed
License: public-domain

Files: pyogrio/tests/fixtures/sample.osm.pbf
Copyright: OpenStreetMap contributors
License: ODbL-1.0

For the time being:

  • I will remove poly_not_enough_points.shp.zip and skip the associated test_read_invalid_shp test, at least until the situation is clarified.
  • For test_fgdb.gdb.zip the situation is apparently still not totally clear, so I can remove it as well for the moment. This implies skipping at least 8 additional tests.
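One way to skip tests automatically when a fixture has been stripped from the packaged tree is to guard them on the file's presence. The following is only a sketch: `fixture_missing` and the `FIXTURES` path are hypothetical helpers, the test body is a placeholder, and it uses the standard library's `unittest` for self-containment rather than whatever mechanism the pyogrio test suite actually uses.

```python
import unittest
from pathlib import Path

# Hypothetical location; adjust to where the fixtures live in the build tree.
FIXTURES = Path("pyogrio/tests/fixtures")


def fixture_missing(name, base=FIXTURES):
    """Return True when the named fixture file is absent from `base`."""
    return not (Path(base) / name).exists()


class InvalidShapefileTest(unittest.TestCase):
    # The decorator is evaluated at import time, so the test is reported as
    # skipped (not failed) whenever the packager has removed the file.
    @unittest.skipIf(
        fixture_missing("poly_not_enough_points.shp.zip"),
        "fixture removed for licensing reasons",
    )
    def test_read_invalid_shp(self):
        # Placeholder body: the real assertions live in the pyogrio tests.
        self.assertTrue((FIXTURES / "poly_not_enough_points.shp.zip").exists())
```

With pytest, deselecting by name on the command line (for example `-k "not test_read_invalid_shp"`) is an alternative that needs no source changes.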

For the other files in pyogrio/tests/fixtures (that is, the ones not mentioned in the above debian/copyright excerpt), I assume the same license as the source code.

Please feel free to comment if there is anything that looks incorrect.
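To see which license such an excerpt assigns to a given file, here is a minimal sketch. It relies on the rule from the Debian machine-readable copyright format that the last matching Files paragraph applies. The `STANZAS` list and `license_for` are hypothetical names, the stanzas are modeled on the excerpt with the ODbL-1.0 identifier, and `fnmatchcase` only approximates the format's wildcard rules (it additionally treats `[` as special).

```python
from fnmatch import fnmatchcase

# Stanzas modeled on the debian/copyright excerpt, in file order.
STANZAS = [
    (["*"], "Expat"),
    (["pyogrio/arrow_bridge.h"], "Apache-2.0"),
    (
        [
            "pyogrio/tests/fixtures/naturalearth_lowres/*",
            "pyogrio/tests/fixtures/test_mixed_surface.gpkg",
        ],
        "public-domain",
    ),
    (["pyogrio/tests/fixtures/sample.osm.pbf"], "ODbL-1.0"),
]


def license_for(path, stanzas=STANZAS):
    """Return the license of the last stanza with a pattern matching `path`."""
    result = None
    for patterns, license_name in stanzas:
        # `*` also matches "/" here, as in the Debian format.
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            result = license_name  # later stanzas override earlier ones
    return result
```

Any file without its own stanza, such as `pyogrio/tests/fixtures/test_fgdb.gdb.zip`, falls through to the catch-all `*` paragraph. That is why files with unclear provenance have to be either removed or given an explicit entry.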

@brendan-ward
Member

@avalentino the license for pyogrio is MIT, not Expat. Also, we're out of date, but please make the copyright extend through 2024 (will submit a PR to fix here shortly).

I believe arrow_bridge.h is derived from the Arrow project and should preserve their license / copyright statement rather than ours. Unfortunately, I'm not easily finding a copyright statement for that specific file, and the top-level LICENSE.txt for Arrow includes many copyright statements for code derived from different sources. (not sure what to do in this case re: copyright)

If you can give us a few more days, I can try to create some alternative test files that sidestep licensing issues.

@avalentino
Contributor Author

> @avalentino the license for pyogrio is MIT, not Expat. Also, we're out of date, but please make the copyright extend through 2024 (will submit a PR to fix here shortly).

According to the Debian documentation (e.g. [1] and [2]), MIT and Expat should be equivalent in most cases, and the recommendation (for the sake of homogeneity within Debian) is to use the Expat name when the text of the license matches the Expat one. Of course it is not a big issue to change the name if it matters to you but, in any case, the text of the license is reported in the same debian/copyright file; I have only shown an excerpt for brevity.

[1] https://www.debian.org/legal/licenses/
[2] https://www.debian.org/legal/licenses/mit

> I believe arrow_bridge.h is derived from the Arrow project and should preserve their license / copyright statement rather than ours. Unfortunately, I'm not easily finding a copyright statement for that specific file, and the top-level LICENSE.txt for Arrow includes many copyright statements for code derived from different sources. (not sure what to do in this case re: copyright)
>
> If you can give us a few more days, I can try to create some alternative test files that sidestep licensing issues.

Absolutely no rush. Please take your time, and thank you for supporting me.

@brendan-ward
Member

@avalentino per #441, I've removed the test files with problematic licenses. Some of our maintainers are out of office right now, so we're not quite ready to merge this in yet. Might be another week or two.

@avalentino
Contributor Author

Thanks for the update @brendan-ward

@QuLogic
Contributor

QuLogic commented Aug 18, 2024

On a related note, does arrow_bridge.h actually need to be installed? It seems like it probably should only be for extension building purposes, but it ends up in the installed copy too.

@brendan-ward
Member

arrow_bridge.h needs to go into the source distribution; do you mean exclude it from the wheels?

@QuLogic
Contributor

QuLogic commented Aug 20, 2024

Yes, I do mean the wheels; I opened #463 to remove it and the Cython files, which seem accidental.
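A quick way to check whether a built wheel still ships such files is to list its contents, since a wheel is a zip archive. This is a sketch only: `build_only_files` is a hypothetical helper, and treating `.h`, `.pyx` and `.pxd` as build-only suffixes is an assumption, not something stated in this thread.

```python
import zipfile

# Assumption: files with these suffixes are needed only to build the extension.
BUILD_ONLY_SUFFIXES = (".h", ".pyx", ".pxd")


def build_only_files(wheel_path, suffixes=BUILD_ONLY_SUFFIXES):
    """Return the sorted archive members whose names end with a given suffix."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(
            name for name in wheel.namelist() if name.endswith(suffixes)
        )
```

An empty list for a freshly built wheel would suggest that the headers and Cython sources are no longer installed, while the source distribution can still carry them.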


8 participants