Python packages created for PDG

We have created six repos that contain Python code for preparing data for display in Cesium and for archiving.

viz-staging (pdgstaging)

GitHub link

This package was designed to prepare vector data for future conversion steps. Mainly, it "slices" large vector files into smaller files, each with a bounding box corresponding to a tile in a given TMS. It also adds properties, re-projects the data, flags duplicates (and removes them if the config specifies), and handles file paths and all of the configuration options.
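As a rough sketch of how the package is typically driven (TileStager and stage_all follow the repo's documented pattern, but the config keys shown here are illustrative assumptions to verify against the repo):

```python
from pdgstaging import TileStager

# Illustrative configuration: option names are assumptions, not authoritative.
config = {
    "dir_input": "input/",             # directory of input vector files
    "dir_staged": "staged/",           # where the tiled output is written
    "tms_id": "WGS1984Quad",           # the TMS that defines the tile grid
    "z_range": [0, 13],                # zoom levels to stage
    "deduplicate_method": "neighbor",  # flag duplicates between neighboring files
}

stager = TileStager(config)
stager.stage_all()  # slice every input file into TMS-aligned tiles
```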

viz-raster (pdgraster)

GitHub link

This package deals with the conversion of vector data into rasters - both GeoTIFFs for archiving and PNGs for displaying in Cesium. Though it has generic classes and methods that could be used flexibly, at this point it depends on pdgstaging and assumes that the input has already gone through the staging step.
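A correspondingly rough sketch of the rasterization step (RasterTiler and rasterize_all follow the repo's documented pattern; verify against the current code):

```python
from pdgraster import RasterTiler

# Assumes the same config style as pdgstaging, and that the input is the
# staged, TMS-aligned vector tiles produced by the staging step.
rasterizer = RasterTiler(config)
rasterizer.rasterize_all()  # writes GeoTIFFs (archive) and PNGs (Cesium)
```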

viz-workflow (pdg_workflow)

GitHub link

The viz-workflow repo deals with running methods from the above packages in parallel. It has a couple of branches under active development:

  • main - currently uses ray for parallelization on the Delta server hosted by the National Center for Supercomputing Applications (a minimal sketch of this pattern follows the list)
  • kubernetes, docker, and parsl workflow - converting the workflow to use Kubernetes, Docker, and parsl to take advantage of the UCSB high performance computing clusters and to be interoperable across platforms such as Google Cloud Platform
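For orientation, this is the general shape of the ray pattern used on the main branch (the per-tile function and file list are hypothetical stand-ins for the actual workflow steps):

```python
import ray

ray.init()

@ray.remote
def process_tile(path):
    # In the real workflow, this would call pdgstaging / pdgraster
    # methods on a single staged input file.
    return path

tile_paths = ["tile_a.gpkg", "tile_b.gpkg"]

# Launch one task per file and block until all of them finish.
results = ray.get([process_tile.remote(p) for p in tile_paths])
ray.shutdown()
```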

viz-points (pdgpoints)

viz-points takes LiDAR point cloud data (as LAS, LAZ, or other correctly georeferenced LiDAR file formats) and outputs Cesium 3D Tiles point clouds (hierarchically organized .pnts data files plus .json files that define the hierarchy) that allow web visualization of point clouds. It is not yet integrated with the rest of the viz-* toolset, though we plan to integrate it in the future. viz-points relies on Oslandia's py3dtiles and rapidlasso's LAStools.

py3dtiles

GitHub link

The original py3dtiles Python library was created by an organization named Oslandia and is under active development and maintenance on GitLab. It was designed to convert point data into Cesium 3D Tiles. We have forked and extended it so that it can also convert polygons to Cesium 3D Tiles.

We should try to keep our version of the package up to date with the GitLab/Oslandia version so that we can include their fixes & enhancements. I follow changes on the GitLab repo and pull them in every two weeks or as needed. Instructions on how to pull changes from GitLab into GitHub are documented in the readme of the repo.

Eventually it would be awesome to open a merge request and give them the chance to incorporate the changes we made back into their library.

viz-3dtiles (viz_3dtiles)

GitHub link

The viz-3dtiles package is essentially a wrapper around the py3dtiles library. It adds classes & functions for building the hierarchy of Cesium 3D tileset JSON files, and for reading in shapefiles.

Since it was created, similar classes have been developed in the original py3dtiles library. We might want to eventually make use of those new classes instead. See PermafrostDiscoveryGateway/py3dtiles issue #6.
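A hedged usage sketch (the class and method names follow the repo's example usage but may differ between versions):

```python
from viz_3dtiles import Cesium3DTile, Cesium3DTileset

# Convert one vector file of polygons into a B3DM tile plus the tileset
# JSON that Cesium reads. The file path is a placeholder.
tile = Cesium3DTile()
tile.from_file(filepath="polygons.shp")

tileset = Cesium3DTileset(tiles=[tile])
tileset.write_file()  # writes the tileset.json describing the hierarchy
```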

Releases and EZID

Each time a viz repo's main branch is updated, we make a release with a version. See here for an example page of all the releases for one repo. The new release version should always be updated in the pyproject.toml file. Contributors (authors) should never be removed; new contributors should be appended.
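The relevant fields in pyproject.toml look roughly like this (the project name, version, and author names are placeholders, and the exact table layout depends on the repo's build backend):

```toml
[project]
name = "pdgstaging"
version = "0.9.2"  # bump this to match the new release tag
authors = [
    { name = "Existing Contributor" },
    { name = "New Contributor" },  # append new contributors; never remove existing ones
]
```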

We also have to update EZID so that the release can be cited. After making the release, here are the steps to update EZID (a rough sketch of the equivalent API call follows the steps):

  • Go to the EZID website and search for the DOI of the package's most recent completed release (the one prior to the release you are creating an XML for).
  • Click the link to open the XML and copy it.
  • In VS Code, paste the copied XML into a new XML file (name it anything, such as release_new.xml), and correct the formatting if needed (you can use an online pretty XML formatter).
  • Make all necessary changes to the XML:
    • DOI:
      • <identifier identifierType="DOI">10.18739/A2Z60C395</identifier>
    • Version number for the release itself and the title:
      • <version>0.9.2</version>
      • <title>Viz-staging: vector data tiling for geospatial visualization (version 0.9.2)</title>
    • Date and year of release:
      • <date dateType="Created">2024-06-18</date>
      • <publicationYear>2024</publicationYear>
    • Software Heritage ID:
      • To retrieve this, navigate to the Software Heritage page for the repo (for viz-workflow it's here), click “save again” (the button on the right side of the page), then copy the “Tip revision” string (a hash of numbers and letters like f39a3b7b53823e41ebae1d28136a95cdde5df716)
    • The “IsNewVersionOf” related DOI is the most recent prior DOI:
      • <relatedIdentifier relatedIdentifierType="DOI" relationType="IsNewVersionOf">10.18739/A2RV0D26C</relatedIdentifier>
  • Use the UPDATE command, the last line in these instructions, with some changes:
    • replace the DOI
    • replace the name of the XML doc with whatever you named it
    • update the hash string in the Software Heritage link
    • replace ${EZIDPASS} with the password (retrieve this from Matt)
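For reference, the UPDATE command amounts to an authenticated POST of ANVL-formatted metadata to the EZID API. A rough Python equivalent is sketched below (the DOI, filename, and credentials are placeholders; the UPDATE command in the instructions linked above remains the authoritative version):

```python
import requests

doi = "doi:10.18739/A2Z60C395"  # placeholder: the DOI being updated

with open("release_new.xml", encoding="utf-8") as f:
    datacite_xml = f.read()

def anvl_escape(value: str) -> str:
    # EZID's ANVL request format percent-encodes '%', newlines, and carriage returns.
    return value.replace("%", "%25").replace("\n", "%0A").replace("\r", "%0D")

response = requests.post(
    f"https://ezid.cdlib.org/id/{doi}",
    data=("datacite: " + anvl_escape(datacite_xml)).encode("utf-8"),
    headers={"Content-Type": "text/plain; charset=UTF-8"},
    auth=("pdg_account", "EZIDPASS"),  # placeholder credentials (password from Matt)
)
print(response.status_code, response.text)
```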

Link to this EZID page when referencing the package release in documentation.

Other Python packages

Our packages rely heavily on some external packages that it would be good to become familiar with (a small combined GeoPandas + Rasterio example follows the list):

  • GeoPandas (and pandas) - for reading, manipulating, and writing vector data
  • Rasterio - for reading, manipulating, and writing raster data
  • ray - for parallelization in the Delta server High Performance Computing environment
  • parsl - for parallelization in the UCSB High Performance Computing environment and on Google Kubernetes Engine
  • rio-tiler - not used in the workflow yet, but we may want to incorporate it when our workflow is extended to allow raster data as input (it has functionality to deal with overlapping rasters, partial reading of raster data, categorical color palettes, and a lot more)
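As a small, self-contained illustration of the GeoPandas + Rasterio combination (file paths and the 256×256 output grid are placeholders):

```python
import geopandas as gpd
import rasterio
from rasterio import features
from rasterio.transform import from_bounds

# Read a vector file and make sure it is in the expected CRS.
gdf = gpd.read_file("polygons.gpkg").to_crs("EPSG:4326")

# Build a transform that maps the layer's bounds onto a small raster grid.
width, height = 256, 256
transform = from_bounds(*gdf.total_bounds, width, height)

# Burn a value of 1 into every cell covered by a polygon.
raster = features.rasterize(
    ((geom, 1) for geom in gdf.geometry),
    out_shape=(height, width),
    transform=transform,
)

# Write the result out as a single-band GeoTIFF.
with rasterio.open(
    "polygons.tif", "w", driver="GTiff",
    height=height, width=width, count=1,
    dtype=raster.dtype, crs=gdf.crs, transform=transform,
) as dst:
    dst.write(raster, 1)
```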