diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index cf7f521..ed1d889 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -4,8 +4,14 @@ on: push: branches: - main + paths-ignore: + - "paper/**" + - "README.md" pull_request: types: [opened, synchronize, reopened] + paths-ignore: + - "paper/**" + - "README.md" env: EO_TIDES_TIDE_MODELS: ./tests/data/tide_models diff --git a/.github/workflows/paper.yml b/.github/workflows/paper.yml new file mode 100644 index 0000000..d2016fc --- /dev/null +++ b/.github/workflows/paper.yml @@ -0,0 +1,47 @@ +name: Draft paper PDF +on: + push: + branches: + - main + - JOSS_paper + paths: + - paper/** + - .github/workflows/paper.yml + pull_request: + branches: + - main + - JOSS_paper + paths: + - paper/** + - .github/workflows/paper.yml + +jobs: + paper: + runs-on: ubuntu-latest + name: Generate paper draft + steps: + - name: Checkout + uses: actions/checkout@v4 + + - name: Build draft PDF + uses: openjournals/openjournals-draft-action@master + with: + journal: joss + paper-path: paper/paper.md + + - name: Upload + uses: actions/upload-artifact@v4 + with: + name: paper + # Output path where Pandoc will write the compiled PDF. + # Note, this should be the same directory as the input + # paper.md + path: paper/paper.pdf + + - name: Commit updated PDF + uses: stefanzweifel/git-auto-commit-action@v4 + if: github.event_name == 'pull_request' + continue-on-error: true + with: + commit_message: Update generated PDF + file_pattern: "paper/paper.pdf" diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 2c93a10..70fa9ac 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,7 +1,6 @@ # Contributing to `eo-tides` Contributions are welcome, and they are greatly appreciated! -Every little bit helps, and credit will always be given. You can contribute in many ways: @@ -92,17 +91,7 @@ Now, validate that all unit tests are passing: make test ``` -9. 
Before raising a pull request you can also run tox to - run the tests across different versions of Python: - -```bash -tox -``` - -This requires you to have multiple versions of python installed. -This step is also triggered in the CI/CD pipeline, so you could also choose to skip this step locally. - -10. Commit your changes and push your branch to GitHub: +9. Commit your changes and push your branch to GitHub: ```bash git add . @@ -111,12 +100,3 @@ git push origin name-of-your-bugfix-or-feature ``` 11. Submit a pull request through the GitHub website. - -# Pull Request Guidelines - -Before you submit a pull request, check that it meets these guidelines: - -1. The pull request should ideally include tests. - -2. If the pull request adds functionality, the docs should be updated. - Put your new functionality into a function with a docstring, and add the feature to the list in `README.md`. diff --git a/README.md b/README.md index c561655..fda8549 100644 --- a/README.md +++ b/README.md @@ -12,8 +12,7 @@ - 📘 **Documentation**: - 🐍 **PyPI**: -> [!CAUTION] -> This package is a work in progress, and not currently ready for operational use. +
`eo-tides` provides powerful parallelized tools for integrating satellite Earth observation data with tide modelling. 🛠️🌊🛰️ diff --git a/docs/assets/eo-tides-abstract.gif b/docs/assets/eo-tides-abstract.gif index a48df48..4c0439e 100644 Binary files a/docs/assets/eo-tides-abstract.gif and b/docs/assets/eo-tides-abstract.gif differ diff --git a/docs/index.md b/docs/index.md index 7c2954d..5af8548 100644 --- a/docs/index.md +++ b/docs/index.md @@ -9,10 +9,6 @@ [![Commit activity](https://img.shields.io/github/commit-activity/m/GeoscienceAustralia/eo-tides)](https://img.shields.io/github/commit-activity/m/GeoscienceAustralia/eo-tides) [![License](https://img.shields.io/github/license/GeoscienceAustralia/eo-tides)](https://img.shields.io/github/license/GeoscienceAustralia/eo-tides) -!!! warning - - Note: This package is a work in progress, and not currently ready for operational use. - `eo-tides` provides provides powerful parallelized tools for integrating satellite Earth observation data with tide modelling. 🛠️🌊🛰️ `eo-tides` combines advanced tide modelling functionality from the [`pyTMD`](https://pytmd.readthedocs.io/en/latest/) package with [`pandas`](https://pandas.pydata.org/docs/index.html), [`xarray`](https://docs.xarray.dev/en/stable/) and [`odc-geo`](https://odc-geo.readthedocs.io/en/latest/), providing a suite of flexible tools for efficient analysis of coastal and ocean Earth observation data – from regional, continental, to global scale. diff --git a/eo_tides/eo.py b/eo_tides/eo.py index 811212c..c42be9b 100644 --- a/eo_tides/eo.py +++ b/eo_tides/eo.py @@ -209,7 +209,7 @@ def tag_tides( added to the `xarray.DataArray` outputs. Defaults to "EOT20"; specify "all" to use all models available in `directory`. For a full list of available and supported models, run - `eo_tides.model.list_models`. + `eo_tides.utils.list_models`. directory : str, optional The directory containing tide model data files.
If no path is provided, this will default to the environment variable @@ -340,7 +340,7 @@ def pixel_tides( added to the `xarray.DataArray` outputs. Defaults to "EOT20"; specify "all" to use all models available in `directory`. For a full list of available and supported models, run - `eo_tides.model.list_models`. + `eo_tides.utils.list_models`. directory : str, optional The directory containing tide model data files. If no path is provided, this will default to the environment variable diff --git a/eo_tides/model.py b/eo_tides/model.py index 2930bfe..7e66629 100644 --- a/eo_tides/model.py +++ b/eo_tides/model.py @@ -444,7 +444,7 @@ def model_tides( The tide model (or list of models) to use to model tides. Defaults to "EOT20"; specify "all" to use all models available in `directory`. For a full list of available and supported models, - run `eo_tides.model.list_models`. + run `eo_tides.utils.list_models`. directory : str, optional The directory containing tide model data files. If no path is provided, this will default to the environment variable @@ -735,7 +735,7 @@ def model_phases( The tide model (or list of models) to use to model tides. Defaults to "EOT20"; specify "all" to use all models available in `directory`. For a full list of available and supported models, - run `eo_tides.model.list_models`. + run `eo_tides.utils.list_models`. directory : str, optional The directory containing tide model data files. If no path is provided, this will default to the environment variable diff --git a/eo_tides/stats.py b/eo_tides/stats.py index ed0d000..d1676fd 100644 --- a/eo_tides/stats.py +++ b/eo_tides/stats.py @@ -246,7 +246,7 @@ def tide_stats( returned as a `pandas.Dataframe`; otherwise a `pandas.Series`. Defaults to "EOT20"; specify "all" to use all models available in `directory`. For a full list of available and supported - models, run `eo_tides.model.list_models`. + models, run `eo_tides.utils.list_models`. 
directory : str, optional The directory containing tide model data files. If no path is provided, this will default to the environment variable @@ -453,7 +453,7 @@ def pixel_stats( added to the `xarray.Dataset` output. Defaults to "EOT20"; specify "all" to use all models available in `directory`. For a full list of available and supported models, run - `eo_tides.model.list_models`. + `eo_tides.utils.list_models`. directory : str, optional The directory containing tide model data files. If no path is provided, this will default to the environment variable diff --git a/paper/benchmarking.ipynb b/paper/benchmarking.ipynb new file mode 100644 index 0000000..ffca2bd --- /dev/null +++ b/paper/benchmarking.ipynb @@ -0,0 +1,267 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0c52f321-4eaa-4518-8549-887ee6ff1b1a", + "metadata": {}, + "source": [ + "# Benchmark parallelisation performance" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "881716d8-ae18-42c7-a35c-aefde7e7e95d", + "metadata": {}, + "outputs": [], + "source": [ + "# !pip install eo_tides" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "38035d66-8899-4982-ac0c-3d57a3aef035", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import platform\n", + "import psutil\n", + "import pandas as pd\n", + "import numpy as np\n", + "from eo_tides.model import model_tides" + ] + }, + { + "cell_type": "markdown", + "id": "5d5341dd-c891-48fe-b1ee-3866c13659ec", + "metadata": {}, + "source": [ + "## Computer info" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "5adc7785-99d5-4123-b837-9ef9e67cfaac", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CPU Info:\n", + " Number of Cores: 32\n", + " Physical Cores: 16\n", + "\n", + "Memory Info:\n", + " Total: 267.30 GB\n", + " Available: 262.48 GB\n", + "\n", + "System Info:\n", + " System: Linux\n", + " Release: 
5.10.230-223.885.amzn2.x86_64\n", + " Machine: x86_64\n", + " Processor: x86_64\n" + ] + } + ], + "source": [ + "# Get CPU information\n", + "print(\"CPU Info:\")\n", + "print(f\" Number of Cores: {os.cpu_count()}\")\n", + "print(f\" Physical Cores: {psutil.cpu_count(logical=False)}\\n\")\n", + "\n", + "# Get Memory information\n", + "print(\"Memory Info:\")\n", + "virtual_memory = psutil.virtual_memory()\n", + "print(f\" Total: {virtual_memory.total / 1e9:.2f} GB\")\n", + "print(f\" Available: {virtual_memory.available / 1e9:.2f} GB\\n\")\n", + "\n", + "# Get system platform information\n", + "print(\"System Info:\")\n", + "print(f\" System: {platform.system()}\")\n", + "print(f\" Release: {platform.release()}\")\n", + "print(f\" Machine: {platform.machine()}\")\n", + "print(f\" Processor: {platform.processor()}\")" + ] + }, + { + "cell_type": "markdown", + "id": "e0e89162-c23b-493c-b5e2-9890b954f02a", + "metadata": {}, + "source": [ + "## Parameters\n", + "* Hourly tides for one month\n", + "* 10000 point locations\n", + "* Three tide models" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "7b653060-9e63-4b74-9728-0388544571a8", + "metadata": {}, + "outputs": [], + "source": [ + "directory = \"/var/share/tide_models/\"\n", + "n = 10000\n", + "x = np.linspace(122.00, 123.0, n)\n", + "y = np.linspace(-18.00, -19.00, n)\n", + "time = pd.date_range(start=\"2018-01-01\", end=\"2018-01-31\", freq=\"1h\")\n", + "models = [\"FES2022\", \"TPXO10-atlas-v2-nc\", \"GOT5.6\"]" + ] + }, + { + "cell_type": "markdown", + "id": "593bbcbd-6410-42a5-a0be-002314acf537", + "metadata": {}, + "source": [ + "## Run with default parallelisation" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "3a706946-b25b-4268-a432-f9a3d5d77d20", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Modelling tides with FES2022, TPXO10-atlas-v2-nc, GOT5.6 in parallel (models: 3, splits: 10)\n" + ] + }, + { + "name": 
"stderr", + "output_type": "stream", + "text": [ + "100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 30/30 [00:53<00:00, 1.79s/it]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Modelling tides with FES2022, TPXO10-atlas-v2-nc, GOT5.6 in parallel (models: 3, splits: 10)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 30/30 [00:55<00:00, 1.86s/it]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Modelling tides with FES2022, TPXO10-atlas-v2-nc, GOT5.6 in parallel (models: 3, splits: 10)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 30/30 [00:54<00:00, 1.81s/it]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "55.9 s Β± 560 ms per loop (mean Β± std. dev. of 3 runs, 1 loop each)\n" + ] + } + ], + "source": [ + "%%timeit -n 1 -r 3\n", + "\n", + "tide_df = model_tides(\n", + " x=x,\n", + " y=y,\n", + " time=time,\n", + " model=models,\n", + " directory=directory,\n", + " parallel=True,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "715e9e43-60fb-4cfa-907f-069109910593", + "metadata": {}, + "source": [ + "## Run without parallelisation" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "bb0cbd6a-8ba4-42e6-af0e-654f8933929a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Modelling tides with FES2022\n", + "Modelling tides with TPXO10-atlas-v2-nc\n", + "Modelling tides with GOT5.6\n", + "Modelling tides with FES2022\n", + "Modelling tides with TPXO10-atlas-v2-nc\n", + "Modelling tides with GOT5.6\n", + "Modelling tides with FES2022\n", + "Modelling tides with TPXO10-atlas-v2-nc\n", + "Modelling tides with GOT5.6\n", + "9min 24s Β± 749 ms per loop (mean Β± std. dev. 
of 3 runs, 1 loop each)\n" + ] + } + ], + "source": [ + "%%timeit -n 1 -r 3\n", + "\n", + "tide_df = model_tides(\n", + " x=x,\n", + " y=y,\n", + " time=time,\n", + " model=models,\n", + " directory=directory,\n", + " parallel=False,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10cf41b2-9870-4df9-a081-80a163a23afc", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/paper/figures/joss_abstract.png b/paper/figures/joss_abstract.png new file mode 100644 index 0000000..0cd994d Binary files /dev/null and b/paper/figures/joss_abstract.png differ diff --git a/paper/figures/joss_fig_gesla.png b/paper/figures/joss_fig_gesla.png new file mode 100644 index 0000000..368d675 Binary files /dev/null and b/paper/figures/joss_fig_gesla.png differ diff --git a/paper/figures/joss_fig_list.png b/paper/figures/joss_fig_list.png new file mode 100644 index 0000000..62bcfe5 Binary files /dev/null and b/paper/figures/joss_fig_list.png differ diff --git a/paper/figures/joss_fig_pixel.png b/paper/figures/joss_fig_pixel.png new file mode 100644 index 0000000..f65a017 Binary files /dev/null and b/paper/figures/joss_fig_pixel.png differ diff --git a/paper/figures/joss_fig_stats.png b/paper/figures/joss_fig_stats.png new file mode 100644 index 0000000..d533c5c Binary files /dev/null and b/paper/figures/joss_fig_stats.png differ diff --git a/paper/paper.bib b/paper/paper.bib new file mode 100644 index 0000000..697cec9 --- /dev/null +++ b/paper/paper.bib @@ -0,0 +1,225 @@ +@misc{pytmd, + author = {Sutterley, T.C. 
and Alley, K. and Brunt, K. and Howard, S. and Padman, L. and Siegfried, M.}, + title = {pyTMD: Python-based tidal prediction software}, + year = 2017, + publisher = {Zenodo}, + doi = {10.5281/zenodo.5555395}, + url = {https://doi.org/10.5281/zenodo.5555395}, +} + +@software{reback2020pandas, + author = {{pandas development team}}, + title = {pandas-dev/pandas: Pandas}, + month = feb, + year = 2020, + publisher = {Zenodo}, + version = {latest}, + doi = {10.5281/zenodo.3509134}, + url = {https://doi.org/10.5281/zenodo.3509134} +} + +@InProceedings{ mckinney-proc-scipy-2010, + author = { {W}es {M}c{K}inney }, + title = { {D}ata {S}tructures for {S}tatistical {C}omputing in {P}ython }, + booktitle = { {P}roceedings of the 9th {P}ython in {S}cience {C}onference }, + pages = { 56 - 61 }, + year = { 2010 }, + editor = { {S}t\'efan van der {W}alt and {J}arrod {M}illman }, + doi = { 10.25080/Majora-92bf1922-00a } +} + +@article{Hoyer_xarray_N-D_labeled_2017, + author = {Hoyer, Stephan and Hamman, Joseph}, + doi = {10.5334/jors.148}, + journal = {Journal of Open Research Software}, + month = apr, + number = {1}, + title = {{xarray}: N-D labeled Arrays and Datasets in Python}, + volume = {5}, + year = {2017} +} + +@misc{odcgeo, + author = {{odc-geo contributors}}, + title = {opendatacube/{odc-geo}}, + license = {Apache 2.0}, + publisher = {GitHub}, + journal = {GitHub repository}, + year = 2024, + url = {https://github.com/opendatacube/odc-geo}, +} + +@article{murray2012continental, + title={Continental scale mapping of tidal flats across East Asia using the Landsat archive}, + author={Murray, Nicholas J and Phinn, Stuart R and Clemens, Robert S and Roelfsema, Chris M and Fuller, Richard A}, + journal={Remote Sensing}, + volume={4}, + number={11}, + pages={3417--3426}, + year={2012}, + publisher={Molecular Diversity Preservation International (MDPI)} +} + +@article{sagar2017item, + title={Extracting the intertidal extent and topography of the Australian coastline from a 28
year time series of Landsat observations}, + author={Sagar, S. and Roberts, D. and Bala, B. and Lymburner, L.}, + journal={Remote Sensing of Environment}, + volume={195}, + pages={153--169}, + year={2017}, + publisher={Elsevier} +} + +@inproceedings{carrere2022new, + title={A new barotropic tide model for global ocean: FES2022}, + author={Carrere, Loren and Lyard, Florent and Cancet, Mathilde and Allain, Damien and Dabat, Mei-Ling and Fouchet, Ergane and Sahuc, Etienne and Faugere, Yannice and Dibarboure, Gerald and Picot, Nicolas}, + booktitle={2022 Ocean Surface Topography Science Team Meeting}, + pages={43}, + year={2022} +} + +@article{GESLAv3, + author = {Haigh, Ivan D. and Marcos, Marta and Talke, Stefan A. and Woodworth, Philip L. and Hunter, John R. and Hague, Ben S. and Arns, Arne and Bradshaw, Elizabeth and Thompson, Philip}, + title = {GESLA Version 3: A major update to the global higher-frequency sea-level dataset}, + journal = {Geoscience Data Journal}, + volume = {10}, + number = {3}, + pages = {293--314}, + keywords = {sea level records, sea level rise, storm surges, storm tides, tide gauge}, + doi = {10.1002/gdj3.174}, + url = {https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/gdj3.174}, + eprint = {https://rmets.onlinelibrary.wiley.com/doi/pdf/10.1002/gdj3.174}, + abstract = {Abstract This paper describes a major update to the quasi-global, higher-frequency sea-level dataset known as GESLA (Global Extreme Sea Level Analysis). Versions 1 (released 2009) and 2 (released 2016) of the dataset have been used in many published studies, across a wide range of oceanographic and coastal engineering-related investigations concerned with evaluating tides, storm surges, extreme sea levels, and other related processes. The third version of the dataset (released 2021), presented here, contains double the number of years of data, and nearly four times the number of records, compared to Version 2.
The dataset consists of records obtained from multiple sources around the world. This paper describes the assembly of the dataset, its processing, and its format, and outlines potential future improvements.}, + year = {2023} +} + +@misc{krause2021dea, + title={{Digital Earth Australia} notebooks and tools repository}, + author={Krause, C. and Dunn, B. and Bishop-Taylor, R. and Adams, C. and Burton, C. and Alger, M. and Chua, S. and Phillips, C. and Newey, V. and Kouzoubov, K. and Leith, A. and Ayers, D. and Hicks, A.}, + year={2021}, + publisher={Commonwealth of Australia (Geoscience Australia)}, + url={https://doi.org/10.26186/145234}, + howpublished={\url{https://github.com/GeoscienceAustralia/dea-notebooks/}}, + doi={10.26186/145234} +} + +@misc{deaintertidal, + title={{Digital Earth Australia Intertidal}}, + author={Bishop-Taylor, R. and Phillips, C. and Newey, V. and Sagar, S.}, + year={2024}, + publisher={Commonwealth of Australia (Geoscience Australia)}, + url={https://dx.doi.org/10.26186/149403}, + doi={10.26186/149403} +} + +@article{Fitzpatrick2024, + doi = {10.21105/joss.06683}, + url = {https://doi.org/10.21105/joss.06683}, + year = {2024}, + publisher = {The Open Journal}, + volume = {9}, + number = {99}, + pages = {6683}, + author = {Sharon Fitzpatrick and Daniel Buscombe and Jonathan A. Warrick and Mark A.
Lundine and Kilian Vos}, + title = {CoastSeg: an accessible and extendable hub for satellite-derived-shoreline (SDS) detection and mapping}, + journal = {Journal of Open Source Software} +} + +@article{eleveld2014estuarine, + title={Estuarine suspended particulate matter concentrations from sun-synchronous satellite remote sensing: Tidal and meteorological effects and biases}, + author={Eleveld, Marieke A and Van der Wal, Daphne and Van Kessel, Thijs}, + journal={Remote Sensing of Environment}, + volume={143}, + pages={204--215}, + year={2014}, + publisher={Elsevier} +} + +@article{bishop2021mapping, + title={Mapping {Australia}'s {Dynamic Coastline} at {Mean Sea Level} Using {Three Decades of Landsat Imagery}}, + author={Bishop-Taylor, R. and Nanson, R. and Sagar, S. and Lymburner, L.}, + journal={Remote Sensing of Environment}, + volume={267}, + pages={112734}, + year={2021}, + publisher={Elsevier}, + doi={10.1016/j.rse.2021.112734}, + url={https://doi.org/10.1016/j.rse.2021.112734} +} + +@article{bishop2019NIDEM, + title={Between the tides: Modelling the elevation of Australia's exposed intertidal zone at continental scale}, + author={Bishop-Taylor, R. and Sagar, S. and Lymburner, L. and Beaman, R.J.}, + journal={Estuarine, Coastal and Shelf Science}, + volume={223}, + pages={115-128}, + year={2019}, + publisher={Elsevier}, + doi={10.1016/j.ecss.2019.03.006}, + url={https://doi.org/10.1016/j.ecss.2019.03.006} +} + +@article{sagar2018composites, + title={Generating Continental Scale Pixel-Based Surface Reflectance Composites in Coastal Regions with the Use of a Multi-Resolution Tidal Model}, + author={Sagar, S. and Phillips, C. and Bala, B. and Roberts, D. and Lymburner, L. 
and Beaman, R.J.}, + journal={Remote Sensing}, + volume={10(3)}, + pages={480}, + year={2018}, + publisher={MDPI}, + doi={10.3390/rs10030480}, + url={https://doi.org/10.3390/rs10030480} +} + +@article{vitousek2023future, + title={The future of coastal monitoring through satellite remote sensing}, + author={Vitousek, S. and Buscombe, D. and Vos, K. and Barnard, P. L. and Ritchie, A. C. and Warrick, J. A.}, + journal={Cambridge Prisms: Coastal Futures}, + volume={1}, + pages={e10}, + year={2023}, + publisher={Cambridge University Press}, + doi={10.1017/cft.2022.4}, + url={https://doi.org/10.1017/cft.2022.4} +} + +@article{turner2021satellite, + title={Satellite optical imagery in {Coastal Engineering}}, + author={Turner, I. L. and Harley, M. D. and Almar, R. and Bergsma, E. W. J.}, + journal={Coastal Engineering}, + volume={167}, + pages={103919}, + year={2021}, + publisher={Elsevier}, + doi={10.1016/j.coastaleng.2021.103919}, + url={https://doi.org/10.1016/j.coastaleng.2021.103919} +} + +@article{vos2019coastsat, + title={{CoastSat}: {A} {Google Earth Engine}-enabled {Python} toolkit to extract shorelines from publicly available satellite imagery}, + author={Vos, K. and Splinter, K. D. and Harley, M. D. and Simmons, J. A. and Turner, I. L.}, + journal={Environmental Modelling \& Software}, + volume={122}, + pages={104528}, + year={2019}, + doi={10.1016/j.envsoft.2019.104528}, + url={https://doi.org/10.1016/j.envsoft.2019.104528}, + publisher={Elsevier} +} + +@article{sent2025time, + title={What time is the tide? The importance of tides for ocean colour applications to estuaries}, + author={Sent, G. and Antunes, C. and Spyrakos, E. and Jackson, T. and Atwood, E. C. and Brito, A. 
C.}, + journal={Remote Sensing Applications: Society and Environment}, + volume={37}, + pages={101425}, + year={2025}, + publisher={Elsevier} +} + +@misc{stac2024, +author = {{STAC contributors}}, +title = {{SpatioTemporal Asset Catalog (STAC) specification}}, +url = {https://stacspec.org}, +year = {2024} +} diff --git a/paper/paper.md b/paper/paper.md new file mode 100644 index 0000000..274810e --- /dev/null +++ b/paper/paper.md @@ -0,0 +1,137 @@ +--- +title: "eo-tides: Tide modelling tools for large-scale satellite Earth observation analysis" +tags: + - Python + - Earth observation + - Tide modelling + - Remote sensing + - Coastal + - Satellite data +authors: + - name: Robbi Bishop-Taylor + corresponding: true + orcid: 0000-0002-1533-2599 + affiliation: 1 + - name: Claire Phillips + affiliation: 1 + orcid: 0009-0003-9882-9131 + - name: Stephen Sagar + affiliation: 1 + orcid: 0000-0001-9568-9661 + - name: Vanessa Newey + affiliation: 1 + - name: Tyler Sutterley + affiliation: 2 + orcid: 0000-0002-6964-1194 +affiliations: + - name: Geoscience Australia, Australia + index: 1 + ror: 04ge02x20 + - name: University of Washington Applied Physics Laboratory, United States of America + index: 2 + ror: 03d17d270 +date: 14 January 2025 +bibliography: paper.bib +--- + +# Summary + +The `eo-tides` package provides powerful parallelized tools for integrating satellite Earth observation (EO) data with ocean tide modelling. It offers a flexible Python-based toolkit for modelling tide heights and attributing them to a time series of satellite images, based on the spatial extent and acquisition time of each satellite observation (\autoref{fig:abstract}). + +`eo-tides` leverages advanced tide modelling functionality from the `pyTMD` tide prediction software [@pytmd], combining this fundamental tide modelling capability with EO spatial analysis tools from `odc-geo` [@odcgeo].
This allows tides to be modelled automatically in parallel using one or more of over 50 supported tide models, and returned in standardised `pandas` [@reback2020pandas; @mckinney-proc-scipy-2010] and `xarray` [@Hoyer_xarray_N-D_labeled_2017] data formats for further analysis. + +Tools from `eo-tides` are designed to be applied directly to petabytes of freely available satellite data loaded from the cloud using Open Data Cube's `odc-stac` or `datacube` packages (e.g. using [Digital Earth Australia](https://knowledge.dea.ga.gov.au/guides/setup/gis/stac/) or [Microsoft Planetary Computer's](https://planetarycomputer.microsoft.com/) SpatioTemporal Asset Catalogues). Additional functionality enables evaluating potential satellite-tide biases and validating modelled tides against external tide gauge data, both important considerations for assessing the reliability and accuracy of coastal EO workflows. In combination, these open source tools support the efficient, scalable and robust analysis of coastal EO data for any time period or location globally. + +![An example of a typical `eo-tides` coastal EO workflow, with tide heights being modelled into every pixel in a spatio-temporal stack of satellite data (for example, from ESA's Sentinel-2 or NASA/USGS Landsat), then combined to derive insights into dynamic coastal environments.\label{fig:abstract}](figures/joss_abstract.png) + +# Statement of need + +Satellite remote sensing offers an unparalleled method to view and examine dynamic coastal environments over large temporal and spatial scales [@turner2021satellite; @vitousek2023future]. However, the variable and sometimes extreme influence of ocean tides in these regions can complicate analyses, making it difficult to separate the influence of changing tides from patterns of true coastal change over time [@vos2019coastsat].
This is a particularly significant challenge for continental- to global-scale coastal EO analyses, where failing to account for complex tide dynamics can lead to inaccurate or misleading insights into coastal processes observed by satellites. + +Conversely, information about ocean tides can also provide unique environmental insights that can greatly enhance the utility of coastal EO data. Conventionally, satellite data are analysed in terms of the geographical "where" and the temporal "when" of data acquisition. The addition of tide height as a new analysis dimension allows data to be filtered, sorted and analysed with respect to tidal processes, delivering a powerful re-imagining of traditional multi-temporal EO data analysis [@sagar2017item]. For example, satellite data can be analysed to focus on specific ecologically significant tidal stages (e.g. high or low tide, spring or neap tides) or on particular tidal processes (e.g. ebb or flow tides; @sent2025time). + +This concept has been used to map tidally-corrected annual coastlines from Landsat satellite data at continental scale [@bishop2021mapping], generate maps of the extent and elevation of the intertidal zone [@murray2012continental; @sagar2017item; @bishop2019NIDEM], and create tidally-constrained imagery composites of the coastline at low and high tide [@sagar2018composites]. However, these approaches have historically been based on bespoke, closed-source or difficult-to-install tide modelling tools, limiting the reproducibility and portability of these techniques to new coastal EO applications. To support the next generation of coastal EO workflows, there is a pressing need for new open-source approaches for combining satellite data with tide modelling. + +The `eo-tides` package aims to address these challenges by providing a set of performant open-source Python tools for attributing satellite EO data with modelled ocean tides.
This functionality is provided in five main analysis modules (`utils`, `model`, `eo`, `stats`, `validation`), which are described briefly below. + +# Key functionality + +## Setting up tide models + +A key barrier to utilising tide modelling in EO workflows is the complexity and difficulty of initially setting up global ocean tide models for analysis. To address this, the [`eo_tides.utils`](https://geoscienceaustralia.github.io/eo-tides/api/#eo_tides.utils) module contains tools for preparing tide model data files for use in `eo-tides`. This includes the `list_models` function, which provides visual feedback on the tide models a user has available on their system, while highlighting the naming conventions and directory structures required by the underlying `pyTMD` tide prediction software (\autoref{fig:list}). + +Modelling tides directly from the model files distributed by external providers can be slow due to the large size of these files, especially for recent high-resolution models like FES2022 [@carrere2022new]. To improve tide modelling performance, it can be extremely useful to clip tide model files to a smaller region of interest (e.g. the extent of a country or coastal region). The `clip_models` function can be used to automatically clip all suitable NetCDF-format model data files to a user-supplied bounding box, potentially improving tide modelling performance by over an order of magnitude. + +These tools are accompanied by comprehensive documentation explaining [how to set up several of the most commonly used global ocean tide models](https://geoscienceaustralia.github.io/eo-tides/setup/), including details on how to download or request access to model files, and how to uncompress and arrange the data on disk.
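The following minimal sketch shows the index arithmetic behind bounding-box clipping. It assumes a regular, ascending 1-D longitude/latitude grid. The `bbox_slices` helper and the 1/16° toy grid are hypothetical and are not part of `eo-tides`. The real `clip_models` function operates on NetCDF model files and may handle details that this sketch ignores, such as 0–360° versus ±180° longitude conventions. The bounding box reuses the 122–123°E, 18–19°S region from the benchmarking notebook.

```python
from bisect import bisect_left, bisect_right


def bbox_slices(lons, lats, bbox):
    """Return (lon_slice, lat_slice) selecting grid cells inside `bbox`.

    Hypothetical helper: assumes `lons` and `lats` are sorted in ascending
    order and use the same longitude convention as `bbox`, which is given
    as (left, bottom, right, top) in degrees.
    """
    left, bottom, right, top = bbox
    lon_slice = slice(bisect_left(lons, left), bisect_right(lons, right))
    lat_slice = slice(bisect_left(lats, bottom), bisect_right(lats, top))
    return lon_slice, lat_slice


# Toy global grid at 1/16 degree spacing (an arbitrary illustrative choice;
# 1/16 is exactly representable in binary, so coordinates are exact)
step = 1 / 16
lons = [i * step for i in range(int(360 / step))]  # 0 to <360
lats = [-90 + i * step for i in range(int(180 / step) + 1)]  # -90 to 90

# Same region as the benchmarking notebook: 122-123 E, 19-18 S
lon_s, lat_s = bbox_slices(lons, lats, (122.0, -19.0, 123.0, -18.0))

n_clipped = (lon_s.stop - lon_s.start) * (lat_s.stop - lat_s.start)
n_global = len(lons) * len(lats)
print(f"{n_clipped} of {n_global} grid cells retained")
```

The clipped region keeps only a few hundred of the toy grid's roughly 16.6 million cells, so any per-cell work (such as reading and interpolating constituents) shrinks to match. Real speedups will depend on the tide model, file format and region.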
+ +![An example output from `list_models`, providing a useful summary table which clearly identifies available and supported tide models.\label{fig:list}](figures/joss_fig_list.png) + +## Modelling tides + +The [`eo_tides.model`](https://geoscienceaustralia.github.io/eo-tides/api/#eo_tides.model) module is powered by advanced tide modelling functionality from the `pyTMD` Python package [@pytmd]. + +`pyTMD` is open-source tidal prediction software that aims to simplify the calculation of ocean and Earth tides. Tides are frequently decomposed into harmonic constants (or constituents) associated with the relative positions of the sun, moon and Earth. For ocean tides, `pyTMD.io` contains routines for reading major constituent values from commonly available tide models, and interpolating those values to spatial locations. Information for each of the supported tide models is stored within a JSON database that is supplied with `pyTMD`. `pyTMD.astro` contains routines for computing the positions of celestial bodies for a given time. For ocean tides, `pyTMD` computes the longitudes of the moon (S), sun (H), lunar perigee (P), ascending lunar node (N) and solar perigee (PP). `pyTMD.arguments` contains routines for combining these astronomical coefficients with the "Doodson number" of each constituent, along with routines for adjusting the amplitude and phase of each constituent based on its modulation over the 18.6-year nodal period. Finally, `pyTMD.predict` uses results from those underlying functions to predict tidal values at a given location and time. + +To support integration with satellite EO data, the `model_tides` function from `eo_tides.model` wraps `pyTMD` functionality to return predicted tides in a standardised `pandas.DataFrame` format containing information about the tide model, location and time period of each modelled tide.
This allows large analyses to be broken into smaller discrete chunks that can be processed in parallel before being combined as a final step. Parallelisation in `eo-tides` is automatically optimised based on the number of available workers and the number of requested tide models and tide modelling locations. This built-in parallelisation can significantly improve tide modelling performance, especially when run on a large multi-core machine (\autoref{tab:benchmark}).
+
+Table: A benchmark comparison of tide modelling performance with parallelisation on vs. off. This comparison was performed on 8-core and 32-core Linux machines, for a typical large-scale analysis involving a month of hourly tides modelled at 10,000 locations using three tide models (FES2022, TPXO10, GOT5.6). \label{tab:benchmark}
+
+| Cores | Parallelisation     | No parallelisation  | Speedup |
+| ----- | ------------------- | ------------------- | ------- |
+| 8     | 2 min 46 s ± 663 ms | 9 min 28 s ± 536 ms | 3.4x    |
+| 32    | 55.9 s ± 560 ms     | 9 min 24 s ± 749 ms | 10.1x   |
+
+The `model_tides` function is primarily intended to support more complex EO-related tide modelling functionality in the downstream `eo_tides.eo` module. However, it can also be used independently of EO data, for example in any application that requires a time series of modelled tide heights. In addition to modelling tide heights, the `model_phases` function can be used to calculate the phase of the tide at any location and time. This can be used to classify tides into high and low tide observations, or to determine whether the tide was rising (i.e. flow tide) or falling (i.e. ebb tide). This information can be critical for correctly interpreting satellite-observed coastal processes like changing turbidity and ocean colour [@sent2025time].
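To make harmonic prediction and tide phase more concrete, the sketch below sums a few constituents as cosines and then labels each time as high or low (above or below mean sea level) and flow or ebb (rising or falling), by predicting the tide again a short time later. It is a simplified illustration under stated assumptions: the amplitudes and phase lags are made up, the nodal corrections and astronomical arguments that `pyTMD` applies are ignored, and `harmonic_tide` and `classify_phase` are hypothetical helpers whose labels need not match the exact output of `model_phases`.

```python
import numpy as np

# Angular speeds (degrees per hour) of three well-known tidal constituents.
SPEED_DEG_PER_HOUR = {"M2": 28.9841042, "S2": 30.0, "K1": 15.0410686}


def harmonic_tide(hours, constituents):
    """Sum constituents as A * cos(speed * t - g).

    constituents maps name -> (amplitude in m, phase lag g in degrees).
    Nodal corrections and astronomical arguments are deliberately ignored.
    """
    hours = np.asarray(hours, dtype=float)
    height = np.zeros_like(hours)
    for name, (amplitude, phase_deg) in constituents.items():
        speed = np.deg2rad(SPEED_DEG_PER_HOUR[name])
        height += amplitude * np.cos(speed * hours - np.deg2rad(phase_deg))
    return height


def classify_phase(hours, constituents, dt_hours=0.25):
    """Label each time 'high'/'low' (above/below mean sea level) and
    'flow'/'ebb' (rising/falling), by predicting again dt_hours later."""
    hours = np.asarray(hours, dtype=float)
    now = harmonic_tide(hours, constituents)
    later = harmonic_tide(hours + dt_hours, constituents)
    labels = [
        f"{'high' if h >= 0 else 'low'}-{'flow' if h2 > h else 'ebb'}"
        for h, h2 in zip(now, later)
    ]
    return now, labels


# Example with made-up amplitudes and phase lags for a semidiurnal site.
site = {"M2": (1.2, 0.0), "S2": (0.4, 30.0)}
times = np.arange(0, 13, 2.0)
heights, phases = classify_phase(times, site)
for t, z, p in zip(times, heights, phases):
    print(f"{t:4.1f} h {z:+.2f} m {p}")
```

Comparing against a second prediction a few minutes later is one simple way to get the direction of the tide without differentiating the model analytically.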
+
+## Combining tides with satellite data
+
+The [`eo_tides.eo`](https://geoscienceaustralia.github.io/eo-tides/api/#eo_tides.eo) module contains the package's core functionality, focusing on tools for attributing satellite data with modelled tide heights. These tools can be applied to `xarray`-format satellite data from any coastal location on the planet, for example data loaded from the cloud using the [Open Data Cube](https://www.opendatacube.org/) and SpatioTemporal Asset Catalog [@stac2024].
+
+For tide attribution, `eo-tides` offers two approaches that differ in complexity and performance: `tag_tides` and `pixel_tides` (\autoref{tab:tide_stats}). The `tag_tides` function provides a fast and efficient method for small-scale applications where tides are unlikely to vary across a study area. This approach allocates a single tide height to each satellite data timestep, based on the geographic centroid of the dataset and the acquisition time of each image. Having tide height as a variable allows satellite data to be selected and analysed based on tides. For example, all available satellite observations for an area of interest could be sorted by tide height, or used to extract and compare the lowest and highest tide images in the time series.
+
+However, tides typically exhibit spatial variability, with sea levels sometimes varying by metres across short distances in regions of complex and extreme tidal dynamics. This means that a single satellite image may capture a range of contrasting tide conditions, making a single modelled tide per image an over-simplification of reality. For larger-scale coastal EO analysis, the `pixel_tides` function can be used to seamlessly model tides through both time and space, producing a three-dimensional "tide height" datacube that can be integrated with satellite data.
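The difference between one tide per image and one tide per pixel can be sketched with NumPy. The sketch assumes tides for a single timestep have already been modelled on a coarse grid (the values are invented) and uses plain bilinear interpolation as a stand-in for reprojection onto the satellite pixel grid; `upsample_bilinear` is a hypothetical helper, and `pixel_tides` may resample differently. The scene mean stands in for a single tide height per image (`tag_tides` itself uses the dataset centroid; for this linear field the two coincide).

```python
import numpy as np


def upsample_bilinear(coarse, factor):
    """Bilinearly interpolate a 2D grid onto a grid `factor` times finer in each
    direction, keeping the outer values fixed (a stand-in for reprojection)."""
    ny, nx = coarse.shape
    y_old, x_old = np.arange(ny), np.arange(nx)
    y_new = np.linspace(0, ny - 1, ny * factor)
    x_new = np.linspace(0, nx - 1, nx * factor)
    # Linear interpolation along x for each coarse row, then along y for each column.
    rows = np.array([np.interp(x_new, x_old, row) for row in coarse])
    return np.array([np.interp(y_new, y_old, col) for col in rows.T]).T


# Hypothetical tide heights (m) for one timestep on a coarse 3 x 4 grid,
# increasing from west to east across the scene.
coarse = np.tile([0.2, 0.6, 1.0, 1.4], (3, 1))

# Per-pixel approach: one tide height for every pixel of a 10x finer image grid.
per_pixel = upsample_bilinear(coarse, factor=10)

# Single-tide approach: one tide height for the whole image.
single_tide = coarse.mean()

max_diff = np.abs(per_pixel - single_tide).max()
print(per_pixel.shape)
print(f"largest per-pixel difference from the single tide: {max_diff:.2f} m")
```

With these invented values the single scene-wide tide is off by up to 0.6 m at the scene edges, which is the kind of discrepancy a per-pixel approach is meant to capture.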
For efficient processing, `pixel_tides` models tides into a customisable low-resolution grid surrounding each satellite image in the time series. These modelled tides are then reprojected back to the original resolution of the input satellite image, returning a unique tide height for every individual satellite pixel through time (\autoref{fig:pixel}).
+
+Table: Comparison of the `tag_tides` and `pixel_tides` functions. \label{tab:tide_stats}
+
+| `tag_tides` | `pixel_tides` |
+| ----------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
+| - Assigns a single tide height to each timestep/satellite image | - Assigns a tide height to every individual pixel through time to capture spatial tide dynamics |
+| - Ideal for local or site-scale analysis | - Ideal for regional to global-scale coastal product generation |
+| - Fast, low memory use | - Slower, higher memory use |
+| - Single tide height per image can produce artefacts in complex tidal regions | - Produces spatially seamless results across large extents by applying analyses at the pixel level |
+
+![An example tide height output produced by the `pixel_tides` function, showing spatial variability in tides across Australasia for a single timestep.\label{fig:pixel}](figures/joss_fig_pixel.png)
+
+## Calculating tide statistics and satellite biases
+
+The [`eo_tides.stats`](https://geoscienceaustralia.github.io/eo-tides/api/#eo_tides.stats) module contains tools for calculating statistics describing local tide dynamics, as well as biases caused by interactions between tidal processes and satellite orbits. Complex tide aliasing interactions between temporal tide dynamics and the regular overpass timing of sun-synchronous satellite sensors can prevent these satellites from observing the entire tidal cycle [@eleveld2014estuarine; @sent2025time].
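The tide aliasing mentioned above can be estimated with a short calculation. If a sensor images a site every Δ hours at the same local time, each revisit advances a constituent of period T by Δ/T cycles; only the fractional part of that step is visible, so the constituent appears to repeat with an aliased period of Δ divided by the distance between Δ/T and the nearest whole number. The sketch assumes a 16-day repeat cycle (as used by Landsat 8) and identical overpass times on every visit; `aliased_period_days` is a hypothetical helper, not part of `eo-tides`.

```python
import math

# Periods (hours) of selected tidal constituents.
PERIODS_H = {
    "M2": 12.4206012,  # principal lunar semidiurnal
    "S2": 12.0,        # principal solar semidiurnal
    "K1": 23.9344696,  # lunisolar diurnal
    "O1": 25.8193417,  # principal lunar diurnal
}


def aliased_period_days(period_h, revisit_days):
    """Apparent period (days) of a constituent sampled every `revisit_days`
    at the same time of day; math.inf if it is always seen at the same phase."""
    revisit_h = revisit_days * 24.0
    cycles = revisit_h / period_h
    # Phase step per revisit, wrapped to the interval [-0.5, 0.5] cycles.
    step = cycles - round(cycles)
    if math.isclose(step, 0.0, abs_tol=1e-12):
        return math.inf
    return revisit_days / abs(step)


for name, period in PERIODS_H.items():
    print(f"{name}: {aliased_period_days(period, 16):.1f} days")
```

S2 comes out as infinite: a sensor passing at the same solar time always samples the solar semidiurnal tide at the same phase. With these inputs M2 aliases to roughly 191 days and K1 to about one year, so sampling the full tidal cycle can take many months of acquisitions.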
Biases in satellite coverage of the tidal cycle can mean that tidal extremes (e.g. the lowest or highest tides at a location) or particular tidal processes are either never captured by satellites, or are over-represented in the satellite record. Local tide dynamics can cause these biases to vary greatly through both time and space, making it challenging to compare coastal processes consistently across large spatial extents using EO data [@bishop2019NIDEM].
+
+To ensure that coastal EO analyses are not inadvertently affected by tide biases, it is important to understand how well tides observed by satellites capture the full astronomical tide range at a location. The `tide_stats` function compares the subset of tides observed by satellite data against the full range of tides modelled at a regular interval across the entire time period covered by the satellite dataset. This comparison is used to calculate several statistics that summarise how well a satellite time series captures real-world tidal conditions [@bishop2019NIDEM]. These statistics include:
+
+1. Spread: The proportion of the modelled astronomical tide range that was observed by satellites. A high value indicates good coverage of the tide range.
+2. High-tide offset: The proportion of the highest tides never observed by satellites, relative to the modelled astronomical tide range. A high value indicates that satellites systematically miss the highest tides.
+3. Low-tide offset: The proportion of the lowest tides never observed by satellites, relative to the modelled astronomical tide range. A high value indicates that satellites systematically miss the lowest tides.
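The three statistics listed above can be sketched from two samples of tide heights: tides modelled at a regular interval over the full period, and tides modelled at the satellite acquisition times. This is a minimal sketch following the definitions given here, expressed as fractions of the modelled range; `tide_bias_stats` is a hypothetical helper, and `tide_stats` may compute or report these values differently (for example as percentages, or alongside additional statistics).

```python
import math


def tide_bias_stats(modelled_tides, observed_tides):
    """Spread and high/low-tide offsets, as fractions of the modelled tide range.

    modelled_tides: tide heights modelled at a regular interval over the full period.
    observed_tides: tide heights modelled at the satellite acquisition times.
    """
    mod_min, mod_max = min(modelled_tides), max(modelled_tides)
    obs_min, obs_max = min(observed_tides), max(observed_tides)
    mod_range = mod_max - mod_min
    return {
        "spread": (obs_max - obs_min) / mod_range,
        "offset_high": (mod_max - obs_max) / mod_range,  # highest tides never observed
        "offset_low": (obs_min - mod_min) / mod_range,  # lowest tides never observed
    }


# Hypothetical example: a 2 m amplitude tide modelled every 30 minutes for 30 days,
# and five satellite overpasses that happen to miss the lowest tides.
modelled = [2.0 * math.cos(2 * math.pi * (i / 2) / 12.42) for i in range(30 * 48)]
observed = [-1.1, -0.4, 0.3, 1.2, 1.8]
stats = tide_bias_stats(modelled, observed)
print({k: f"{v:.0%}" for k, v in stats.items()})
```

By construction the spread and the two offsets add up to the whole modelled range, which makes the numbers easy to read together: a low spread is always explained by a high-tide offset, a low-tide offset, or both.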
+
+Running a satellite tide bias investigation for a coastal area of interest returns an automated report and plot (\autoref{fig:stats}), adding valuable tide-based context to a coastal EO analysis.
+
+![In this example satellite time series, the data captured a biased subset of the tide range, observing only ~68% of the modelled astronomical tide range and never observing the lowest 24% of tides. The plot shows the relationship between satellite-observed tide heights (black dots) and modelled astronomical tide heights (blue lines) at this location.\label{fig:stats}](figures/joss_fig_stats.png)
+
+## Validating modelled tide heights
+
+The [`eo_tides.validation`](https://geoscienceaustralia.github.io/eo-tides/api/#eo_tides.validation) module contains tools for validating modelled tides against observed sea level data. The tide models supported by `eo-tides` can vary significantly in accuracy across the world's coastlines. Evaluating the accuracy of modelled tides is critical for ensuring that resulting marine or coastal EO analyses are reliable and useful.
+
+Validation functionality in `eo-tides` provides a convenient tool for loading high-quality sea level measurements from the Global Extreme Sea Level Analysis (GESLA) archive [@GESLAv3], a global dataset of 90,713 years of sea level data from 5,119 records across the world. The `load_gauge_gesla` function allows GESLA data to be loaded for the same location and time period as a satellite time series. Differences between modelled and observed tide heights can then be quantified by calculating accuracy statistics, including the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), R-squared and bias (\autoref{fig:gesla}).
+
+Furthermore, different ocean tide models can perform differently at different coastal locations.
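The accuracy statistics named above have standard definitions, sketched below in plain Python for paired modelled and observed heights. This is an illustrative implementation, not the `eo-tides` code: `accuracy_metrics` is hypothetical, R-squared is taken here as the coefficient of determination (one minus the ratio of residual to total sum of squares) and bias as the mean of modelled minus observed; a given library may use other conventions, such as the squared Pearson correlation for R-squared. The gauge and model series are invented to show how the metrics can rank two models at one location.

```python
import math


def accuracy_metrics(modelled, observed):
    """RMSE, MAE, R-squared and bias for paired modelled/observed tide heights (m)."""
    if len(modelled) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(observed)
    errors = [m - o for m, o in zip(modelled, observed)]
    obs_mean = sum(observed) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
        "bias": sum(errors) / n,  # positive = model too high on average
    }


observed = [0.1, 1.2, 2.0, 1.1, -0.3, -1.4, -2.1, -1.0]  # hypothetical gauge heights
models = {
    "model_a": [0.2, 1.1, 1.9, 1.2, -0.2, -1.5, -2.0, -0.9],  # small random-looking errors
    "model_b": [0.4, 1.5, 2.3, 1.4, 0.0, -1.1, -1.8, -0.7],  # constant +0.3 m offset
}
for name, modelled in models.items():
    stats = accuracy_metrics(modelled, observed)
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

In this toy case `model_b` tracks the shape of the tide perfectly but sits 0.3 m too high, so its bias, RMSE and MAE all equal 0.3 m; reporting several metrics side by side helps separate a systematic datum offset from errors in timing or amplitude.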
`eo-tides` allows multiple tide models to be compared against GESLA data simultaneously (\autoref{fig:gesla}), helping users make informed decisions and choose the most suitable tide models for their specific location or application.
+
+![An example comparison of modelled tides from multiple global ocean tide models (EOT20, GOT5.5, HAMTIDE11) against observed sea level data from the Broome 62650 GESLA tide gauge, Western Australia.\label{fig:gesla}](figures/joss_fig_gesla.png)
+
+# Research projects
+
+Early versions of functions provided in `eo-tides` have been used for continental-scale modelling of the elevation and exposure of Australia's intertidal zone [@deaintertidal], multi-decadal shoreline mapping across Australia [@bishop2021mapping] and [Africa](https://www.digitalearthafrica.org/platform-resources/services/coastlines), and to support tide correction for satellite-derived shorelines as part of the `CoastSeg` Python package [@Fitzpatrick2024].
+
+# Acknowledgements
+
+Functions from `eo-tides` were originally developed in the Digital Earth Australia Notebooks and Tools repository [@krause2021dea]. The authors would like to thank all DEA Notebooks contributors and maintainers for their invaluable assistance with code review, feature suggestions and code edits. This paper is published with the permission of the Chief Executive Officer, Geoscience Australia. Copyright Geoscience Australia (2025).
+
+# References
diff --git a/paper/paper.pdf b/paper/paper.pdf
new file mode 100644
index 0000000..f9d163c
Binary files /dev/null and b/paper/paper.pdf differ
diff --git a/pyproject.toml b/pyproject.toml
index 0d7f397..74ebd84 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -90,6 +90,9 @@ dev = [
 requires = ["setuptools >= 61.0"]
 build-backend = "setuptools.build_meta"
 
+[tool.setuptools]
+packages = ["eo_tides"]
+
 [tool.mypy]
 files = ["eo_tides"]
 python_version = "3.10"