Update technology training #62
Merged
19 commits
9180cb5
update outline of module on index page (first draft)
JessicaS11 23590cf
start updating steps to add team member
JessicaS11 1d93113
merge main
scottyhq fd621a6
update into page
scottyhq f93c94d
deal with jbook parse warnings
scottyhq 713b27c
Merge branch 'main' into tech-module
scottyhq ffa068c
Merge branch 'main' into tech-module
scottyhq 3beb563
try indent yaml
scottyhq 1bbbfff
old-style directive formatting
scottyhq 01830e0
fix links to trainings
scottyhq 6c653c2
wrap with single quotes
scottyhq f3217df
add glossary, updates to recognition
scottyhq 036fc59
glossary review comments
scottyhq 97f9452
recognition suggestions, update email contact
scottyhq 01af878
updated screenshots
scottyhq 94e416f
update content
scottyhq 117b850
final contents
scottyhq ab0cea3
spelling and links
scottyhq 6691b6a
review comments
scottyhq
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,8 +18,7 @@ jb build docs | |
|
||
## Contact | ||
|
||
* [Anthony Arendt](mailto:[email protected]) | ||
* [Scott Henderson](mailto:[email protected]) | ||
* [email eScience](mailto:[email protected]) | ||
|
||
## License | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
# Glossaries | ||
|
||
## Tools and Technology (general) | ||
|
||
```{glossary} | ||
[Conda](https://docs.conda.io) | ||
Package, dependency and environment management for any language—Python, R, | ||
Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN, and more. | ||
|
||
[Mamba](https://mamba.readthedocs.io) | ||
An alternative package manager to conda that is fast, robust, and cross-platform. | ||
|
||
[Conda-forge](https://conda-forge.org) | ||
The main open-access repository for hosting packages that are installed via conda or mamba. | ||
|
||
[Docker](https://www.docker.com) | ||
Docker provides the ability to package and run an application in a loosely | ||
isolated environment called a container. It is widely used for creating | ||
reproducible software environments to run code on different computers. | ||
|
||
[Git](https://git-scm.com) | ||
A popular version control system that is used in many open source software | ||
projects to manage their software code base. | ||
|
||
[GitHub](https://github.com) | ||
A service platform that allows developers to create, store, manage and share their code using Git. | ||
|
||
[GitHub Actions](https://github.com/features/actions) | ||
Continuous integration and continuous delivery (CI/CD) GitHub feature that allows you to automate computational workflows for a GitHub repository. | ||
|
||
[GitHub Pages](https://pages.github.com) | ||
GitHub feature that allows you to host a website connected to a repository or organization. | ||
|
||
[Hackweek](https://uwhackweek.github.io/hackweeks-as-a-service) | ||
Participant-driven events that strive to create welcoming spaces to learn new | ||
things, build community and gain hands-on experience with collaboration and | ||
team science. | ||
|
||
[Project Jupyter](https://jupyter.org) | ||
Project Jupyter (name derived from "JUlia PYThon and R") exists to develop | ||
open-source software, open-standards, and services for interactive computing | ||
across dozens of programming languages. | ||
|
||
[Jupyter Book](https://jupyterbook.org/intro.html) | ||
Jupyter Book is an open source project for building beautiful, | ||
publication-quality books and documents from computational material. | ||
|
||
[JupyterHub](https://jupyterhub.readthedocs.io) | ||
A core open source tool from the Jupyter community, JupyterHub allows you to | ||
deploy an application that provides remote data science environments to | ||
multiple users. It can be deployed in the cloud, or on your own hardware. | ||
|
||
|
||
[JupyterLab](https://jupyterlab.readthedocs.io) | ||
JupyterLab is the next-generation web-based user interface for Project Jupyter | ||
intended to replace the Jupyter Notebook interface. | ||
|
||
[Jupyter Notebook](https://jupyterbook.org) | ||
An open-source web application that allows you to create and share documents that | ||
contain live code, equations, visualizations and narrative text. | ||
|
||
[MyST](https://mystmd.org/guide/quickstart-myst-markdown) | ||
Markedly Structured Text (MyST) is a rich and extensible flavor of Markdown | ||
meant for technical documentation and publishing. It is used by Jupyter Book and MyST tools. | ||
|
||
[Slack](https://slack.com) | ||
A communication platform that we use to share information. We use separate channels | ||
for each project and also rely on the video features. If possible we recommend | ||
[downloading the Slack app](https://slack.com/downloads). If your agency does not allow | ||
you to use the app, you can still interface with Slack in a web browser. | ||
|
||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,43 +1,78 @@ | ||
# Data management | ||
|
||
A challenging aspect of tutorial development is how to manage dataset dependencies. Ideally, the data you use will be *publicly accessible* and *permanent* for reproducibility. | ||
A challenging aspect of tutorial development is how to manage dataset dependencies. Ideally, the data you use will be *publicly accessible* and *permanent* for reproducibility. In this section we present practical guidelines for effective data sharing for Hackweek Tutorials and Projects. | ||
|
||
```{important} | ||
Remember, a hackweek tutorial is *learning-oriented* and should guide participants through a step-wise process with a meaningful outcome. | ||
Remember, a hackweek tutorial is *learning-oriented* and should guide participants through a step-wise process with a meaningful outcome. If you typically work with large datasets, consider designing your tutorial to work with a small subset (~10 MB) that still enables your learning objectives to be met. | ||
``` | ||
|
||
If you typically work with large datasets, consider designing your tutorial to work with a small subset (~10 MB) to achieve your learning objectives. Below are some general guidelines based on past hackweeks: | ||
|
||
## Computational resource considerations | ||
In order for tutorial notebooks to be executable on widely available public computing infrastructure, *we recommend targeting limited computational requirements such as a 2-core CPU, 8 GB of RAM, and 10 GB of disk space (at the time of writing)*. | ||
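
A quick way to see whether a machine meets targets like these is to query it from Python. The sketch below is a minimal example using only the standard library; the function name `check_resources` and its default thresholds are assumptions for illustration, and RAM is left out because the standard library has no portable way to read it.

```python
import os
import shutil


def check_resources(path=".", min_cpus=2, min_free_disk_gb=10):
    """Compare CPU count and free disk space against target minimums."""
    cpus = os.cpu_count() or 1  # cpu_count() may return None on some systems
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "cpus": cpus,
        "cpus_ok": cpus >= min_cpus,
        "free_disk_gb": round(free_gb, 1),
        "disk_ok": free_gb >= min_free_disk_gb,
    }
```

Calling `check_resources()` at the top of a tutorial notebook gives participants an early hint if their environment is smaller than the one the tutorial was designed for.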
|
||
## Make your tutorial data publicly accessible! | ||
## Guidelines for Tutorials | ||
|
||
|
||
### My data is small (<10MB) | ||
### <10MB | ||
If your tutorial just needs a small image, or tabular data like a `.csv` file, go ahead and add it to the repository along with your tutorial code. | ||
|
||
### My data is moderate (10 - 100 MB) | ||
You can create a separatel repository on GitHub to publicly host your tutorial dataset. [Per GitHub repository limits](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github), *individual files are required to be < 100 MB*. | ||
Here is an [example for images](https://github.com/snowex-hackweek/tutorial-data), and here is an [example for n-dimensional arrays](https://github.com/scottyhq/zarrdata). | ||
### 10 - 100 MB | ||
You can create a separate repository on GitHub to publicly host your tutorial dataset. [Per GitHub repository limits](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github), *individual files are required to be < 100 MB*. | ||
* Here is an [example for images](https://github.com/snowex-hackweek/tutorial-data) | ||
* Here is an [example for n-dimensional arrays](https://github.com/scottyhq/zarrdata). | ||
|
||
```{note} | ||
If using a subset be sure to capture data provenance, for example by including a script that you used to access the original full-sized dataset from the data provider. | ||
``` | ||
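
One lightweight way to capture provenance is to write a small sidecar file next to the subset. The sketch below is a minimal example, not an established convention: the function name `write_provenance`, the `.provenance.json` suffix, and the record fields are all assumptions chosen for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def write_provenance(subset_path, source_url, description):
    """Write a JSON sidecar describing where a data subset came from."""
    subset_path = Path(subset_path)
    record = {
        "file": subset_path.name,
        "size_bytes": subset_path.stat().st_size,
        "sha256": hashlib.sha256(subset_path.read_bytes()).hexdigest(),
        "source_url": source_url,    # where the full-sized dataset lives
        "description": description,  # how the subset was derived
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    # Hypothetical naming scheme: subset.csv -> subset.csv.provenance.json
    sidecar = subset_path.with_name(subset_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar
```

Committing the sidecar together with the subset keeps the link to the data provider visible even if the subsetting script is later lost.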
|
||
### My data is cumbersome (>100 MB) | ||
Increasingly there are ways to load remote data in a streaming fashion, which allows you to avoid download and storage! At a basic level, software can read URLs instead of local file paths, such as at the beginning of [this tutorial](https://snowex-hackweek.github.io/website/tutorials/sar/sentinel1.html#dive-right-in). | ||
#### GitHub Release artifacts | ||
|
||
Generally it is not advisable to store binary files in GitHub repositories. Even if you make small changes to a file, an entire new copy is saved in the revision history and the size of the repository will quickly get unwieldy. | ||
|
||
[GitHub Releases](https://docs.github.com/en/repositories/releasing-projects-on-github/about-releases) are a feature of GitHub repositories that archive a snapshot of files in your repository *in addition to other auxiliary files*. According to official GitHub documentation: | ||
|
||
> You can create a release to package software, along with release notes and links to binary files, for other people to use. | ||
|
||
At the time of writing, *each file included in a release must be under 2 GiB*. So storing tutorial data as files attached to a GitHub release of tutorial code can work well to keep code and associated data together. | ||
|
||
* Here is an [example of attaching a large 100 MB Geotiff to a release artifact](https://github.com/scottyhq/share-a-raster/releases/tag/v0.0.1) | ||
|
||
```{note} | ||
The GitHub Command Line Interface (CLI) provides a convenient method for downloading release data: https://cli.github.com/manual/gh_release_download | ||
``` | ||
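
Files attached to a release can also be fetched by URL, which is handy inside a notebook. The sketch below builds the direct download URL from the pattern GitHub uses for release assets; the helper name `release_asset_url` and the file name `example.tif` are hypothetical placeholders.

```python
from urllib.parse import quote


def release_asset_url(owner, repo, tag, filename):
    """Build the direct download URL for a file attached to a GitHub release."""
    parts = [quote(p, safe="") for p in (owner, repo, tag, filename)]
    return "https://github.com/{}/{}/releases/download/{}/{}".format(*parts)


# The repository and tag come from the example above; the file name is a
# hypothetical placeholder, so check the release page for the real asset name.
url = release_asset_url("scottyhq", "share-a-raster", "v0.0.1", "example.tif")
print(url)
```

The resulting URL can be passed to any tool that reads from HTTP, so the tutorial code and its data stay tied to the same tagged release.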
|
||
### >100 MB | ||
|
||
#### Stream from URLs | ||
Increasingly there are ways to load remote data in a streaming fashion, which allows you to avoid download and storage altogether! Essentially, this means using software that can read URLs instead of local file paths, such as at the beginning of [this tutorial](https://snowex-hackweek.github.io/website/tutorials/sar/sentinel1.html#dive-right-in). | ||
|
||
```{note} | ||
Software that can read URLs still ultimately must download data! It will either be stored only in RAM, or as a temporary file on your hard drive, so be aware that you are still constrained by your local computing resources. | ||
``` | ||
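
To make the RAM constraint concrete, here is a minimal sketch that reads a URL into an in-memory buffer and stops once a size limit is exceeded. It uses only the Python standard library; the function name `read_url_into_memory` and the 10 MB default are assumptions for illustration, not part of any hackweek tooling.

```python
import io
import urllib.request


def read_url_into_memory(url, max_bytes=10 * 1024 * 1024):
    """Read a remote file into RAM, refusing anything above max_bytes."""
    buffer = io.BytesIO()
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(64 * 1024)  # fetch 64 KiB at a time
            if not chunk:
                break
            buffer.write(chunk)
            if buffer.tell() > max_bytes:
                raise ValueError(f"{url} is larger than {max_bytes} bytes")
    buffer.seek(0)  # rewind so callers can read from the start
    return buffer
```

The returned buffer behaves like an open file, so it can be handed to libraries that accept file-like objects, for example `pandas.read_csv(buffer)`.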
|
||
```{warning} | ||
If your tutorial streams data directly from a data provider, check that scheduled server downtime for maintenance isn't planned during your presentation! Also, be aware that URLs can be changed at any time by the data provider. | ||
Check that scheduled server downtime for maintenance isn't planned during your presentation! Also, be aware that URLs can be changed at any time by the data provider. | ||
``` | ||
* Here is an [example using the Python earthaccess library](https://earthaccess.readthedocs.io/en/latest/tutorials/file-access/) | ||
|
||
## Data permanence | ||
If you want long-term hosting of a tutorial dataset that receives a citable Digital Object Identifier (DOI), you can use [Zenodo.org](https://about.zenodo.org). | ||
* Libraries like Xarray can [read data directly from cloud storage](https://docs.xarray.dev/en/stable/user-guide/io.html#cloud-storage-buckets) | ||
|
||
1. [Link your GitHub repository with data Zenodo](https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content) *subject to GitHub repository size limits | ||
#### Use Zenodo.org | ||
Another approach is to upload your data on Zenodo, which at the time of writing has a standard 50 GB limit (https://library.cfa.harvard.edu/data-archiving-and-sharing). | ||
|
||
2. [Use Zenodo directly](https://library.cfa.harvard.edu/data-archiving-and-sharing) *50 GB standard limit | ||
```{note} | ||
https://github.com/fatiando/pooch is a nice Python utility to fetch data from Zenodo | ||
``` | ||
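
Whichever tool fetches the file, it is worth confirming that the download matches what was published; utilities such as pooch can check a file against a known hash for you. The standard-library sketch below shows the same idea by hand. The function names are hypothetical, and the expected hash would have to come from wherever the dataset is published.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file without loading it all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Raise ValueError if the file does not match the expected SHA-256 digest."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
    return path
```

Recording the expected hash in the tutorial itself means participants find out immediately if a hosted file was replaced or a download was truncated.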
|
||
[Here](https://snowex-hackweek.github.io/website/tutorials/thermal-ir/thermal-ir-tutorial.html#) is an example tutorial that retrieves a dataset from a [Zenodo 'record'](https://zenodo.org/record/5504396) | ||
## Data permanence considerations | ||
Be aware that GitHub repositories can be deleted at any time by repository owners. For guaranteed long-term (10+ years) hosting of a tutorial dataset that receives a citable Digital Object Identifier (DOI), you can use [Zenodo.org](https://about.zenodo.org). You can easily [link a GitHub repository with Zenodo](https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content). | ||
|
||
* [Here](https://snowex-hackweek.github.io/website/tutorials/thermal-ir/thermal-ir-tutorial.html#) is an example tutorial that retrieves a dataset from a [Zenodo 'record'](https://zenodo.org/record/5504396) | ||
|
||
## Computational resource considerations | ||
In order for tutorial notebooks to be executable on different machines, *we recommend targeting limited computational requirements such as 2-core CPU, 8 GB of RAM memory, 10 GB of disk space (at the time of writing)* | ||
## Guidelines for Projects | ||
|
||
### JupyterHub Data Sharing | ||
|
||
During a hackweek, teams often want to share data with each other for collaborative analysis. In contrast to tutorial datasets, which are usually hand-picked, project data is dynamic and changes over time. By using a JupyterHub during a hackweek, participants can take advantage of networked storage drives and pre-configured Cloud Object Storage. | ||
|
||
```{note} | ||
JupyterHubs do not always have the same configuration, but we encourage you to review this guide from 2i2c which explains options for JupyterHub storage (https://docs.2i2c.org/user/topics/data/) | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Where to go for help during a hackweek | ||
|
||
With all the moving pieces, it can be hard to know where to turn for help. Check out this decision tree to help you figure out the best sources of information depending on your issue. | ||
|
||
![ques_dec_tree](../../images/SupportDecisionTree.svg) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,19 @@ | ||
# Technology | ||
|
||
*Paragraph providing brief overview of this module* | ||
Hackweeks are highly technical in nature. We use multiple websites and software tools to facilitate full participation by all organizing team members and participants. We strive to use technologies that are open source and that facilitate open science, that is, technologies that enable reproducibility and wide participation. | ||
|
||
The technological landscape evolves rapidly! We created this [glossary page](../../reference/glossary.md) to help you keep track of tools that we regularly refer to. | ||
|
||
In this section, you will learn how to make changes to Hackweek websites via pull requests on GitHub. You will also learn how we use automated GitHub Actions workflows to generate consistent and quality-controlled Jupyter Notebooks that are converted to a public website. Finally, we will discuss best practices for data management when designing tutorials and working collaboratively on projects during and after a hackweek. | ||
|
||
## Learning Objectives | ||
|
||
After completing this module, hackweek supporters will: | ||
After completing this module, hackweek organizers and participants will: | ||
|
||
* Be familiar with the suite of technology used during a UW Hackweek (GitHub, JupyterHub, Jupyter Book) | ||
* Understand recommended tools for tutorial creation and project work | ||
* Know where to go for technology support before and during the hackweek | ||
|
||
* have a comprehensive understand of the suite of technology tools used to support hackweek learners | ||
* understand how to use our technology tools within their specific supporting role (e.g. tutorial creation, project work) | ||
* know where to go for technology support before and during the hackweek | ||
Specific walk-throughs are provided for: | ||
* How to effectively share data and code during and after a hackweek | ||
* Adding yourself to the event website as a member of the organizing team |