Merge pull request #119 from MetOffice/develop
Merge all changes from develop onto main before merging CORDEX changes
nhsavage authored Mar 21, 2022
2 parents dd42a86 + c3c66be commit 6b4c3d6
Showing 19 changed files with 889 additions and 99 deletions.
6 changes: 3 additions & 3 deletions CONTRIBUTING.md
@@ -20,11 +20,11 @@ conda activate pyprecis-environment
:exclamation: *Note: As of v1.0 we are unable to provision the model data necessary for reproducing the full PyPRECIS learning environment via GitHub due to its large file size. Contact the PRECIS team for more information.*

## Before you start...
- Read through the current issues to see what you can help with. If you have your own ideas for improvements, please start a new issues so we can track and discuss your improvement. You must create a new branch for any changes you make.
+ Read through the current issues to see what you can help with. If you have your own ideas for improvements, please start a new issue so we can track and discuss your improvement. You must create a new branch for any changes you make.

**Please take note of the following guidelines when contributing to the PyPRECIS repository.**

- * Please do **not** make changes to the `master` branch. The `master` branch is reserved for files and code that has been fully tested and reviewed. Only the core PyPRECIS developers can/should push to the `master` branch.
+ * Please do **not** make changes to `main` or `develop` branches. The `main` branch is reserved for files and code that has been fully tested and reviewed. Only the core PyPRECIS developers can push to the `main` and `develop` branches.

* The `develop` branch contains the latest holistic version of the `PyPRECIS` repository. Please branch off `develop` to fix a particular issue or add a new feature.
* Please use the following tokens at the start of a new branch name to help sign-post and group branches:
@@ -66,5 +66,5 @@ have questions.**
<h5 align="center">
<img src="notebooks/img/MO_MASTER_black_mono_for_light_backg_RBG.png" width="200" alt="Met Office"> <br>
- &copy; British Crown Copyright 2018 - 2019, Met Office
+ &copy; British Crown Copyright 2018 - 2022, Met Office
</h5>
17 changes: 12 additions & 5 deletions README.md
@@ -31,7 +31,7 @@ PyPRECIS is built on [Jupyter Notebooks](https://jupyter.org/), with data proces
Further information about PRECIS can be found on the [Met Office website](https://www.metoffice.gov.uk/precis).

## Contents
- The teaching elements of PyPRECIS are contained in the `notebooks` directory. The primary worksheets are:
+ The teaching elements of PyPRECIS are contained in the `notebooks` directory. The core primary worksheets are:

Worksheet | Aims
:----: | -----------
@@ -42,7 +42,7 @@ Worksheet | Aims
[5](notebooks/worksheet5.ipynb) | <li>Have an appreciation for working with daily model data</li><li>Understand how to calculate some useful climate extremes statistics</li><li>Be aware of some coding strategies for dealing with large data sets</li></ul>
[6](notebooks/worksheet6.ipynb) | An extended coding exercise designed to allow you to put everything you've learned into practice

- Additional tutorials specific to the CSSP 20th Century reanalysis datasets:
+ Additional tutorials specific to the CSSP 20th Century reanalysis dataset:

Worksheet | Aims
:----: | -----------
@@ -55,10 +55,17 @@ Three additional worksheets are available for use by workshop instructors:

* `makedata.ipynb`: Provides scripts for preparing raw model output for use in notebook exercises.
* `worksheet_solutions.ipyn`: Solutions to worksheet exercises.
- * `worksheet6example.ipynb`: Example code for Worksheet 6.
+ * `worksheet6example.ipynb`: Example code for Worksheet 6.

## Data
- The data used in the worksheets is currently only available within the Met Office. Data relating to the CSSP_20CRDS_Tutorials is also available in Zarr format in an Azure Blob Storage Service. See the `data/DATA-ACESS.md` for further details.
+ Data relating to the PyPRECIS project is currently held internally at the Met Office.

+ The total data volume for the core worksheets is 36.68 GB, of which ~20 GB is raw pp data. This is too large to be stored on GitHub, or via Git LFS.
+ As of v2.0, the storage solution for making this data available alongside the notebooks is still under investigation.

+ Data relating to the **CSSP 20CRDS** tutorials is held online in an Azure Blob Storage Service. To access this data, users will need a valid shared access signature (SAS) token. The data is in [Zarr](https://zarr.readthedocs.io/en/stable/) format and the total volume is ~2 TB. The data is stored at hourly, 3-hourly, 6-hourly, daily and monthly frequencies, held separately under the `metoffice-20cr-ds` container on MS Azure. Monthly data is also available via [Zenodo](https://zenodo.org/record/2558135).
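The commit itself does not include code for reading these stores, but as a rough illustration a Zarr store like this could be opened with `xarray` plus `fsspec`/`adlfs`; the storage account, store path and SAS token below are placeholders and not taken from the repository:

```python
# Illustrative only: reading one of the CSSP 20CRDS Zarr stores with xarray.
# The container name comes from the text above; the storage account, store
# path and SAS token are placeholders and will differ in practice.
import fsspec
import xarray as xr

store = fsspec.get_mapper(
    "az://metoffice-20cr-ds/monthly/<store-name>.zarr",  # placeholder path
    account_name="<storage-account>",                    # placeholder account
    sas_token="<your-SAS-token>",                        # obtained from the PRECIS team
)
ds = xr.open_zarr(store)
print(ds)
```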



## Contributing
Information on how to contribute can be found in the [Contributing guide](CONTRIBUTING.md).
@@ -69,5 +76,5 @@ PyPRECIS is licenced under BSD 3-clause licence for use outside of the Met Offic

<h5 align="center">
<img src="notebooks/img/MO_MASTER_black_mono_for_light_backg_RBG.png" width="200" alt="Met Office"> <br>
- &copy; British Crown Copyright 2018 - 2020, Met Office
+ &copy; British Crown Copyright 2018 - 2022, Met Office
</h5>
12 changes: 0 additions & 12 deletions data/DATA-ACCESS.md

This file was deleted.

23 changes: 23 additions & 0 deletions dockerfile
@@ -0,0 +1,23 @@
FROM continuumio/miniconda3

RUN apt-get update

# Set working directory for the project
WORKDIR /app

SHELL ["/bin/bash", "--login", "-c"]

RUN apt-get install -y git

# Create Conda environment from the YAML file
COPY environment.yml .
RUN pip install --upgrade pip

RUN conda env create -f environment.yml

RUN conda init bash
RUN conda activate pyprecis-environment

RUN pip install ipykernel && \
python -m ipykernel install --name pyprecis-training

22 changes: 14 additions & 8 deletions environment.yml
@@ -1,11 +1,17 @@
name: pyprecis-environment
channels:
  - conda-forge
-   - defaults
- dependencies:
-   - python=3.6.6
-   - numpy
-   - matplotlib
-   - cartopy=0.16.0
-   - dask=0.19.4
-   - iris=2.2.0
+ dependencies:
+   - python=3.6.10
+   - iris=2.4.0
+   - numpy=1.17.4
+   - matplotlib=3.1.3
+   - nc-time-axis=1.2.0
+   - jupyter_client=6.1.7
+   - jupyter_core=4.6.3
+   - dask=2.11.0
+   - notebook=5.7.8
+   - mo_pack=0.2.0
+   - boto3
+   - botocore
+   - tqdm
129 changes: 129 additions & 0 deletions notebooks/awsutils/README-AWS.md
@@ -0,0 +1,129 @@

## AWS

### Create an EC2 instance

* Select the eu-west-2 (London) region from the top right of the navigation bar
* Click on "Launch instance"
* Choose the Amazon Linux 2 AMI (HVM) Kernel 5.10 64-bit (x86) machine, then click "Select"
* Choose t2.2xlarge and click "Next: Configure instance details"
* Choose subnet default eu-west-2c
* In IAM role, choose the existing trainings-ec2-dev role and click "Next: Add storage"
* 8 GB is fine; click "Next: Add tags"
* Add the following tags:
  * Name: [unique instance name]
  * Tenable: FA
  * ServiceOwner: [firstname.lastname]
  * ServiceCode: PABCLT
* Add a security group: select the existing security group IAStrainings-ec2-mo
* Click "Review and Launch", then select "Launch"
* It will prompt you to set a key pair (to allow SSH). Create a new key and download it.

This creates the instance. To see it, go to "Instances"; the instance state will be "Running". A boto3 equivalent of these console steps is sketched below.
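The sketch below is a rough boto3 equivalent of the console steps above, not a script from this repository; the AMI ID, key pair and instance-profile names are placeholders:

```python
# Illustrative only: launching a t2.2xlarge instance in eu-west-2 with boto3.
# The AMI ID, key pair, security group and IAM instance profile are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")
response = ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",      # Amazon Linux 2 AMI (placeholder ID)
    InstanceType="t2.2xlarge",
    MinCount=1,
    MaxCount=1,
    KeyName="<your-key-pair>",
    SecurityGroups=["IAStrainings-ec2-mo"],
    IamInstanceProfile={"Name": "trainings-ec2-dev"},
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [
            {"Key": "Name", "Value": "<unique-instance-name>"},
            {"Key": "Tenable", "Value": "FA"},
            {"Key": "ServiceOwner", "Value": "<firstname.lastname>"},
            {"Key": "ServiceCode", "Value": "PABCLT"},
        ],
    }],
)
print(response["Instances"][0]["InstanceId"])
```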

### SSH to the instance from the VDI


* Save the key (.pem) to `~/.ssh` and set the permissions: `chmod 0400 ~/.ssh/your_key.pem`
* Open `~/.ssh/config` and add the following:

```
Host ec2-*.eu-west-2.compute.amazonaws.com
IdentityFile ~/.ssh/your_key.pem
User ec2-user
```

* Find the public IPv4 DNS and SSH in using it: `ssh ec2-<ip address>.eu-west-2.compute.amazonaws.com`. The public IPv4 DNS can be found in the instance details on AWS: click on your instance and it will open the details.

* Remember to shut down the instance when you are not using it, to save costs.

### Create an S3 bucket

* Go to the S3 service and press "Create bucket"
* Name the bucket
* Set the region to EU (London) eu-west-2
* Add tags:
  * Name: [name of bucket or any unique name]
  * ServiceOwner: [your-name]
  * ServiceCode: PABCLT
  * Tenable: FA
* Click on "Create bucket" (a boto3 equivalent of these steps is sketched below)
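As a rough boto3 equivalent of the console steps above (illustrative, not part of this repository; the bucket name and tag values are placeholders):

```python
# Illustrative only: creating and tagging a bucket in eu-west-2 with boto3.
import boto3

s3 = boto3.client("s3", region_name="eu-west-2")
bucket = "<unique-bucket-name>"  # placeholder
s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": "eu-west-2"},
)
s3.put_bucket_tagging(
    Bucket=bucket,
    Tagging={"TagSet": [
        {"Key": "Name", "Value": bucket},
        {"Key": "ServiceOwner", "Value": "<your-name>"},
        {"Key": "ServiceCode", "Value": "PABCLT"},
        {"Key": "Tenable", "Value": "FA"},
    ]},
)
```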

### Key configurations


The AWS scripts in this repository run only when the config files contain the latest keys. To update the keys:

* Go to AB climate training dev --> Administrator access --> command line or programmatic access
* Copy the keys in "Option 1: Set AWS environment variables"
* In the VDI, paste these keys into ~/.aws/config (replacing any existing keys)
  * Add [default] on the first line
* Copy the keys in "Option 2: Add a profile to your AWS credentials file"
* In the VDI, paste the keys into the credentials file ~/.aws/credentials (remove the first copied line, which looks something like [198477955030_AdministratorAccess])
  * Add [default] on the first line

The config and credentials files should look like this (with your own keys):

```
[default]
export AWS_ACCESS_KEY_ID="ASIAS4NRVH7LD2RRGSFB"
export AWS_SECRET_ACCESS_KEY="rpI/dxzQWhCul8ZHd18n1VW1FWjc0LxoKeGO50oM"
export AWS_SESSION_TOKEN="IQoJb3JpZ2luX2VjEGkaCWV1LXdlc3QtMiJH"
```
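A quick way to confirm that the pasted keys work (illustrative, not part of this repository) is an STS identity call; boto3 picks up the `[default]` profile from `~/.aws/credentials` automatically:

```python
# Illustrative only: verify the [default] credentials are valid.
import boto3

identity = boto3.client("sts").get_caller_identity()
print(identity["Account"], identity["Arn"])
```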

### Loading data onto an S3 bucket from the VDI (using boto3)

To upload file(s) to S3, use: `/aws-scripts/s3_file_upload.py`
To upload directory(s) to S3, use: `/aws-scripts/s3_bulk_data_upload.py`
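Those scripts are not reproduced here, but the core of such an upload with boto3 looks roughly like the sketch below; the local path and destination key are placeholders, and the bucket name is the one used elsewhere in these notes:

```python
# Illustrative only: uploading a single file to S3 with boto3.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="/path/to/local/file.nc",   # placeholder local file
    Bucket="ias-pyprecis",               # bucket used elsewhere in these notes
    Key="data/cmip5/file.nc",            # placeholder destination key
)
```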

### AWS Elastic container repository

The following instructions are for creating an image repository on ECR and uploading a container image:

* SSH to the previously created EC2 instance and make an empty Git repo:

```
sudo yum install -y git
git init
```
* On the VDI, run the following command to push the PyPRECIS repo containing the dockerfile to the EC2 instance:
```
git push <ec2 host name>:~
```

* Now check out the branch on the EC2 instance: `git checkout [branch-name]`
* Install Docker and start the Docker service:

```
sudo amazon-linux-extras install docker
sudo service docker start
```

* Build the Docker image:

```
sudo docker build .
```

* Go to the AWS ECR console and "create repository"; make it private and name it (a boto3 alternative is sketched after these steps)

* Once created, press "push commands"

* Copy the commands and run them on the EC2 instance; this will push the container image to the repository. If you get a "permission denied" error, add "sudo" before "docker" in the command.
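If you prefer to create the repository programmatically rather than in the console, a rough boto3 sketch (illustrative only; the repository name is a placeholder) is:

```python
# Illustrative only: create an ECR repository and print its URI.
import boto3

ecr = boto3.client("ecr", region_name="eu-west-2")
repo = ecr.create_repository(repositoryName="pyprecis-training")  # placeholder name
print(repo["repository"]["repositoryUri"])
```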



### AWS SageMaker: run a notebook using a custom kernel
The instructions below follow this tutorial:
https://aws.amazon.com/blogs/machine-learning/bringing-your-own-custom-container-image-to-amazon-sagemaker-studio-notebooks/

* Go to SageMaker and "Open SageMaker domain"
* Add a user
* Set the name and select the AmazonSageMaker execution role (the default one)

* Once the user is created, go to "Attach image"
* Select "New image" and add the image URI (copy it from the image repository)
* Give the new image a name and display name, set the SageMaker execution role, add tags and attach the image
* Add a kernel name and display name (both can be the same)
* Now launch the app -> Studio and it will open the notebook dashboard.
* Select a Python notebook and add your custom-named kernel
111 changes: 111 additions & 0 deletions notebooks/awsutils/fetch_s3_file.py
@@ -0,0 +1,111 @@

import io
import os
import boto3
from urllib.parse import urlparse
from fnmatch import fnmatch
from shutil import copyfile


def _fetch_s3_file(s3_uri, save_to):

bucket_name, key = _split_s3_uri(s3_uri)
print(f"Fetching s3 object {key} from bucket {bucket_name}")

client = boto3.client("s3")
obj = client.get_object(
Bucket=bucket_name,
Key=key,
)
with io.FileIO(save_to, "w") as f:
for i in obj["Body"]:
f.write(i)


def _save_s3_file(s3_uri, out_filename, file_to_save="/tmp/tmp"):
bucket, folder = _split_s3_uri(s3_uri)
out_filepath = os.path.join(folder, out_filename)
print(f"Save s3 object {out_filepath} to bucket {bucket}")
client = boto3.client("s3")
client.upload_file(
Filename=file_to_save,
Bucket=bucket,
Key=out_filepath
)


def _split_s3_uri(s3_uri):
parsed_uri = urlparse(s3_uri)
return parsed_uri.netloc, parsed_uri.path[1:]


def find_matching_s3_keys(in_fileglob):

bucket_name, file_and_folder_name = _split_s3_uri(in_fileglob)
folder_name = os.path.split(file_and_folder_name)[0]
all_key_responses = _get_all_files_in_s3_folder(bucket_name, folder_name)
matching_keys = []
for key in [k["Key"] for k in all_key_responses]:
if fnmatch(key, file_and_folder_name):
matching_keys.append(key)
return matching_keys


def _get_all_files_in_s3_folder(bucket_name, folder_name):
client = boto3.client("s3")
response = client.list_objects_v2(
Bucket=bucket_name,
Prefix=folder_name,
)
all_key_responses = []
if "Contents" in response:
all_key_responses = response["Contents"]
while response["IsTruncated"]:
continuation_token = response["NextContinuationToken"]
response = client.list_objects_v2(
Bucket=bucket_name,
Prefix=folder_name,
ContinuationToken=continuation_token,
)
if "Contents" in response:
all_key_responses += response["Contents"]
return all_key_responses


def copy_s3_files(in_fileglob, out_folder):
    '''
    Copy files from an S3 bucket to a local directory (or to another S3 location).
    args
    ---
    in_fileglob: s3 uri of files (wildcards can be used)
    out_folder: local path (or s3 uri) where the data will be stored
    '''
matching_keys = find_matching_s3_keys(in_fileglob)
in_bucket_name = _split_s3_uri(in_fileglob)[0]
out_scheme = urlparse(out_folder).scheme
for key in matching_keys:
new_filename = os.path.split(key)[1]
temp_filename = os.path.join("/tmp", new_filename)
in_s3_uri = os.path.join(f"s3://{in_bucket_name}", key)
_fetch_s3_file(in_s3_uri, temp_filename)
if out_scheme == "s3":
_save_s3_file(
out_folder,
new_filename,
temp_filename,
)
else:
copyfile(
temp_filename, os.path.join(out_folder, new_filename)
)
os.remove(temp_filename)


def main():
in_fileglob = 's3://ias-pyprecis/data/cmip5/*.nc'
out_folder = '/home/h01/zmaalick/myprojs/PyPRECIS/aws-scripts'
copy_s3_files(in_fileglob, out_folder)


if __name__ == "__main__":
main()
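For context (not part of this commit's file), `copy_s3_files` can also be pointed at an `s3://` destination, in which case each matched file is re-uploaded rather than copied locally. A hypothetical usage, assuming the module is importable and the source and destination locations exist, might be:

```python
# Illustrative only: both calls assume the module is on the Python path and
# that the source/destination locations exist.
from fetch_s3_file import copy_s3_files

# S3 -> local directory
copy_s3_files("s3://ias-pyprecis/data/cmip5/*.nc", "/tmp/cmip5")

# S3 -> another S3 prefix (detected from the "s3" scheme of the destination)
copy_s3_files("s3://ias-pyprecis/data/cmip5/*.nc", "s3://ias-pyprecis/data/cmip5-copy")
```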