Merge pull request #10 from huanglangwen/group5

Add group05

twicki authored Aug 17, 2020
2 parents be96603 + 3e3e8c2 commit 55c9c92
Showing 42 changed files with 70,882 additions and 0 deletions.
1 change: 1 addition & 0 deletions projects2020/group05/.dockerignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
data/
8 changes: 8 additions & 0 deletions projects2020/group05/.gitignore
@@ -0,0 +1,8 @@
data
*.dat
*.json
*.pyc
__pycache__
.pytest_cache
.gt_cache
.vscode
76 changes: 76 additions & 0 deletions projects2020/group05/Dockerfile
@@ -0,0 +1,76 @@
FROM ubuntu:20.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update \
&& apt-get install -y \
apt-utils \
sudo \
build-essential \
gcc \
g++ \
gfortran \
gdb \
wget \
curl \
tar \
git \
vim \
make \
cmake \
cmake-curses-gui \
python3-pip \
python3-dev \
libssl-dev \
libboost-all-dev \
libnetcdf-dev \
libnetcdff-dev

RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 10 && \
update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 10 && \
update-alternatives --set python /usr/bin/python3 && \
update-alternatives --set pip /usr/bin/pip3

# set TZ
ENV TZ=US/Pacific
RUN echo $TZ > /etc/timezone && \
dpkg-reconfigure --frontend noninteractive tzdata

# install serialbox from source
RUN git clone --single-branch --branch savepoint_as_string https://github.com/VulcanClimateModeling/serialbox2.git /serialbox
RUN cd /serialbox && \
mkdir build && \
cd build && \
cmake -DCMAKE_INSTALL_PREFIX=/usr/local/serialbox -DCMAKE_BUILD_TYPE=Release \
-DSERIALBOX_USE_NETCDF=ON -DSERIALBOX_ENABLE_FORTRAN=ON \
-DSERIALBOX_TESTING=ON ../ && \
make -j8 && \
make test && \
make install && \
/bin/rm -rf /serialbox

# gt4py
RUN pip install git+https://github.com/gridtools/gt4py.git \
&& python -m gt4py.gt_src_manager install

# add default user
ARG USER=user
ENV USER ${USER}
RUN useradd -ms /bin/bash ${USER} \
&& echo "${USER} ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
ENV USER_HOME /home/${USER}
RUN chown -R ${USER}:${USER} ${USER_HOME}

# create working directory
ARG WORKDIR=/work
ENV WORKDIR ${WORKDIR}
RUN mkdir ${WORKDIR}
RUN chown -R ${USER}:${USER} ${WORKDIR}

WORKDIR ${WORKDIR}
USER ${USER}

ENV PYTHONPATH="/usr/local/serialbox/python:${PYTHONPATH}"

CMD ["/bin/bash"]

57 changes: 57 additions & 0 deletions projects2020/group05/README.md
@@ -0,0 +1,57 @@
# A Python Implementation of GFS Scale-Aware Mass-Flux Shallow Convection Scheme Module
## Package `shalconv` structure
- `__init__.py`: configuration
- `funcphys.py`: thermodynamic functions
- `physcons.py`: constants
- `samfaerosols.py`: aerosol processes
- `samfshalcnv.py`: shallow convection scheme
- `serialization.py`: serialization
- `kernels/stencils_*.py`: GT4Py stencils of the shallow convection scheme
- `kernels/utils.py`: useful functions for GT4Py arrays

## Unit tests
- `analyse_xml.py`: dependency analysis of the Fortran code
- `read_serialization.py`: read serialization data for unit tests
- `run_serialization.py`: generate serialization for unit tests
- `test_fpvsx.py`: test fpvsx function
- `test_part1.py`: test part1 of shallow convection scheme
- `test_part2.py`: test part2 of shallow convection scheme
- `test_part34.py`: test part3 and part4 of shallow convection scheme

## Other files
- `build.sh`: script for building environment as docker image
- `enter.sh`: script for entering the docker environment
- `env_daint`: script for setting up environment in Piz Daint
- `submit_job.sh`: script for submitting SLURM jobs on Piz Daint to benchmark the shalconv scheme with the gtcuda and gtx86 backends
- `get_data.sh`: download serialized data
- `main.py`: validation for shallow convection scheme
- `benchmark.py`: benchmark shallow convection scheme with various number of columns (ix)
- `plot.py`: plot benchmark results (already hardcoded)

## Storage in GT4Py
All arrays are broadcast or sliced to the shape (1, ix, km) due to restrictions of GT4Py stencils.
Operations applied to 1D arrays of shape (1, ix, 1) are propagated forward and then backward to keep them consistent along the k dimension.
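As an illustration of that storage convention (a plain-NumPy sketch with made-up sizes, not the package's own helper code):

```python
import numpy as np

ix, km = 4, 3  # illustrative sizes: columns and levels

# A per-column 1D quantity of shape (ix,) ...
col = np.arange(ix, dtype=np.float64)

# ... broadcast to the (1, ix, km) storage shape the stencils expect
field = np.broadcast_to(col[np.newaxis, :, np.newaxis], (1, ix, km)).copy()
print(field.shape)  # (1, 4, 3)
```

Every column then carries its value at all km levels, matching the "propagate forward and backward" convention described above.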

## Configuration
`shalconv/__init__.py` specifies several configurations needed to run the scheme, including the location of the serialization data, the backend type,
verbosity and the floating-point/integer number types. One can also select the backend by setting the environment variable `GT4PY_BACKEND` to
one of `numpy`, `debug`, `gtx86`, `gtcuda`.
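A minimal sketch of reading such an override (the real logic lives in `shalconv/__init__.py`; the helper name, the validation, and the `numpy` default shown here are assumptions for illustration):

```python
import os

VALID_BACKENDS = ("numpy", "debug", "gtx86", "gtcuda")

def pick_backend(env=None):
    """Return the backend name from GT4PY_BACKEND, defaulting to 'numpy' (assumed default)."""
    env = os.environ if env is None else env
    backend = env.get("GT4PY_BACKEND", "numpy")
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown backend: {backend}")
    return backend

print(pick_backend({"GT4PY_BACKEND": "gtcuda"}))  # gtcuda
print(pick_backend({}))                           # numpy
```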

## Build with docker in Linux
Execute `build.sh`, then `enter.sh`.

## Build with docker in Windows
1. download serialized data and extract them according to `get_data.sh`
2. execute `docker build -t hpc4wc_project .`
3. execute `docker run -i -t --rm --mount type=bind,source={ABSOLUTE PATH OF THIS FOLDER},target=/work --name=hpc4wc_project hpc4wc_project`
4. execute `ipython main.py` or `ipython benchmark.py`

## Run on Piz Daint
1. execute `get_data.sh` to get serialized data
2. change `ISDOCKER` to `False` and set `DATAPATH` in `shalconv/__init__.py`
3. execute `source env_daint`
4. execute `ipython main.py` or `ipython benchmark.py`

## Tests
Inside the tests folder, execute `ipython run_serialization.py` to generate the serialization
data needed for the tests, then execute `ipython test_*.py` to run the tests.
135 changes: 135 additions & 0 deletions projects2020/group05/benchmark.py
@@ -0,0 +1,135 @@
from shalconv.serialization import read_random_input, read_data, numpy_dict_to_gt4py_dict, scalar_vars
from shalconv.samfshalcnv import samfshalcnv_func
from shalconv import DTYPE_INT, BACKEND, ISDOCKER
from time import time
import numpy as np
import numpy.f2py, os


def carray2fortranarray(data_dict):

for key in data_dict:
if isinstance(data_dict[key], np.ndarray):
data_dict[key] = np.asfortranarray(data_dict[key])

return data_dict


def samfshalcnv_fort(data_dict):

im = data_dict["im"]
ix = data_dict["ix"]
km = data_dict["km"]
delt = data_dict["delt"]
itc = data_dict["itc"]
ntc = data_dict["ntc"]
ntk = data_dict["ntk"]
ntr = data_dict["ntr"]
delp = data_dict["delp"]
prslp = data_dict["prslp"]
psp = data_dict["psp"]
phil = data_dict["phil"]
qtr = data_dict["qtr"][:,:,:ntr+2]
q1 = data_dict["q1"]
t1 = data_dict["t1"]
u1 = data_dict["u1"]
v1 = data_dict["v1"]
rn = data_dict["rn"]
kbot = data_dict["kbot"]
ktop = data_dict["ktop"]
kcnv = data_dict["kcnv"]
islimsk = data_dict["islimsk"]
garea = data_dict["garea"]
dot = data_dict["dot"]
ncloud = data_dict["ncloud"]
hpbl = data_dict["hpbl"]
ud_mf = data_dict["ud_mf"]
dt_mf = data_dict["dt_mf"]
cnvw = data_dict["cnvw"]
cnvc = data_dict["cnvc"]
clam = data_dict["clam"]
c0s = data_dict["c0s"]
c1 = data_dict["c1"]
pgcon = data_dict["pgcon"]
asolfac = data_dict["asolfac"]

import shalconv_fortran
shalconv_fortran.samfshalconv_benchmark.samfshalcnv(
im = im, ix = ix, km = km, delt = delt, itc = itc,
ntc = ntc, ntk = ntk, ntr = ntr, delp = delp,
prslp = prslp, psp = psp, phil = phil, qtr = qtr,
q1 = q1, t1 = t1, u1 = u1, v1 = v1,
rn = rn, kbot = kbot, ktop = ktop, kcnv = kcnv,
islimsk = islimsk, garea = garea, dot = dot,
ncloud = ncloud, hpbl = hpbl, ud_mf = ud_mf,
dt_mf = dt_mf, cnvw = cnvw, cnvc = cnvc, clam = clam,
c0s = c0s, c1 = c1, pgcon = pgcon, asolfac = asolfac )


def run_model(ncolumns, nrun = 10, compile_gt4py = True):

ser_count_max = 19
num_tiles = 6

input_0 = read_data(0, True)

ix = input_0["ix"]
length = DTYPE_INT(ncolumns)

times_gt4py = np.zeros(nrun)
times_fortran = np.zeros(nrun)

for i in range(nrun):

data = read_random_input(length, ix, num_tiles, ser_count_max)

for key in scalar_vars:
data[key] = input_0[key]

data["ix"] = length
data["im"] = length
data_gt4py = numpy_dict_to_gt4py_dict(data)
data_fortran = carray2fortranarray(data)

if i == 0 and compile_gt4py: samfshalcnv_func(data_gt4py)

# Time GT4Py
tic = time()
samfshalcnv_func(data_gt4py)
toc = time()
times_gt4py[i] = toc - tic

# Time Fortran
tic = time()
samfshalcnv_fort(data_fortran)
toc = time()
times_fortran[i] = toc - tic

return times_gt4py, times_fortran


if __name__ == "__main__":

lengths = [32, 128, 512, 2048, 8192, 32768, 131072] #524288]
nrun = 10
time_mat_gt4py = np.zeros((nrun, len(lengths)))
time_mat_fortran = np.zeros((nrun, len(lengths)))

print("Compiling fortran code")
f2cmp = "--f2cmap tests/fortran/.f2py_f2cmap" if ISDOCKER else ""
os.system(f"f2py {f2cmp} -c -m shalconv_fortran tests/fortran/samfshalconv_benchmark.f90")

print(f"Benchmarking samfshalcnv with backend: {BACKEND}")

for i in range(len(lengths)):

length = lengths[i]
times_gt4py, times_fortran = run_model(length, nrun, i==0)
time_mat_gt4py[:,i] = times_gt4py
time_mat_fortran[:,i] = times_fortran

print(f"ix = {length}, Run time: Avg {times_gt4py.mean():.3f}, Std {np.std(times_gt4py):.3e} seconds")
print(f"Fortran run time: Avg {times_fortran.mean():.3e}, Std {np.std(times_fortran):.3e} seconds")

np.savetxt(f"times-gt4py-{BACKEND}.csv", time_mat_gt4py, delimiter=",")
np.savetxt(f"times-fortran-{BACKEND}.csv", time_mat_fortran, delimiter=",")
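The CSVs written above hold one row per run and one column per problem size; summarizing them afterwards could look like this (a self-contained sketch with made-up timings, roughly the kind of input `plot.py` would consume):

```python
import io
import numpy as np

# Stand-in for times-gt4py-<backend>.csv: 3 runs x 2 problem sizes (made-up values)
csv = io.StringIO("0.10,0.20\n0.12,0.18\n0.11,0.22\n")
times = np.loadtxt(csv, delimiter=",")

means = times.mean(axis=0)  # average runtime per problem size
stds = times.std(axis=0)    # run-to-run spread
print(times.shape)  # (3, 2)
```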
1 change: 1 addition & 0 deletions projects2020/group05/build.sh
@@ -0,0 +1 @@
docker build -t hpc4wc_project .
29 changes: 29 additions & 0 deletions projects2020/group05/docs/NOTE.md
@@ -0,0 +1,29 @@
1. k_index (k_idx) has to be 1-based
2. kbpl -> kpbl
3. no function call in if branch
4. no boolean field and boolean literals
5. gt4py frontend implements `visitor_*` for ast.py
6. get function source by `inspect.getsource` in `GTScriptParser`
7. `visitor_With` -> `_visit_computation_node`, `_visit_inteval_node`
8. usable functions inside gt4py: [`ABS`, `MOD`, `SIN`, `COS`, `TAN`, `ARCSIN`, `ARCCOS`, `ARCTAN`, `SQRT`, `EXP`, `LOG`,
`ISFINITE`, `ISINF`, `ISNAN`, `FLOOR`, `CEIL`, `TRUNC`]
9. can't have temporary var inside if-conditionals
10. clone `serialbox2` from VulcanClimateModeling
11. In the right conda env, build `serialbox2`: `cmake -DCMAKE_INSTALL_PREFIX=/usr/local/serialbox -DSERIALBOX_USE_NETCDF=ON -DSERIALBOX_ENABLE_FORTRAN=ON -DSERIALBOX_TESTING=ON -DSERIALBOX_USE_OPENSSL=OFF ..`
12. best practice for debugging: PyCharm + Docker
13. `dp`, `tem1`, `tem2`, `dv1h`, `rd` ... are not fields
14. [TODO] fix temp vars in part3,4
15. [ERROR] qtr.shape == (2304, 79, 7), ntr = 2 != qtr.shape[2] + 2
16. Add `init_kbm_kmax`
17. `heso` not correct -> `qeso` not correct -> should be
```fortran
qeso(i,k) = 0.01 * fpvsx(to(i,k)) ! fpvs is in pa
qeso(i,k) = eps * qeso(i,k) / (pfld(i,k) + epsm1*qeso(i,k))
```
18. test fpvsx_gt -> pass
19. `fpvs(to);to=t1` -> `fpvs(t1)`
20. delete `fscav` in part3 serialization and delete `delebar` in part4
21. Solve interval problem for stencil_part34.py line 182
22. notice argument position!
23. notice bound for forward-backward propagation
24. scalars have to be *keyword arguments* in stencils (otherwise an error is raised in the x86/cuda backends)
Binary file added projects2020/group05/docs/report.pdf
5 changes: 5 additions & 0 deletions projects2020/group05/enter.sh
@@ -0,0 +1,5 @@
docker stop hpc4wc_project 1>/dev/null 2>/dev/null
docker rm hpc4wc_project 1>/dev/null 2>/dev/null
docker run -i -t --rm \
--mount type=bind,source=`pwd`,target=/work \
--name=hpc4wc_project hpc4wc_project
16 changes: 16 additions & 0 deletions projects2020/group05/env_daint
@@ -0,0 +1,16 @@
source ~/HPC4WC_venv/bin/activate
module load daint-gpu
module swap PrgEnv-cray PrgEnv-gnu
#module swap gcc gcc/7.3.0
module swap gcc gcc/8.3.0
module load cray-netcdf
module load CMake
export SERIALBOX_DIR=/project/c14/install/daint/serialbox2_master/gnu_debug
export PYTHONPATH=${BASEDIR}:${SERIALBOX_DIR}/python:$PYTHONPATH
export NETCDF_LIB=${NETCDF_DIR}/lib
module load Boost
module load cudatoolkit
NVCC_PATH=$(which nvcc)
CUDA_PATH=$(echo $NVCC_PATH | sed -e "s/\/bin\/nvcc//g")
export CUDA_HOME=$CUDA_PATH
export LD_LIBRARY_PATH=$CUDA_PATH/lib64:$LD_LIBRARY_PATH
22 changes: 22 additions & 0 deletions projects2020/group05/get_data.sh
@@ -0,0 +1,22 @@
#!/bin/bash

# this scripts downloads test data for the physics standalone

# echo on
set -x

# get name of standalone package
cwd=`pwd`
dirname=`basename ${cwd}`

# remove preexisting data directory
test -d ./data
/bin/rm -rf data

# get data
wget --quiet "ftp://ftp.cscs.ch/in/put/abc/cosmo/fuo/physics_standalone/${dirname}/data.tar.gz"
test -f data.tar.gz || exit 1
tar -xvf data.tar.gz || exit 1
/bin/rm -f data.tar.gz 2>/dev/null

# done