Meeting notes with ANL application teams
-
Attendees: Min, Xiaodong, Tekin Bicer
-
PtychoNN utilizes a CNN as a surrogate for ptychographic image reconstruction. For data with similar measurement settings and object features, transfer learning is potentially applicable.
-
Resources:
-
https://aip.scitation.org/doi/10.1063/5.0013065
Domain application: ptychographic imaging (inverse problem: recovering lost phase information from measured intensities)
Traditional physical approach: model-based iterative phase retrieval methods, including ePIE, maximum-likelihood-based methods, and so on.
Limitations:
- computationally expensive
- requires manual tuning of phase retrieval parameters (e.g., the choice of algorithm and the initial image and probe guesses)
- requires a large degree of overlap of the measurement area (adjacent measured scan points need to overlap by at least 50%), thus drastically limiting the area or volume of the sample that can be scanned in a given amount of time
Surrogate model: a deep convolutional neural network that learns a direct mapping from the reciprocal space data to the sample amplitude and phase.
Model details: the neural network architecture consists of three components:
- an encoder arm consisting of convolutional and max-pooling layers that learns a representation (encoding) in the feature space of the input X-ray diffraction data.
- two decoder arms consisting of convolutional and upsampling layers that learn to map from the encoding of the input data to real-space amplitude and phase, respectively.
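The encoder/two-decoder shape flow described above can be sketched as follows. This is a toy illustration only: the real PtychoNN uses learned convolutional filters, and the 64x64 input size and two pooling stages here are assumptions for illustration, not values from the paper.

```python
import numpy as np

def maxpool2x2(x):
    """2x2 max-pooling: halves each spatial dimension."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2x2(x):
    """Nearest-neighbor upsampling: doubles each spatial dimension."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

# Toy 64x64 diffraction pattern standing in for one reciprocal-space input.
diffraction = np.random.rand(64, 64)

# Encoder arm: two pooling stages produce a 16x16 encoding.
encoding = maxpool2x2(maxpool2x2(diffraction))

# Two decoder arms: each maps the shared encoding back to 64x64 real space,
# one for the sample amplitude and one for the phase.
amplitude = upsample2x2(upsample2x2(encoding))
phase = upsample2x2(upsample2x2(encoding))
```

In the trained network the pooling/upsampling layers are interleaved with convolutions, but the spatial shape flow (input → encoding → two same-sized outputs) is the same.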
For a scan of 161x161 points, the first 100 lines of the experimental scan are used for training the network, and the remaining 61 lines are used for validation.
Advantages:
- PtychoNN is ~300x faster than Ptycholib (ePIE-based reconstruction) while preserving remarkable accuracy.
- PtychoNN enables sparsely sampled ptychography since it learns a direct relation between diffraction data and the sample's structure and phase. This relaxes the scan-overlap constraint and accelerates data acquisition by a factor of 5.
- PtychoNN can generate reasonable predictions when trained on a small dataset of as few as 800 experimental samples (3% of the total scan area).
-
-
Attendees: Min, Xiaodong, Arvind Ramanathan
-
The CANDLE team has examples of using several different AI techniques: active learning, reinforcement learning, and surrogate models.
-
Resources for surrogate model projects:
-
https://www.biorxiv.org/content/10.1101/2020.11.19.390187v1.abstract
Domain application: studying the mechanisms of infectivity of the SARS-CoV-2 spike protein.
Traditional physical approach: a number of experimental methods, including X-ray crystallography, cryo-electron microscopy (cryo-EM), and cryo-EM tomography.
Surrogate model: all-atom molecular dynamics (MD) simulations.
Computational challenges: MD simulations generate tremendous amounts of data. For example, the WE-sampling simulations of the spike protein's closed-to-open transition generated over 100 terabytes of data. This imposes a heavy burden on identifying the intrinsic latent dimensions along which large-scale conformational transitions can be characterized.
Solutions: AI-driven multiscale MD simulations, which couple these breakthrough simulations with artificial intelligence (AI) based methods as part of an integrated workflow that transfers knowledge gained at one scale to 'drive' (enhance) sampling at another.
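The drive/enhance loop can be sketched as below. Everything here is a toy stand-in under stated assumptions: the "simulation" is a random walk, the learned encoder is replaced by linear PCA, and the function names (`run_toy_simulation`, `latent_outliers`) are hypothetical, not from the paper's workflow code.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_toy_simulation(start, n_steps=100):
    """Stand-in for an MD run: a random walk in a 30-D conformation space."""
    steps = rng.normal(scale=0.1, size=(n_steps, start.size))
    return start + np.cumsum(steps, axis=0)

def latent_outliers(frames, n_components=2, n_pick=3):
    """Project frames onto their top principal components (a linear stand-in
    for the paper's deep-learning encoders) and pick the frames farthest
    from the mean in latent space, i.e., poorly sampled conformations."""
    centered = frames - frames.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    latent = centered @ vt[:n_components].T
    dist = np.linalg.norm(latent, axis=1)
    return frames[np.argsort(dist)[-n_pick:]]

# One round of the AI-driven loop: simulate, learn a latent view of the data,
# then reseed new simulations from outlier conformations to enhance sampling.
frames = run_toy_simulation(np.zeros(30))
seeds = latent_outliers(frames)
new_frames = np.vstack([run_toy_simulation(s) for s in seeds])
```

The real workflow couples multiple simulation scales and deep models, but the select-outliers-and-reseed pattern is the knowledge transfer the notes describe.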
Experiments:
- hardware: TACC Frontera and ORNL Summit systems; V100 GPUs
- software: CuPy, NVIDIA Nsight Compute 2020 GPU profiling tool, Intel msr-tools
- observations: training performance: the training workload is rather lightweight; it cannot utilize the full GPU and thus delivers only ~20% of theoretical peak performance. NAMD simulation performance: as the number of compute nodes increases, the speedup grows but the scaling efficiency decreases, i.e., the performance improvement is sub-linear.
-
Review article: https://www.sciencedirect.com/science/article/abs/pii/S0959440X20302190
Domain application: intrinsically disordered protein (IDP) structure-function relationships
Traditional physical approaches: structure determination techniques including X-ray crystallography, nuclear magnetic resonance (NMR), and cryo-electron microscopy (cryo-EM).
Surrogate model: artificial intelligence (AI) and machine learning (ML) based multiscale simulations.
Model details:
- characterizing the conformational heterogeneity of IDP ensembles: use AI/ML methods to quantify the statistical dependencies in atomistic fluctuations and obtain biophysically relevant low-dimensional representations spanned by IDP landscapes; use ML techniques such as anharmonic conformational analysis (ANCA) to analyze IDP ensembles, especially in the context of disorder-to-order transitions; use deep neural networks such as autoencoders to progressively extract multiscale features from raw inputs.
- multiscale modeling of IDP ensembles to capture emergent phenomena: use generative adversarial networks (GANs) to enhance sampling, where on-the-fly training modifies the potential energy surface; use AI/ML to iteratively fit and refine force-field parameters in a data-driven fashion.
- inferring mechanisms of IDP function: use AI approaches, augmented with Bayesian approaches, to bridge the gaps between experiments and simulations.
-
-
How to start deployment: Arvind will introduce developer Alex to help us deploy the Gordon Bell work on ThetaGPU
-
TODO: summarize the above two papers and follow up with Arvind and Alex on deployment
-
Attendees: Min, Xiaodong, Romit Maulik
-
Interesting simulation vs. surrogate tradeoff: to achieve the required accuracy, a surrogate model may only be able to replace a smaller computation step. Consequently, more steps are needed when running with surrogates, so the overall performance varies as
overall time = reduced time-per-step * increased num-of-steps
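A quick worked instance of this tradeoff, with made-up numbers (the 10x per-step speedup and 4x step inflation are assumptions for illustration, not figures from the meeting):

```python
# Full-physics baseline (arbitrary time units).
full_time_per_step = 1.0
full_num_steps = 1000

# Surrogate run: each step is 10x cheaper, but 4x as many steps are needed
# to reach the required accuracy.
surrogate_time_per_step = full_time_per_step / 10   # reduced time-per-step
surrogate_num_steps = full_num_steps * 4            # increased num-of-steps

full_total = full_time_per_step * full_num_steps
surrogate_total = surrogate_time_per_step * surrogate_num_steps

# The surrogate only wins when the per-step speedup outpaces the step inflation.
net_speedup = full_total / surrogate_total
```

Here the surrogate still nets a 2.5x win; had the step count grown 10x or more, the surrogate would have bought nothing overall.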
-
Resources:
- https://github.com/argonne-lcf/TensorFlowFoam
- https://github.com/argonne-lcf/sdl_ai_workshop
- PythonFOAM: In-situ data analyses with OpenFOAM and Python (same in-situ infrastructure as in https://github.com/argonne-lcf/sdl_ai_workshop/tree/master/05_Simulation_ML)
- Geophysical_NAS: Recurrent Neural Network Architecture Search for Geophysical Emulation
https://arxiv.org/pdf/2004.10928.pdf
https://github.com/rmjcs2020/Geophysical_NAS
Domain application: geophysical forecasting (e.g., atmospheric and oceanic modeling)
Traditional physical approach: relies on the confluence of experimental observations and statistical analyses, e.g., ensembling partial differential equation (PDE)-based forecasts from different weather models.
Surrogate model: proper-orthogonal decomposition-based long short-term memory networks (POD-LSTM).
Challenges: The construction of an LSTM architecture for this purpose is generally based on trial and error, requires human expertise, and consumes significant development time.
Solutions: use an automated neural architecture search (NAS) approach to generate stacked LSTM architectures for POD-LSTM on a real-world geophysical data set. Three algorithms are used for NAS: aging evolution (AE), a distributed reinforcement learning method (RL), and random search (RS).
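The POD half of POD-LSTM can be sketched with an SVD of a snapshot matrix; the LSTM is then trained to forecast the low-dimensional modal coefficients rather than the full field. The data below is a synthetic two-mode field, and the grid sizes and rank are assumptions for illustration.

```python
import numpy as np

# Snapshot matrix: 40 time snapshots of a 500-point field (toy stand-in
# for the geophysical data).
t = np.linspace(0, 2 * np.pi, 40)
x = np.linspace(0, 1, 500)
snapshots = np.outer(np.sin(t), np.sin(2 * np.pi * x)) \
          + 0.5 * np.outer(np.cos(t), np.cos(4 * np.pi * x))

# Proper orthogonal decomposition: the SVD's right singular vectors give
# the spatial POD modes of the snapshot matrix.
u, s, vt = np.linalg.svd(snapshots, full_matrices=False)
r = 2                          # truncation rank
modes = vt[:r]                 # POD spatial modes
coeffs = snapshots @ modes.T   # time series of modal coefficients (LSTM input)

# Rank-2 reconstruction recovers the field almost exactly here, since the
# toy data contains exactly two coherent structures.
reconstruction = coeffs @ modes
err = np.linalg.norm(reconstruction - snapshots) / np.linalg.norm(snapshots)
```

The NAS question in the paper is then how to pick the stacked-LSTM architecture that best forecasts `coeffs` forward in time.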
Experiments:
- hardware: Theta cluster; each Theta node has a 64-core Intel Knights Landing processor with 192 GB DDR4 memory.
- software: Python 3.6.6, TensorFlow 1.14, and DeepHyper 0.1.7.
- key observations: efficiency of different algorithms: AE, RL, and RS are run on 128 compute nodes of Theta; AE obtains optimal architectures in a much shorter duration. Scaling: the node utilization of AE and RS is similar and stays above 0.9 for up to 256 nodes; at 512 nodes, utilization drops below 0.87. The node utilization of RL is poor, hovering around 0.5. For a fixed wall time, AE consistently evaluates more architectures than RS and RL, and the number of architectures evaluated by RS and RL scales sub-linearly with the number of compute nodes.
-
Plasma fluid model closures (https://arxiv.org/abs/2002.04106): not a good starting point; the conclusion of this work was that interacting with the surrogate model is not a good way to go(?)
-
TODO:
- Xiaodong: add paper summary and source of the SC20 paper
- Try the toy program in https://github.com/argonne-lcf/sdl_ai_workshop/tree/master/05_Simulation_ML and then deploy PythonFOAM on Bebop (had trouble installing OpenFOAM on Theta)