Update paper.md
minor grammatical and orthographic fixes
degiacom authored Jan 22, 2025
1 parent a4a9f5d commit 439993f
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions paper/paper.md
@@ -39,9 +39,9 @@ We present an open-source course teaching how to set-up and analyse molecular dy

# Statement of Need

Biomolecular systems were one of the first systems used in molecular dynamics (MD) simulations [@levitt1975computer]. As such biomolecular simulations build on a rich half a century history rich of methodological developments, embodied in a wide range of specialised software. The improvement in physical models dictating interatomic interactions coupled with an ever-increasing availability of computational power have enabled MD simulations to establish themselves as a technique complementary to experimental data [@hollingsworth2018molecular] [@ciccotti2022molecular]. Starting from the simulation of small proteins for only a few nanoseconds [@levitt1975computer], nowadays large biomolecular complexes featuring millions of atoms can be simulated for timescales orders of magnitude longer [@lindorff-larsen2011howa]. The data produced by MD simulations is noisy and high-dimensional though, and its usefulness is directly dependent on how faithfully the molecular system simulated recapitulates the physiochemical conditions of its real-world counterpart. Since the mid-1970s, significant progress has been made in automating the preparation of biologically relevant atomistic models and the analysis of simulation data. Nonetheless, modern computational scientists must still make critical decisions on how to assemble and simulate the system, as well as which quantities to extract from the resulting data to accurately explain or predict experimental outcomes.
Biomolecular systems were among the first to be studied using molecular dynamics (MD) simulations [@levitt1975computer]. As a result, biomolecular simulations are built on half a century of rich methodological development, embodied in a wide range of specialized software. The improvement in physical models dictating interatomic interactions coupled with an ever-increasing availability of computational power have enabled MD simulations to establish themselves as a technique complementary to experimental data [@hollingsworth2018molecular] [@ciccotti2022molecular]. Starting from the simulation of small proteins for only a few nanoseconds [@levitt1975computer], nowadays large biomolecular complexes featuring millions of atoms can be simulated for timescales orders of magnitude longer [@lindorff-larsen2011howa]. The data produced by MD simulations is noisy and high-dimensional though, and its usefulness is directly dependent on how faithfully the molecular system simulated recapitulates the physiochemical conditions of its real-world counterpart. Since the mid-1970s, significant progress has been made in automating the preparation of biologically relevant atomistic models and the analysis of simulation data. Nonetheless, modern computational scientists must still make critical decisions on how to assemble and simulate the system, as well as on which quantities to extract from the resulting data to accurately explain or predict experimental outcomes.

The material presented in this course has been developed as training material for the CCPBioSim consortium. Since 2022, it has been delivered to three cohorts of 25-35 international postgraduates attending the UK-based CCP5 Summer School on Molecular simulation. A first key aspect of this course is that, under the same hood, it provides information on both the set-up and the analysis of MD simulations, typically presented separately. A second key aspect is that it demonstrates how machine learning techniques can be integrated in the analysis of MD simulations and used to extract relevant information from an MD simulation.
The material presented in this course has been developed as training material for the CCPBioSim consortium. Since 2022, it has been delivered to three cohorts of 25–35 international postgraduates attending the UK-based CCP5 Summer School on Molecular simulation. A first key aspect of this course is that, under the same hood, it provides information on both the set-up and the analysis of MD simulations, typically presented separately. A second key aspect is that it demonstrates how machine learning techniques can be integrated in the analysis of MD simulations and used to extract relevant information from an MD simulation.


# Overview, Content, and Structure
@@ -52,7 +52,7 @@ This is a graduate-level course, aimed at beginners in biomolecular simulation.

## Content

The objective of this course is not to make students proficient in one or few selected software for MD simulation preparation, execution, or analysisis. Instead, it is aimed at providing students with a general overview of the key decision-making required to carry out MD simulations of biomolecules and extracting quantitative data from them. In this context, the course is subdivided in two Units featuring practical sessions and lectures. Practical sessions demonstrate how key concepts in molecular modelling are put into practice by exposing student to authentic tasks leveraging on commonly used Python packages, such as MDAnalysis [@michaud-agrawal2011mdanalysis] [@oliver_beckstein-proc-scipy-2016] [@alibay2023building] and scikit-learn [@pedregosa2011scikitlearn]. Lectures are software-agnostic and provide additional material to the course. While each practical session can be run by students on their own computer, these are also available in Google colab. This solution, requiring no local installation, is especially suitable for those unfamiliar with how to set-up a Python environment, or having limited access to computational resources.
The objective of this course is not to make students proficient in one or a few specific software tools for MD simulation preparation, execution, or analysis. Instead, it is aimed at providing students with a general overview of the key decision-making required to carry out MD simulations of biomolecules and extracting quantitative data from them. In this context, the course is divided into two Units featuring practical sessions and lectures. Practical sessions demonstrate how key concepts in molecular modelling are put into practice by exposing student to authentic tasks leveraging on commonly used Python packages, such as MDAnalysis [@michaud-agrawal2011mdanalysis] [@oliver_beckstein-proc-scipy-2016] [@alibay2023building] and scikit-learn [@pedregosa2011scikitlearn]. Lectures are software-agnostic and provide additional material to the course. While each practical session can be run by students on their own computer, these are also available in Google colab. This solution, requiring no local installation, is especially suitable for those unfamiliar with setting up a Python environment, or having limited access to computational resources.

### Unit 1: Simulation Preparation

@@ -90,14 +90,14 @@ The second Unit is dedicated to providing the students with means to extract rel

Each Jupyter notebook contains information on a specific topic, as well as tasks the student is asked to carry out independently. The tasks range from interpreting data previously produced, to running presented code with different parameters, to solving a specific problem by implementing a short Python code. Solutions to all questions are provided in each notebook as drop-down cells, enabling students to self-assess their understanding.

In our teaching practice, we provide students with post-its of two different colours that can be displayed on their computer screen --- yellow indicating that everything is clear, pink indicating that help is required. At the end of each practical session, studends are asked to use these same post-its to provide instructors with feedback on something they liked (yellow post-it), and something that requires improvement (pink post-it). In the three years we have delivered this course, this appoach has enabled us to gather comprehensive feedback, helping us fine-tuning the teaching material and our own delivery style. A key observation is that students, when presented with a new notebook, especially appreciate the instructors spending few minutes describing the overall notebook structure and the tasks it features, before working through the beginning of it.
In our teaching practice, we provide students with post-its of two different colours that can be displayed on their computer screen --- yellow indicating that everything is clear, pink indicating that help is required. At the end of each practical session, students are asked to use these same post-its to provide instructors with feedback on something they liked (yellow post-it), and something that requires improvement (pink post-it). In the three years we have delivered this course, this approach has enabled us to gather comprehensive feedback, which has helped us fine-tune both the teaching material and our delivery style. A key observation is that students, when presented with a new notebook, especially appreciate the instructors taking a few minutes to describe the overall structure of the notebook and the tasks it contains before beginning the practical work.


# Conclusion

Thanks to the increasing availability of computational power and software automating many of the processes associated with biomolecular simulation and analysis, the palette of questions addressable with MD is broadening. While this is positive, it remains crucial for computational scientists to have a clear understanding of what is being simulated and how. Indeed, to date many decisions associated with system building and analysis cannot be delegated to a machine without human verification. In this context, we see our course as a first stepping-stone, detailing the key decisions that need to be made, providing examples of how this can be done in practice, and directing learners to relevant software and specialized analysis techniques for further education.

Despite its long history, MD remains an evolving field. New techniques that push the boundaries of what is possible keep emerging, as exemplified by the current revolution associated with the integration of modern machine learning techniques in molecular modelling pipelines. While we expect that majority of the concepts presented in this course will be valid for many years to come, we are endeavouring to keeping the course material up-to-date by highlighting current methodological trends. For instance in the latest iteration of this course we have introduced a discussion on how how to interpret and use models produced by AlphaFold [@jumper2021highly].
Despite its long history, MD remains an evolving field. New techniques that push the boundaries of what is possible keep emerging, as exemplified by the current revolution associated with the integration of modern machine learning techniques in molecular modelling pipelines. While we expect the majority of the concepts presented in this course to remain valid for many years, we are striving to keep the course material up-to-date by highlighting current methodological trends. For example, in the latest iteration of this course, we have introduced a discussion on how to interpret and use models produced by AlphaFold [@jumper2021highly].


# Contributions to the course