-
Notifications
You must be signed in to change notification settings - Fork 667
Style Guide
The aim of this guide is to have a uniform and well-tested code base for MDAnalysis in order to improve code quality and maintainability. The Style Guide prescribes various aspects that new code has to conform to; old code should be refactored to conform to these requirements.
The Style Guide evolves through a community-driven process of proposals for changes in policy and discussions among all interested developers.
(For background and history see Issue #404, which contains the original discussion on the Style Guide.)
MDAnalysis is a project with a long history and many contributors and hasn't used a consistent coding style. Since version 0.11.0 we are trying to update all the code to conform with PEP8. Our strategy is to update the style every time we touch an old function and thus switch to PEP8 continuously.
- keep line length to 79 characters or less; break long lines sensibly
- indent with spaces and use 4 spaces per level
- naming (follows PEP8):
- classes: CapitalClasses (i.e. capitalized nouns without spaces)
- methods and functions: underscore_methods (lower case, with underscores for spaces)
MDAnalysis strives to be compatible with python 2 and 3 at the same time. To deal with the differences
we use six which takes care of loading the appropriate functions for each version of python. So instead of xrange
or iterzip
use the zip
and range
function provided by six like this.
from six.moves import zip, range
for i,j in zip(range(3), range(3)):
print(i, j)
We recommend that you either use a Python IDE (PyCharm and others). You can also use external tools like flake8. For integration of external tools with emacs and vim check out elpy (emacs) and python-mode (vim).
To apply the code formatting in an automated way you can also use code formatters. As external tools there are autopep8 and yapf. Most IDE's either have their own code formatter or will work with one of the above through plugins.
We are striving to keep module dependencies small and lightweight (i.e., easily installable with pip
).
- Imports must all happen at the start of a module (not inside classes or functions).
- Modules must be imported in the following order:
-
future (
from __future__ import absolute_import, print_function, division
) - Compatibility imports (eg six)
- global imports (installed packages)
- local imports (MDAnalysis modules)
-
future (
- use absolute imports in the library (i.e. relative imports must be explicitly indicated), e.g.,
from __future__ import absolute_import from six.moves import range import numpy as np import .core import ..units
- In
MDAnalysis.analysis
, all imports must be at the top level (as in the General Rule) — see #666 - Optional modules can be imported.
- No analysis module is imported automatically at the
MDAnalysis.analysis
level to avoid breaking the installation when optional dependencies are missing.
- In the test suite, use the order above, but import
MDAnalysis
modules beforeMDAnalysisTests
imports - Do not use relative imports (e.g.
import .datafiles
) in the test suite because it breaks running the tests from inside the test directory (see #189) - Never import the
MDAnalysis
module from the__init__.py
ofMDAnalysisTests
or from any of its plugins (it's ok to import from test files). ImportingMDAnalysis
from the test setup code will cause severe coverage underestimation.
Any module from the standard library can be used, as well as the following nonstandard libraries:
six
numpy
biopython
gridDataFormats
networkx
mmtf-python
joblib
scipy
matplotlib
tqdm
because these packages are always installed.
If you must depend on a new external package, first discuss its use on the developer mailing list or as part of the issue/PR.
The core of MDAnalysis contains all packages that are not in MDAnalysis.analysis
or MDAnalysis.visualization
. Only packages in the List of core module dependencies can be imported.
Modules under MDAnalysis.analysis
are considered independent. Each can have its own set of dependencies. We strive to keep them small as well but module authors are in principle free to import what they need. These dependencies beyond the core dependencies are optional dependencies (and should be listed in setup.py
under analysis). (See also #1159 for a discussion.)
A user who does not have a specific optional package installed must still be able to import everything else in MDAnalysis. An analysis module may simply raise an ImportError
if a package is missing but it is recommended that it should print and log an error message notifying the user that a specific additional package needs to be installed to use this module.
If a large portion of the code in the module does not depend on a specific optional module then you should guard the import at the top level with a try/except
, print and log a warning, and only raise an ImportError
in the specific function or method that would depend on the missing module.
The source code is distributed over the package
directory (the library and documentation) and the testsuite
(test code and data files). Commits can contain code in both directories, e.g. a new feature and a test case can be committed together.
MDAnalysis contains compiled code (cython, C, C++) in addition to pure Python. With #444 we agreed on the following layout:
Place all source files for compiled shared object files into the same directory as the final shared object file.
*.pyx
files and cython-generated *.c
would be in the same directory as the *.so
. External dependent C/C++/Fortran libraries are in dedicated src
and include
folders. See the following tree as an example.
MDAnalysis
|--lib
| |-- _distances.so
| |-- distances.pyx
| |-- distances.c
|-- coordinates
|-- _dcdmodule.so
|-- src
|-- dcd.c
|-- include
|-- dcd.h
This is standard. See numpy, scipy, scikit-learn, mdtraj, for other examples.
We use git for version control. The DevelopmentWorkflow uses pull requests against the DevelopmentBranch (named develop) followed by automated testing and code review by project members. Successful PRs are merged into develop.
Follow git commits.
- Tests are mandatory for all new contributions.
- Functional tests of individual methods of classes or functions are preferred but "integration tests" that test most of the functionality in a single test are also acceptable.
- We strive for test coverage > 90% — check the coverage of testing when the PR is automatically tested.
- Try to test exceptions (i.e. that your code fails in predictable ways, see #597).
- Tests must follow the Style Guide: Tests
See Writing Tests (but that needs to be cleaned up) on more background and details on how to structure tests and how to include data files.
The core (everything except MDAnalysis.analysis
and MDAnalysis.visualization
) is critical and special scrutiny is applied to all changes and additions to the core. Good tests are absolutely required and code will not be merged unless extensive and comprehensive tests are also provided.
#743 outlines the testing requirements for analysis code:
- New code for
MDAnalysis.analysis
must come with unit tests. All old tests and the new tests must pass before code is merged into develop. - Tests for analysis classes and functions should at a minimum perform regression tests, i.e., run on input and compare to values generated when the code was added so that we know when the output changes in the future. (Even better are tests that test for absolute correctness of results but regression tests are the minimum requirement.)
Any code in MDAnalysis.analysis
that does not have substantial testing (at least 70% coverage) will be moved to a special MDAnalysis.analysis.legacy
module (by release 1.0.0) that will come with its own warning that this is essentially unmaintained functionality that is still provided because there's no alternative. Legacy packages that receive sufficient upgrades in testing can come back to the normal MDAnalysis.analysis
name space.
No consensus has emerged yet how to best test visualization code. At least minimal tests that run the code are typically requested.