bump monty to use the latest monty.json import speedup patch, add import test regression test, lazy load some rarely used but costly modules #4128
base: master
Conversation
@mkhorton Can I lazy import
```
@@ -2784,10 +2783,16 @@ def from_slabs(
        substrate_slab = substrate_slab.get_orthogonal_c_slab()
    if isinstance(film_slab, Slab):
        film_slab = film_slab.get_orthogonal_c_slab()

    assert_allclose(film_slab.lattice.alpha, 90, 0.1)
```
`math.isclose` is much faster (~500x / ~188x) for comparing scalars than `assert_allclose` or `np.isclose`.
```python
import math
import timeit

import numpy as np
from numpy.testing import assert_allclose


def run_tests():
    a = 0.123456789
    b = 0.123456789
    rtol = 1e-5
    atol = 1e-8

    assert math.isclose(a, b, rel_tol=rtol, abs_tol=atol)
    assert np.isclose(a, b, rtol=rtol, atol=atol)
    assert_allclose(a, b, rtol=rtol, atol=atol)

    # Measure performance in milliseconds
    num_runs = 1000000
    isclose_time_math = timeit.timeit(lambda: math.isclose(a, b, rel_tol=rtol, abs_tol=atol), number=num_runs) * 1000
    isclose_time_np = timeit.timeit(lambda: np.isclose(a, b, rtol=rtol, atol=atol), number=num_runs) * 1000
    assert_allclose_time = timeit.timeit(lambda: assert_allclose(a, b, rtol=rtol, atol=atol), number=num_runs) * 1000

    print(f"\nPerformance results over {num_runs} runs with single float values:")
    print(f"math.isclose time: {isclose_time_math:.3f} ms")
    print(f"np.isclose time: {isclose_time_np:.3f} ms")
    print(f"assert_allclose time: {assert_allclose_time:.3f} ms")
    print(f"Speedup (math.isclose vs np.isclose): {isclose_time_np / isclose_time_math:.2f}x")
    print(f"Speedup (math.isclose vs assert_allclose): {assert_allclose_time / isclose_time_math:.2f}x")


run_tests()
```
Gives:

```
Performance results over 1000000 runs with single float values:
math.isclose time: 37.568 ms
np.isclose time: 7065.625 ms
assert_allclose time: 19688.410 ms
Speedup (math.isclose vs np.isclose): 188.07x
Speedup (math.isclose vs assert_allclose): 524.07x
```
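Given these numbers, the production-code swap is mechanical. A minimal sketch (the value is hypothetical; note that the positional `0.1` in the `from_slabs` diff above is `assert_allclose`'s `rtol`, so `rel_tol` is the near-equivalent keyword for `math.isclose`):

```python
import math

film_alpha = 90.00004  # hypothetical stand-in for film_slab.lattice.alpha

# Before: assert_allclose(film_alpha, 90, 0.1) pulls in numpy.testing for one scalar.
# After: math.isclose does the same scalar check; the positional 0.1 in the
# original call is a relative tolerance, hence rel_tol here.
assert math.isclose(film_alpha, 90, rel_tol=0.1)
```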
```
@@ -101,21 +101,15 @@ def _find_matches(self) -> None:
        for match in self.zsl_matches:
            xform = get_2d_transform(film_vectors, match.film_vectors)
            strain, _rot = polar(xform)
            assert_allclose(
```
Similarly, `np.testing.assert_allclose` is not suitable for production code (~2.5x slower than `np.allclose`), though it provides more detailed debug info for testing.

Note the default tolerances are a bit stricter for `assert_allclose`; this commit also migrated the tolerances as-is without changing behaviour:

```
testing.assert_allclose(actual, desired, rtol=1e-07, atol=0, equal_nan=True, err_msg='', verbose=True, *, strict=False)
numpy.allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)
```
Test script:
```python
import timeit

import numpy as np
from numpy.testing import assert_allclose


def run_tests():
    # Array sizes to test
    sizes = [(10, 10), (100, 100), (1000, 1000), (10000, 10000)]
    rtol = 1e-5
    atol = 1e-8

    # Run tests for each size
    for size in sizes:
        arr1 = np.random.rand(*size)
        arr2 = np.copy(arr1)  # Exact copy of arr1

        assert_allclose(arr1, arr2, rtol=rtol, atol=atol)
        assert np.allclose(arr1, arr2, rtol=rtol, atol=atol)

        # Measure performance in milliseconds
        num_runs = 10
        assert_time = timeit.timeit(lambda: assert_allclose(arr1, arr2, rtol=rtol, atol=atol), number=num_runs) * 1000
        allclose_time = timeit.timeit(lambda: np.allclose(arr1, arr2, rtol=rtol, atol=atol), number=num_runs) * 1000

        print(f"\nPerformance results for array size {size} over {num_runs} runs:")
        print(f"assert_allclose time: {assert_time:.3f} ms")
        print(f"np.allclose time: {allclose_time:.3f} ms")
        print(f"Speedup: {assert_time / allclose_time:.2f}x")


run_tests()
```
I got:

```
Performance results for array size (10, 10) over 10 runs:
assert_allclose time: 0.308 ms
np.allclose time: 0.102 ms
Speedup: 3.03x

Performance results for array size (100, 100) over 10 runs:
assert_allclose time: 0.563 ms
np.allclose time: 0.244 ms
Speedup: 2.30x

Performance results for array size (1000, 1000) over 10 runs:
assert_allclose time: 55.887 ms
np.allclose time: 22.179 ms
Speedup: 2.52x

Performance results for array size (10000, 10000) over 10 runs:
assert_allclose time: 12374.578 ms
np.allclose time: 2916.093 ms
Speedup: 4.24x
```
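Because `assert_allclose` defaults (`rtol=1e-07, atol=0`) are stricter than `np.allclose` defaults (`rtol=1e-05, atol=1e-08`), a call that relied on the defaults should carry them over explicitly when migrated. A minimal sketch with a hypothetical near-identity strain matrix:

```python
import numpy as np

# Hypothetical 2x2 strain matrix, nearly the identity.
strain = np.eye(2) * 1.00000001
expected = np.eye(2)

# Before (test-style, slower and pulls in numpy.testing):
#     assert_allclose(strain, expected)  # implicit rtol=1e-7, atol=0
# After: pass the stricter defaults explicitly so behaviour is unchanged.
assert np.allclose(strain, expected, rtol=1e-7, atol=0)
```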
```
@@ -580,9 +579,12 @@ def from_file(cls, filename: str | Path, constant_lattice: bool = True, **kwargs
            try:
                from ase.io.trajectory import Trajectory as AseTrajectory

                from pymatgen.io.ase import AseAtomsAdaptor
```
`AseAtomsAdaptor` is only used in one of the many try-except branches (the other branches also lazy import their corresponding modules), ~10% speed-up.

pymatgen/src/pymatgen/core/trajectory.py, lines 569 to 593 in 6992aee:
```python
if fnmatch(filename, "*XDATCAR*"):
    from pymatgen.io.vasp.outputs import Xdatcar

    structures = Xdatcar(filename).structures
elif fnmatch(filename, "vasprun*.xml*"):
    from pymatgen.io.vasp.outputs import Vasprun

    structures = Vasprun(filename).structures
elif fnmatch(filename, "*.traj"):
    try:
        from ase.io.trajectory import Trajectory as AseTrajectory

        ase_traj = AseTrajectory(filename)
        # Periodic boundary conditions should be the same for all frames so just check the first
        pbc = ase_traj[0].pbc
        if any(pbc):
            structures = [AseAtomsAdaptor.get_structure(atoms) for atoms in ase_traj]
        else:
            molecules = [AseAtomsAdaptor.get_molecule(atoms) for atoms in ase_traj]
            is_mol = True
    except ImportError as exc:
        raise ImportError("ASE is required to read .traj files. pip install ase") from exc
```
```
@@ -664,6 +662,10 @@ def plot_slab(
        decay (float): how the alpha-value decays along the z-axis
        inverse (bool): invert z axis to plot opposite surface
    """
    # Expensive import (PR4128)
    from matplotlib import patches
```
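The deferred-import pattern used here can be sketched in isolation. In this sketch `json` is a hypothetical stand-in for the expensive dependency (so the example has no heavy requirement); the mechanics are the same for `matplotlib`:

```python
def plot_slab_sketch():
    # Expensive import deferred to call time, mirroring the
    # `from matplotlib import patches` inside plot_slab; json is a
    # hypothetical stand-in for the costly module.
    import json

    return json.dumps({"plotted": True})


# Top-level import of this module stays cheap; the cost is paid on first call.
print(plot_slab_sketch())
```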
Summary
- Clean up `pyproject.toml` to avoid duplicate `sympy >= 1.3`
- Bump `networkx` to 2.7 to fix intermittent CI failure when `torch` install failed: `ImportError: cannot import name 'Mapping' from 'collections'` #4116 (comment)
- Remove `from numpy.testing import assert_allclose` in production code
- Bump `monty` to use the latest json import speedup patch, partially fixing `import monty.json` slowing down core import #3793
- Have a quick look at other important modules (only cover those that import other 3rd-party/non-core-pmg modules in this PR) and add an import time test (profile: `python -X importtime -c "from pymatgen.core.structure import Structure" 2> pmg.log && tuna pmg.log`): `AseAtomsAdaptor` lazy imported
- [For a follow-up PR] looks like `scipy` across `core` needs special attention
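An import-time regression test along these lines could time a cold import in a fresh interpreter, so warm `sys.modules` caches don't skew the result. A sketch (the 30 s budget and the `json` stand-in module are hypothetical; a real CI check would target `pymatgen.core` and calibrate against a known-good baseline):

```python
import subprocess
import sys
import time


def measure_import_seconds(module: str) -> float:
    """Time a cold import in a subprocess so the parent's cached modules don't help."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
    return time.perf_counter() - start


# Hypothetical budget; real CI would use a tighter, calibrated threshold.
elapsed = measure_import_seconds("json")
assert elapsed < 30, f"cold import took {elapsed:.2f}s, over budget"
```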