
bump monty to use the latest monty.json import speedup patch, add import test regression test, lazy load some rarely used but costly modules #4128

Open · wants to merge 44 commits into master from bump-monty-json
Conversation

@DanielYang59 (Contributor) commented Oct 22, 2024

Summary


  • bump monty to use the latest monty.json import speedup patch; partially fixes #3793 (import monty.json slowing down core import)

  • Had a quick look at other important modules (this PR only covers those that import other 3rd-party/non-core-pmg modules) and added an import time test. Profile with: python -X importtime -c "from pymatgen.core.structure import Structure" 2> pmg.log && tuna pmg.log (importtime profile screenshots omitted):

    • core.bonds
    • core.composition
    • core.interface (needs attention)
    • core.ion
    • core.lattice
    • core.operations
    • core.periodic_table
    • core.sites
    • core.spectrum
    • core.structure
    • core.surface
    • core.tensors
    • core.trajectory (AseAtomsAdaptor lazy imported)
    • io.vasp.inputs
    • io.vasp.outputs
  • [For a follow-up PR] It looks like scipy imports across core need special attention.
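The import time regression test added here can be sketched roughly as follows. This is a minimal sketch, not the PR's actual test: the module name (`json` stands in for pymatgen.core.structure so the sketch runs anywhere) and the grace threshold are hypothetical.

```python
import subprocess
import sys
import time


def cold_import_time(module: str) -> float:
    """Wall-clock seconds for a cold `import <module>` in a fresh interpreter."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
    return time.perf_counter() - start


# Hypothetical grace threshold; the real test's limit may differ.
GRACE_SECONDS = 10.0
elapsed = cold_import_time("json")
assert elapsed < GRACE_SECONDS, f"import regression: took {elapsed:.2f}s"
```

Spawning a fresh interpreter is deliberate: measuring `import` in-process would be skewed by modules already cached in sys.modules.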

@DanielYang59 DanielYang59 changed the title bump monty to use the latest json import speedup patch bump monty to use the latest monty.json import speedup patch Oct 22, 2024

@DanielYang59 force-pushed the bump-monty-json branch 4 times, most recently from 5489dff to 8d9d9a6 on October 22, 2024 06:54
@DanielYang59 (Contributor, Author) commented Oct 22, 2024

@mkhorton Can I lazy import sympy? It's used by only one method (symmetry.settings.JonesFaithfulTransformation.parse_transformation_string) and would give us a ~15% speedup on importing core.Structure.
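The lazy-import pattern being proposed looks roughly like this. A runnable sketch: the stdlib's decimal stands in for sympy so it executes anywhere, and the function body is illustrative only (the real method lives in pymatgen.symmetry.settings).

```python
import sys

sys.modules.pop("decimal", None)  # clean slate so the demo below is deterministic


def parse_transformation_string(tokens):
    """Only caller of the heavy dependency, so the import lives inside it."""
    # Deferred import: the cost is paid on first call, not at module import time.
    from decimal import Decimal

    return [Decimal(t) for t in tokens]


# Importing the enclosing module does not load the heavy dependency...
assert "decimal" not in sys.modules
# ...until the method is actually called.
parse_transformation_string(["1", "0.5"])
assert "decimal" in sys.modules
```

The trade-off is that the first call to the method pays the full import cost, which is usually acceptable for rarely used code paths.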

@DanielYang59 DanielYang59 mentioned this pull request Oct 22, 2024
@@ -2784,10 +2783,16 @@ def from_slabs(
    substrate_slab = substrate_slab.get_orthogonal_c_slab()
if isinstance(film_slab, Slab):
    film_slab = film_slab.get_orthogonal_c_slab()
assert_allclose(film_slab.lattice.alpha, 90, 0.1)
@DanielYang59 (Contributor, Author) commented Oct 26, 2024

math.isclose is much faster for comparing scalars than np.isclose (~188x) or assert_allclose (~524x).

import math
import numpy as np
from numpy.testing import assert_allclose
import timeit


def run_tests():
    a = 0.123456789
    b = 0.123456789

    rtol = 1e-5
    atol = 1e-8

    assert math.isclose(a, b, rel_tol=rtol, abs_tol=atol)
    assert np.isclose(a, b, rtol=rtol, atol=atol)
    assert_allclose(a, b, rtol=rtol, atol=atol)

    # Measure performance in milliseconds
    num_runs = 1000000
    isclose_time_math = timeit.timeit(lambda: math.isclose(a, b, rel_tol=rtol, abs_tol=atol), number=num_runs) * 1000
    isclose_time_np = timeit.timeit(lambda: np.isclose(a, b, rtol=rtol, atol=atol), number=num_runs) * 1000
    assert_allclose_time = timeit.timeit(lambda: assert_allclose(a, b, rtol=rtol, atol=atol), number=num_runs) * 1000

    print(f"\nPerformance results over {num_runs} runs with single float values:")
    print(f"math.isclose time: {isclose_time_math:.3f} ms")
    print(f"np.isclose time: {isclose_time_np:.3f} ms")
    print(f"assert_allclose time: {assert_allclose_time:.3f} ms")
    print(f"Speedup (math.isclose vs np.isclose): {isclose_time_np / isclose_time_math:.2f}x")
    print(f"Speedup (math.isclose vs assert_allclose): {assert_allclose_time / isclose_time_math:.2f}x")

run_tests()

Gives:

Performance results over 1000000 runs with single float values:

math.isclose time: 37.568 ms
np.isclose time: 7065.625 ms
assert_allclose time: 19688.410 ms

Speedup (math.isclose vs np.isclose): 188.07x
Speedup (math.isclose vs assert_allclose): 524.07x
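Applied to a call site like the one changed above, the swap looks roughly like this. Note the semantics are close but not identical: assert_allclose scales rtol by |desired| only, while math.isclose's rel_tol scales with the larger magnitude of the two values, so tolerances should be checked per call site. The alpha value below is a stand-in, not taken from the PR.

```python
import math

alpha = 90.04  # stand-in for a value like film_slab.lattice.alpha

# before: assert_allclose(alpha, 90, 0.1)  # third positional argument is rtol
# after:
assert math.isclose(alpha, 90, rel_tol=0.1)
```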

@@ -101,21 +101,15 @@ def _find_matches(self) -> None:
for match in self.zsl_matches:
    xform = get_2d_transform(film_vectors, match.film_vectors)
    strain, _rot = polar(xform)
    assert_allclose(
@DanielYang59 (Contributor, Author) commented Oct 26, 2024

Similarly, np.testing.assert_allclose is not suitable for production code (~2.5x slower than np.allclose), though it provides more detailed debug info in tests.

Note that the default tolerances are stricter for assert_allclose; this commit migrates the tolerances as-is without changing behaviour:

testing.assert_allclose(actual, desired, rtol=1e-07, atol=0, equal_nan=True, err_msg='', verbose=True, *, strict=False)

numpy.allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)
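Concretely, migrating a call as-is means carrying assert_allclose's stricter defaults over explicitly rather than relying on np.allclose's looser ones. A sketch of where the difference bites:

```python
import numpy as np

a = np.zeros(3)
b = np.full(3, 1e-9)

# np.allclose's defaults (rtol=1e-5, atol=1e-8) silently pass here...
assert np.allclose(a, b)
# ...while assert_allclose's defaults (rtol=1e-7, atol=0), carried over
# explicitly, correctly flag the absolute difference:
assert not np.allclose(a, b, rtol=1e-7, atol=0)
```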


Test script:

import numpy as np
from numpy.testing import assert_allclose
import timeit


def run_tests():
    # Array sizes to test
    sizes = [(10, 10), (100, 100), (1000, 1000), (10000, 10000)]

    rtol = 1e-5
    atol = 1e-8

    # Run tests for each size
    for size in sizes:
        arr1 = np.random.rand(*size)
        arr2 = np.copy(arr1)  # Exact copy of arr1

        assert_allclose(arr1, arr2, rtol=rtol, atol=atol)
        assert np.allclose(arr1, arr2, rtol=rtol, atol=atol)

        # Measure performance in milliseconds
        num_runs = 10
        assert_time = timeit.timeit(lambda: assert_allclose(arr1, arr2, rtol=rtol, atol=atol), number=num_runs) * 1000
        allclose_time = timeit.timeit(lambda: np.allclose(arr1, arr2, rtol=rtol, atol=atol), number=num_runs) * 1000

        print(f"\nPerformance results for array size {size} over {num_runs} runs:")
        print(f"assert_allclose time: {assert_time:.3f} ms")
        print(f"np.allclose time: {allclose_time:.3f} ms")
        print(f"Speedup: {assert_time / allclose_time:.2f}x")

run_tests()

I got:

Performance results for array size (10, 10) over 10 runs:
assert_allclose time: 0.308 ms
np.allclose time: 0.102 ms
Speedup: 3.03x

Performance results for array size (100, 100) over 10 runs:
assert_allclose time: 0.563 ms
np.allclose time: 0.244 ms
Speedup: 2.30x

Performance results for array size (1000, 1000) over 10 runs:
assert_allclose time: 55.887 ms
np.allclose time: 22.179 ms
Speedup: 2.52x

Performance results for array size (10000, 10000) over 10 runs:
assert_allclose time: 12374.578 ms
np.allclose time: 2916.093 ms
Speedup: 4.24x

@@ -580,9 +579,12 @@ def from_file(cls, filename: str | Path, constant_lattice: bool = True, **kwargs
try:
    from ase.io.trajectory import Trajectory as AseTrajectory

    from pymatgen.io.ase import AseAtomsAdaptor
@DanielYang59 (Contributor, Author) commented Oct 26, 2024

AseAtomsAdaptor is only used in one of the several try-except branches (the other branches also lazy import their corresponding modules); ~10% speedup.

if fnmatch(filename, "*XDATCAR*"):
    from pymatgen.io.vasp.outputs import Xdatcar

    structures = Xdatcar(filename).structures
elif fnmatch(filename, "vasprun*.xml*"):
    from pymatgen.io.vasp.outputs import Vasprun

    structures = Vasprun(filename).structures
elif fnmatch(filename, "*.traj"):
    try:
        from ase.io.trajectory import Trajectory as AseTrajectory

        from pymatgen.io.ase import AseAtomsAdaptor

        ase_traj = AseTrajectory(filename)
        # Periodic boundary conditions should be the same for all frames so just check the first
        pbc = ase_traj[0].pbc

        if any(pbc):
            structures = [AseAtomsAdaptor.get_structure(atoms) for atoms in ase_traj]
        else:
            molecules = [AseAtomsAdaptor.get_molecule(atoms) for atoms in ase_traj]
            is_mol = True

    except ImportError as exc:
        raise ImportError("ASE is required to read .traj files. pip install ase") from exc


@@ -664,6 +662,10 @@ def plot_slab(
        decay (float): how the alpha-value decays along the z-axis
        inverse (bool): invert z axis to plot opposite surface
    """
    # Expensive import (PR4128)
    from matplotlib import patches
@DanielYang59 (Contributor, Author)

matplotlib is almost certainly not a core dependency for pymatgen.core, yet it incurred significant import overhead on core.interface (before/after importtime screenshots omitted).
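One way to guard this kind of change against future regressions is a check that importing the module does not pull in the heavy dependency as a side effect. A minimal sketch, using stdlib modules as stand-ins so it runs anywhere; after this PR the real pair would be pymatgen.core.interface / matplotlib:

```python
import subprocess
import sys


def stays_lazy(module: str, heavy_dep: str) -> bool:
    """True if a cold `import <module>` does not load `heavy_dep` as a side effect."""
    # -S skips site customizations so the baseline interpreter is clean.
    # sys.exit(bool) maps to exit code 0 (False) or 1 (True).
    code = f"import sys; import {module}; sys.exit('{heavy_dep}' in sys.modules)"
    return subprocess.run([sys.executable, "-S", "-c", code]).returncode == 0


# Stdlib demonstration: importing math does not load json,
# while importing json obviously loads json itself.
assert stays_lazy("math", "json")
assert not stays_lazy("json", "json")
```

Checking in a subprocess matters here too: in the current process, the heavy dependency may already sit in sys.modules for unrelated reasons.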

@DanielYang59 DanielYang59 changed the title bump monty to use the latest monty.json import speedup patch bump monty to use the latest monty.json import speedup patch, add import test regression test, lazy load some rarely used but costly modules Oct 26, 2024
@DanielYang59 DanielYang59 marked this pull request as ready for review October 26, 2024 10:27
@DanielYang59 (Contributor, Author) commented Oct 26, 2024
DanielYang59 commented Oct 26, 2024

@mkhorton I believe this PR is ready for review; let me know if you have any comments, thank you!

Credit to @janosh for the motivation and helpful discussion!

Successfully merging this pull request may close these issues:

import monty.json slowing down core import