-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Download crystal space optional dependencies in ci #331
Conversation
WalkthroughThe pull request modifies the CI workflow configuration in the Changes
Possibly related PRs
Suggested labels
Poem
Warning Rate limit exceeded@AntObi has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 16 minutes and 53 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
.github/workflows/ci.yml (1)
31-35
: Consider adding pip caching to speed up CI.To improve CI performance, consider adding pip caching. Also, ensure your package's Python version constraints match the CI matrix (3.10-3.12).
Add this before the pip install steps:
- name: Install dependencies run: | + python -m pip cache dir + - uses: actions/cache@v3 + with: + path: ~/.cache/pip + key: ${{ runner.os }}-pip-${{ hashFiles('**/setup.py') }} + restore-keys: | + ${{ runner.os }}-pip- python -m pip install --upgrade pip wheel setuptools pip install -e ".[crystal_space,dev]"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
.github/workflows/ci.yml
(1 hunks)
🔇 Additional comments (1)
.github/workflows/ci.yml (1)
34-34
: LGTM! The dependency installation looks correct.
The syntax for installing optional dependencies is accurate. Including both crystal_space
and dev
extras ensures that all required dependencies for testing are available in the CI environment.
Let's verify that these extras are properly defined in the package setup:
✅ Verification successful
The crystal_space dependencies are properly defined and used in tests
The verification confirms that:
- The
crystal_space
optional dependencies are correctly defined inpyproject.toml
- The dependencies are actively used in
smact/tests/test_utils.py
- Both
crystal_space
and its dependencymp-api
are referenced in the test suite
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the crystal_space and dev extras are defined in setup.py or pyproject.toml
# Check for extras in setup.py
echo "Checking setup.py for extras..."
rg -A 5 'extras_require.*=.*{' setup.py
# Check for extras in pyproject.toml
echo "Checking pyproject.toml for extras..."
rg -A 5 '\[project.optional-dependencies\]' pyproject.toml
# Check if crystal_space is imported in tests
echo "Verifying crystal_space usage in tests..."
rg -l "import.*crystal.*space" "tests/"
Length of output: 642
Script:
#!/bin/bash
# Let's fix the regex and search for extras in setup.py
echo "Checking setup.py for extras..."
rg "extras_require\s*=" setup.py -A 10
# Check for crystal_space dependencies
echo "Checking crystal_space dependencies..."
rg "crystal.?space" pyproject.toml
# Check test directory location
echo "Looking for test directory..."
fd -t d "test|tests"
Length of output: 369
Script:
#!/bin/bash
# Check crystal_space usage in the correct test directory
echo "Checking crystal_space usage in tests..."
rg -l "crystal.?space" "smact/tests/"
# Get more context around crystal_space dependencies
echo "Getting full crystal_space dependencies..."
rg -A 5 "crystal_space.*=" pyproject.toml
# Check if mp-api is used in tests
echo "Checking mp-api usage in tests..."
rg -l "mp.?api" "smact/tests/"
Length of output: 532
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
pyproject.toml
(4 hunks)
🔇 Additional comments (3)
pyproject.toml (3)
89-89
: LGTM: Ruff target version update
The Ruff target version update to py310 correctly aligns with the new minimum Python requirement.
8-8
: Verify if version bump follows semantic versioning
The version increase from 2.6 to 2.8 skips 2.7. Please ensure this aligns with semantic versioning principles and that the changelog reflects the significant changes warranting two minor version increments.
46-46
: Consider maintaining an upper bound for pymatgen dependency
Removing the upper bound constraint (<2024.8.8
) for pymatgen could lead to compatibility issues if breaking changes are introduced in future versions. Consider maintaining an upper bound to ensure stability.
@@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta" | |||
|
|||
[project] | |||
name = "SMACT" | |||
version = "2.6" | |||
version = "2.8" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove Python 3.9 classifier as it conflicts with requires-python
The requires-python = ">=3.10"
specification conflicts with the Python 3.9 classifier. This could mislead users about version compatibility.
- "Programming Language :: Python :: 3.9",
Also applies to: 30-30, 24-24
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (8)
smact/builder.py (1)
Line range hint
46-83
: Consider a more robust approach to structure validationThe addition of
strict=False
across multiple files appears to be addressing symptoms rather than the root cause. Given this PR's focus on optional dependencies for testing, consider:
- Implementing a validation layer for crystal structures
- Adding proper test fixtures with optional dependencies
- Creating structure-specific oxidation state validators
This would provide better error handling and maintain chemical validity whilst supporting the testing improvements mentioned in the PR objectives.
smact/utils/composition.py (1)
Line range hint
66-77
: Consider adding validation and logging for mismatched sequencesWhile making the zip operations non-strict provides more flexibility, it might mask data quality issues. Consider:
- Adding input validation to log warnings when sequences are mismatched
- Implementing a validation step to ensure generated chemical formulas are valid
- Adding documentation to explain why strict=False is necessary and its implications
smact/tests/test_doper.py (1)
39-40
: Consider adding a comment explaining the unequal lengths scenarioIt would be helpful to add a brief comment explaining why
strict=False
is necessary here. This would help future maintainers understand that substitution lists can have different lengths and that this is expected behaviour.+# Substitution lists may have different lengths depending on available dopants for n_atom, p_atom in zip(n_sub_list_cat, p_sub_list_cat, strict=False):
Also applies to: 43-44
smact/utils/crystal_space/generate_composition_with_smact.py (1)
35-37
: Consider adding debug logging for length mismatchesWhilst adding
strict=False
makes the code more resilient, silently handling mismatched lengths between symbols and counts could mask potential issues. Consider adding debug logging when lengths differ to help identify unexpected scenarios.+ if len(symbols) != len(counts): + logging.debug(f"Length mismatch in convert_formula: symbols={len(symbols)}, counts={len(counts)}") formula_dict = dict(zip(symbols, counts, strict=False))smact/structure_prediction/mutation.py (1)
Line range hint
37-38
: Type hints foralpha
parameter look goodThe addition of
Callable[[str, str], float] | None
type hint improves type safety and development-time error catching. Consider enhancing the docstring to include the type information:- alpha: A function to call to fill in missing lambda values. + alpha: Optional callable (str, str) -> float to fill in missing lambda values. The function must take the two species' strings as arguments, and return a floating point lambda value.Also applies to: 81-82
smact/dopant_prediction/doper.py (1)
Line range hint
94-101
: Add input validation to ensure data integrityConsider adding input validation to ensure all input lists have the expected length of 4 items before performing the merge operation. This would help catch potential issues early and make the code more robust.
Here's a suggested implementation:
def _merge_dicts(self, keys, dopants_list, groupby_list): + expected_length = 4 # Number of dopant substitution types + if not all(len(lst) == expected_length for lst in [keys, dopants_list, groupby_list]): + raise ValueError( + f"All input lists must have length {expected_length}. " + f"Got lengths: keys={len(keys)}, dopants={len(dopants_list)}, " + f"groupby={len(groupby_list)}" + ) merged_dict = dict() - for k, dopants, group in zip(keys, dopants_list, groupby_list, strict=False): + for k, dopants, group in zip(keys, dopants_list, groupby_list): merged_values = dict() merged_values["sorted"] = dopantssmact/tests/test_structure.py (1)
561-561
: Consider adding tolerance parameter for probability comparisonsThe addition of
strict=False
to thezip
call in probability comparisons is appropriate. However, the test might be brittle due to direct floating-point comparisons.Consider adding a tolerance parameter for probability comparisons:
def assertProbabilityAlmostEqual(self, p1, p2, places=7): """Assert that two probabilities are almost equal.""" self.assertAlmostEqual(p1, p2, places=places)smact/structure_prediction/structure.py (1)
Line range hint
218-421
: Consider implementing a validation layerThe introduction of
strict=False
in multiple locations suggests a need for a more robust validation layer. Consider implementing a dedicated validation module to ensure data consistency across the codebase.Recommendations:
- Create a dedicated validation module for structure data.
- Implement comprehensive validation checks for both pymatgen and POSCAR inputs.
- Add logging for cases where data mismatches occur.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (10)
smact/builder.py
(2 hunks)smact/data_loader.py
(1 hunks)smact/dopant_prediction/doper.py
(1 hunks)smact/screening.py
(5 hunks)smact/structure_prediction/mutation.py
(2 hunks)smact/structure_prediction/structure.py
(2 hunks)smact/tests/test_doper.py
(1 hunks)smact/tests/test_structure.py
(5 hunks)smact/utils/composition.py
(2 hunks)smact/utils/crystal_space/generate_composition_with_smact.py
(2 hunks)
🔇 Additional comments (14)
smact/builder.py (1)
46-46
: Verify crystal site and oxidation state mapping
Whilst adding strict=False
prevents crashes when handling mismatched sequences, it could potentially mask structural issues if the number of crystal sites doesn't match the expected oxidation states for a cubic perovskite (which should have exactly 5 sites).
Let's verify the crystal site generation:
✅ Verification successful
Using strict=False
is appropriate for this implementation
The implementation is correct and safe to use strict=False
here because:
- The crystal structure is explicitly defined with a fixed basis of 3 positions using space group 221 (Pm-3m, cubic perovskite)
- The oxidation states list is hardcoded as
[[2]] + [[4]] + [[-2]] * 3
, which matches the expected ABO₃ perovskite structure - The ASE crystal generator ensures the correct number of sites based on the specified space group and basis positions
- The
strict=False
parameter provides flexibility for handling supercell repetitions without breaking the oxidation state mapping
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for tests or documentation that verify the number of sites in cubic perovskites
rg -g '!*.pyc' -g '!*{.json,.txt}' "cubic_perovskite.*test|test.*cubic_perovskite" -A 5
Length of output: 573
Script:
#!/bin/bash
# Let's examine the implementation of cubic_perovskite to understand site generation
ast-grep --pattern 'def cubic_perovskite($$$)'
# Also check for any test files
fd "test.*\.py" --exec rg -l "cubic_perovskite"
Length of output: 123
Script:
#!/bin/bash
# Let's examine the test file and the builder implementation
cat smact/tests/test_core.py | rg -A 10 "cubic_perovskite"
# Also check the actual implementation
cat smact/builder.py | rg -A 20 "def cubic_perovskite"
Length of output: 1378
Script:
#!/bin/bash
# Let's see the complete implementation of cubic_perovskite to understand site generation
cat smact/builder.py | rg -B 5 -A 30 "def cubic_perovskite"
# Also check if there are any other tests that might verify the structure
rg -g "*.py" "crystal\(" -A 5
Length of output: 1789
smact/utils/composition.py (2)
66-69
: Verify handling of mismatched element-amount pairs
The addition of strict=False
allows the zip operation to continue even when the number of elements doesn't match the number of amounts. This could lead to truncated chemical formulas. Please verify this is the intended behaviour.
✅ Verification successful
Let me gather more context about the test cases and mock data to understand the expected behavior.
Based on the test cases and implementation, I can now provide a final response:
The strict=False
parameter is correctly implemented for handling valid chemical formulas
The test cases demonstrate that the comp_maker
function correctly handles both 2-tuple and 3-tuple inputs, producing valid chemical formulas like "FeO" and "Fe3O4". The non-strict zipping is intentional and necessary because:
- For 2-tuple inputs (elements and amounts), the test case
mock_filter_output[1]
confirms correct handling of("Fe", "O"), (1, 1)
- The function successfully creates Pymatgen Composition objects that match expected chemical formulas
- All test assertions pass, validating both simple (FeO) and complex (Li10Ge(PS6)2) formulas
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for test cases that verify the handling of mismatched sequences
rg -l "test.*comp_maker" | xargs rg "assert.*comp_maker.*" -A 5
# Look for actual usage patterns to understand typical input shapes
ast-grep --pattern 'comp_maker($$$)'
Length of output: 530
Script:
#!/bin/bash
# Check the test setup and mock data
rg -A 10 "mock_filter_output" smact/tests/test_utils.py
# Look at the actual implementation of comp_maker
rg -A 20 "def comp_maker" smact/utils/composition.py
# Check if there are any related issues or PRs that discuss this change
rg -i "strict.*zip" smact/
Length of output: 2638
77-77
: Validate handling of mismatched element-oxidation-amount triplets
The strict=False
parameter allows zipping to continue with partial data when element, oxidation state, and amount sequences have different lengths. This could potentially create invalid chemical species.
✅ Verification successful
Let me gather more information about the usage and validation of these triplets.
Let me check the test setup and mock data to understand how mismatched triplets are handled in tests.
Using strict=False for zip() is intentional and safe in this context
The use of strict=False
in the comp_maker
function is actually appropriate and well-tested. The test cases show that the function correctly handles both two-tuple and three-tuple inputs from smact_filter
, where the tuples are guaranteed to have matching lengths by the smact_filter
implementation. The strict=False
parameter is used defensively to maintain compatibility with different output formats, as evidenced by the test cases:
- Two-tuple format:
(("Fe", "O"), (1, 1))
- Three-tuple format:
(("Fe", "Fe", "O"), (2, 3, -2), (1, 2, 4))
The test suite validates that both formats are handled correctly and produce valid chemical compositions.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for examples where these triplets are constructed
rg "smact_filter" -A 5 | rg -v "test"
# Look for any existing error handling around these operations
ast-grep --pattern 'try {
$$$
zip($$$)
$$$
} catch ($$$) {
$$$
}'
Length of output: 13166
Script:
#!/bin/bash
# Look for test cases that verify handling of mismatched triplets
rg "test.*comp_maker|test.*formula_maker" -A 10
# Check for any validation of input lengths before zipping
ast-grep --pattern 'if len($x) != len($y)'
# Look for any error handling around composition creation
rg "raise.*Error.*comp" -A 2
Length of output: 1816
Script:
#!/bin/bash
# Look at the test setup and mock data
rg "mock_filter_output.*=" -B 5 -A 5
# Check the actual implementation of smact_filter to see how it constructs these triplets
ast-grep --pattern 'def smact_filter($$$) {
$$$
}'
# Look for any validation in the composition creation
rg "def comp_maker" -A 20
Length of output: 7483
smact/tests/test_doper.py (1)
39-40
: LGTM! Good use of strict=False parameter
The addition of strict=False
to the zip operations makes the test more resilient by explicitly handling cases where the substitution lists have different lengths. This is a sensible approach since the test only needs to verify charge relationships for available pairs.
Also applies to: 43-44
smact/utils/crystal_space/generate_composition_with_smact.py (1)
123-126
: Verify test coverage for mismatched lengths
The addition of strict=False
suggests that mismatched lengths between res[0]
and res[2]
are possible. Please ensure there are test cases covering these scenarios.
smact/structure_prediction/mutation.py (1)
10-10
: LGTM! Import changes look good
The addition of Callable
type and the empty line after imports improve code organisation and type safety whilst following PEP 8 guidelines.
Also applies to: 19-19
smact/dopant_prediction/doper.py (1)
95-95
: Consider removing strict=False
from zip operation
The strict=False
parameter allows the zip operation to silently truncate when input lists have different lengths. However, in this context, all input lists (keys
, dopants_list
, groupby_list
) should always contain exactly 4 items as they represent different types of dopant substitutions. Silently truncating could mask potential bugs.
Let's verify the list lengths at the call site:
smact/screening.py (1)
250-250
: Same optimisation needed as in eneg_states_test
This line has the same inefficiencies as discussed in the eneg_states_test
function.
smact/tests/test_structure.py (4)
370-370
: LGTM: Lambda interface test cases are well-structured
The addition of strict=False
to the zip
call in test_lambda_interface
is appropriate as it allows for flexible testing of different permutations.
303-303
: Verify test coverage for edge cases
The addition of strict=False
to the zip
call in test_db_interface
suggests that test cases and expected results might have different lengths. This could indicate missing test coverage for certain scenarios.
✅ Verification successful
No issues with test coverage - strict=False
is used correctly
The strict=False
parameter in the zip
call is used appropriately here. Looking at the test file, we can see that test_with_species_args
and test_with_species_exp
are explicitly defined lists with matching lengths, containing 8 test cases each. The test cases comprehensively cover various scenarios including:
- Single species queries
- Multiple species queries
- Different ordering of species
- Invalid species combinations
- Complete composition matches
The strict=False
parameter is used defensively as a good programming practice, rather than indicating any mismatch in test coverage.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check the test coverage for get_with_species method
ast-grep --pattern 'def get_with_species($_, $_) {
$$$
}'
Length of output: 61
Script:
#!/bin/bash
# Let's first check the test file content to understand the test cases
cat smact/tests/test_structure.py
Length of output: 21393
312-313
: Consider validating MP data integrity
The addition of strict=False
to the zip
call handling Materials Project (MP) data suggests potential mismatches between MP IDs and structures. This could indicate data integrity issues.
Consider adding validation to ensure MP data integrity:
def test_mp_data_integrity(self):
"""Test MP data integrity."""
for data in mp_data:
self.assertIn("material_id", data)
self.assertIn("structure", data)
88-89
: Consider adding test cases for unequal length scenarios
The addition of strict=False
to the zip
calls in assertStructAlmostEqual
makes the comparison more lenient. Whilst this change prevents potential ValueError
exceptions, it might mask issues where structures have different numbers of sites.
Consider adding explicit test cases that verify the behaviour when structures have different numbers of sites:
def test_assertStructAlmostEqual_unequal_sites(self):
"""Test comparison of structures with different numbers of sites."""
s1 = SmactStructure(*self._gen_empty_structure([("Fe", 2, 1)]))
s2 = SmactStructure(*self._gen_empty_structure([("Fe", 2, 2)]))
with self.assertRaises(AssertionError):
self.assertStructAlmostEqual(s1, s2)
✅ Verification successful
The addition of strict=False
is appropriate and safe
The assertStructAlmostEqual
method primarily compares the species
and lattice_param
attributes for exact equality, whilst the zip
operations on sites are used to compare coordinates. The SmactStructure
class maintains sites as a dictionary of species and their coordinates, ensuring that mismatched sites would already fail the species
equality check before reaching the coordinate comparison. Therefore, the strict=False
parameter is a safe addition that prevents potential ValueError
exceptions without masking structural differences.
- The species comparison in
assertStructAlmostEqual
ensures structural integrity:self.assertEqual(getattr(s1, cond), getattr(s2, cond))
- The
SmactStructure
class maintains a well-defined structure through its dictionary-based site representation - Test cases using
_gen_empty_structure
demonstrate consistent species handling
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if there are any test cases that exercise unequal length scenarios
rg -A 5 "assertStructAlmostEqual.*\(" smact/tests/test_structure.py
Length of output: 900
Script:
#!/bin/bash
# Let's check the implementation of assertStructAlmostEqual to understand how it handles site comparisons
ast-grep --pattern 'def assertStructAlmostEqual($$$)'
Length of output: 55
Script:
#!/bin/bash
# Let's also check if there are any test cases that explicitly test structures with different site counts
rg -B 2 -A 5 "test.*different.*sites|test.*unequal.*sites" smact/tests/test_structure.py
Length of output: 90
Script:
#!/bin/bash
# Let's examine the test data setup to understand the structure generation
rg "_gen_empty_structure" smact/tests/test_structure.py
Length of output: 770
Script:
#!/bin/bash
# Let's check the implementation of assertStructAlmostEqual with a different pattern
rg -A 10 "def assertStructAlmostEqual" smact/tests/test_structure.py
# Also check how SmactStructure handles sites
rg -A 5 "class SmactStructure" smact/
Length of output: 1043
Script:
#!/bin/bash
# Let's check the full implementation of SmactStructure class to understand site handling
rg -A 20 "class SmactStructure:" smact/structure_prediction/structure.py
# Also check if there are any existing validation for sites in the class
rg "def.*sites" smact/structure_prediction/structure.py
Length of output: 1237
smact/structure_prediction/structure.py (1)
218-218
: Verify handling of mismatched species and stoichiometry counts
Adding strict=False
to the zip operation could mask data inconsistencies if the number of species doesn't match their stoichiometry counts. This might lead to incomplete or incorrect structure representations.
Let's verify the potential impact:
✅ Verification successful
Let me gather more context about the implementation to understand how the species and stoichiometry data is handled.
Let me try one more approach to understand the implementation and usage of this method.
Let me try one final approach to understand the full context of the zip operation.
Adding strict=False
to zip operation is safe in this context
The implementation shows that total_specs
is directly derived from sites.values()
using list comprehension, ensuring both iterables in the zip operation will always have matching lengths:
total_specs = [len(val) for val in sites.values()]
creates a list with the same length assites.keys()
- The subsequent operations (GCD calculation and division) maintain this length
- The
strict=False
parameter is defensive programming that doesn't impact correctness
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for potential mismatches between species and stoichiometry counts
# in existing structure files and test files.
# Search for test files containing structure data
echo "Searching for test files with structure data..."
fd -e txt -e cif . tests/ -x cat {} | rg -A 5 "species.*stoichiometry"
# Search for potential structure parsing calls
echo "Searching for structure parsing calls..."
rg -A 3 "__parse_py_sites"
Length of output: 1273
Script:
#!/bin/bash
# Check the implementation of __parse_py_sites method
ast-grep --pattern 'def __parse_py_sites($$$)'
# Look for test cases involving this method
rg -B 2 -A 10 "test.*parse.*sites" tests/
Length of output: 139
Script:
#!/bin/bash
# Get the full implementation of __parse_py_sites method
rg -A 30 "def __parse_py_sites" smact/structure_prediction/structure.py
# Look for any usage patterns of zip with sites and species
rg -B 2 -A 2 "zip\(.*sites.*species" smact/structure_prediction/structure.py
Length of output: 1423
Script:
#!/bin/bash
# Get more context around the specific line with the zip operation
rg -B 10 -A 10 "zip\(sites\.keys\(\), total_specs" smact/structure_prediction/structure.py
# Look for the total_specs variable definition
rg -B 5 "total_specs.*=" smact/structure_prediction/structure.py
Length of output: 1588
smact/data_loader.py (1)
444-444
: Consider adding logging for data mismatches.
While adding strict=False
makes the code more robust, silently ignoring mismatches between keys and values could hide data quality issues. Consider adding debug-level logging when such mismatches occur.
- _element_data.update({items[0]: dict(list(zip(keys, clean_items, strict=False)))})
+ if len(keys) != len(clean_items):
+ logging.debug(f"Mismatch in data columns for element {items[0]}: expected {len(keys)} columns, got {len(clean_items)}")
+ _element_data.update({items[0]: dict(list(zip(keys, clean_items, strict=False)))})
Let's verify the potential for data mismatches:
@@ -80,6 +80,6 @@ def wurtzite(species, cell_par=None, repetitions=None): | |||
sites_list = [] | |||
oxidation_states = [[1], [2], [3], [4], [-1], [-2], [-3], [-4]] | |||
|
|||
for site in zip(system.get_scaled_positions(), oxidation_states): | |||
for site in zip(system.get_scaled_positions(), oxidation_states, strict=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Refactor oxidation state handling for wurtzite structures
The current implementation allows mismatched crystal sites and oxidation states through strict=False
, whilst also providing more oxidation states than typically needed for wurtzite structures. This could lead to chemically invalid models.
Consider refactoring to:
- Accept specific oxidation states as parameters
- Validate the number of sites matches the structure
-def wurtzite(species, cell_par=None, repetitions=None):
+def wurtzite(species, oxidation_states=None, cell_par=None, repetitions=None):
"""
Build a wurzite cell using the crystal function in ASE.
Args:
----
species (str): Element symbols
+ oxidation_states (list): List of oxidation states for the structure
cell_par (list): Six floats/ints specifying 3 unit cell lengths and 3 unit cell angles.
repetitions (list): Three floats specifying the expansion of the cell in x,y,z directions.
"""
# ... existing crystal setup code ...
sites_list = []
- oxidation_states = [[1], [2], [3], [4], [-1], [-2], [-3], [-4]]
+ if oxidation_states is None:
+ # Default to common wurtzite oxidation states (e.g., for ZnO)
+ oxidation_states = [[2], [-2]]
+
+ scaled_positions = system.get_scaled_positions()
+ if len(scaled_positions) != len(oxidation_states):
+ raise ValueError(
+ f"Number of crystal sites ({len(scaled_positions)}) "
+ f"doesn't match oxidation states ({len(oxidation_states)})"
+ )
- for site in zip(system.get_scaled_positions(), oxidation_states, strict=False):
+ for site in zip(scaled_positions, oxidation_states):
sites_list.append(Site(site[0], site[1]))
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
for site in zip(system.get_scaled_positions(), oxidation_states, strict=False): | |
def wurtzite(species, oxidation_states=None, cell_par=None, repetitions=None): | |
""" | |
Build a wurzite cell using the crystal function in ASE. | |
Args: | |
---- | |
species (str): Element symbols | |
oxidation_states (list): List of oxidation states for the structure | |
cell_par (list): Six floats/ints specifying 3 unit cell lengths and 3 unit cell angles. | |
repetitions (list): Three floats specifying the expansion of the cell in x,y,z directions. | |
""" | |
# ... existing crystal setup code ... | |
sites_list = [] | |
if oxidation_states is None: | |
# Default to common wurtzite oxidation states (e.g., for ZnO) | |
oxidation_states = [[2], [-2]] | |
scaled_positions = system.get_scaled_positions() | |
if len(scaled_positions) != len(oxidation_states): | |
raise ValueError( | |
f"Number of crystal sites ({len(scaled_positions)}) " | |
f"doesn't match oxidation states ({len(oxidation_states)})" | |
) | |
for site in zip(scaled_positions, oxidation_states): | |
sites_list.append(Site(site[0], site[1])) |
@@ -321,10 +321,10 @@ def ml_rep_generator( | |||
|
|||
ML_rep = [0 for i in range(1, 103)] | |||
if isinstance(composition[0], Element): | |||
for element, stoich in zip(composition, stoichs): | |||
for element, stoich in zip(composition, stoichs, strict=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Remove unnecessary strict=False
parameter
The strict=False
parameter is unnecessary here because:
- When
stoichs
is None, it's initialised with matching length tocomposition
- When
stoichs
is provided, it should match the length ofcomposition
for valid chemical representation
- for element, stoich in zip(composition, stoichs, strict=False):
+ for element, stoich in zip(composition, stoichs):
- for element, stoich in zip(composition, stoichs, strict=False):
+ for element, stoich in zip(composition, stoichs):
Also applies to: 327-327
@@ -281,7 +281,7 @@ | |||
|
|||
""" | |||
min_cation_eneg, max_anion_eneg = 10, 0 | |||
for ox_state, eneg in zip(ox_states, enegs): | |||
for ox_state, eneg in zip(ox_states, enegs, strict=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maintain input validation in alternate implementation
Similar to the other electronegativity test functions, the oxidation states and electronegativities must have matching lengths for valid comparison. The strict=False
parameter could mask potential input errors.
- for ox_state, eneg in zip(ox_states, enegs, strict=False):
+ for ox_state, eneg in zip(ox_states, enegs):
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
for ox_state, eneg in zip(ox_states, enegs, strict=False): | |
for ox_state, eneg in zip(ox_states, enegs): |
@@ -205,7 +205,7 @@ | |||
cations, otherwise False | |||
|
|||
""" | |||
for (ox1, eneg1), (ox2, eneg2) in combinations(list(zip(ox_states, enegs)), 2): | |||
for (ox1, eneg1), (ox2, eneg2) in combinations(list(zip(ox_states, enegs, strict=False)), 2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Optimise memory usage and maintain input validation
The current implementation has two potential improvements:
- Remove the unnecessary
list
conversion ascombinations
can work directly with thezip
iterator - Remove
strict=False
as the oxidation states and electronegativities must have matching lengths for valid comparison
- for (ox1, eneg1), (ox2, eneg2) in combinations(list(zip(ox_states, enegs, strict=False)), 2):
+ for (ox1, eneg1), (ox2, eneg2) in combinations(zip(ox_states, enegs), 2):
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
for (ox1, eneg1), (ox2, eneg2) in combinations(list(zip(ox_states, enegs, strict=False)), 2): | |
for (ox1, eneg1), (ox2, eneg2) in combinations(zip(ox_states, enegs), 2): |
@@ -103,7 +103,7 @@ | |||
return len(symbols) == len(set(symbols)) | |||
else: | |||
anions, cations = [], [] | |||
for state, symbol in zip(oxidation_states, symbols): | |||
for state, symbol in zip(oxidation_states, symbols, strict=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Strict zipping should be maintained for input validation
The _no_repeats
function is a critical validation function used within the eneg_states_test_alternate
method to ensure there are no unwanted repetitions of ions. Since:
- There are no test cases that explicitly rely on mismatched lengths
- The function is an internal validator that checks for ion repetitions
- The input lists (
oxidation_states
andsymbols
) should always have matching lengths by design
Using strict=False
could mask potential bugs where the input lists have mismatched lengths, which would indicate a problem in the calling code. It's safer to maintain strict validation to catch such issues early.
🔗 Analysis chain
Consider maintaining strict zipping for input validation
Adding strict=False
to the zip
function could mask potential issues where oxidation_states
and symbols
lists should have matching lengths. Since this is an internal validation function, it would be safer to maintain strict checking to catch any mismatched inputs early.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if there are any test cases that rely on mismatched lengths
rg -A 5 "_no_repeats" "tests/"
Length of output: 75
Script:
#!/bin/bash
# Check for test files in the repository
fd "test.*\.py$"
# Check the implementation of _no_repeats function
ast-grep --pattern $'def _no_repeats($$$) {
$$$
}'
Length of output: 179
Script:
#!/bin/bash
# Search for _no_repeats usage in test files
rg -A 5 "_no_repeats" "smact/tests/test_core.py" "smact/tests/test_doper.py" "smact/tests/test_structure.py" "smact/tests/test_utils.py"
# Check the implementation context of _no_repeats
rg -B 5 -A 10 "_no_repeats" "smact/screening.py"
Length of output: 1047
@@ -418,7 +418,7 @@ def from_poscar(poscar: str): | |||
total_specs = [int(x / hcf) for x in total_specs] | |||
|
|||
species = [] | |||
for spec_str, stoic in zip(lines[0].split(" "), total_specs): | |||
for spec_str, stoic in zip(lines[0].split(" "), total_specs, strict=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add input validation for POSCAR format
The addition of strict=False
to the zip operation could allow processing of malformed POSCAR files. Consider adding explicit validation of the input format to ensure data consistency.
Apply this diff to add validation:
@@ -421,6 +421,10 @@
+ # Validate POSCAR format
+ if len(lines[0].split(" ")) != len(total_specs):
+ raise ValueError(
+ f"Mismatch between species count ({len(lines[0].split(' '))}) and stoichiometry count ({len(total_specs)})"
+ )
for spec_str, stoic in zip(lines[0].split(" "), total_specs, strict=False):
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
for spec_str, stoic in zip(lines[0].split(" "), total_specs, strict=False): | |
# Validate POSCAR format | |
if len(lines[0].split(" ")) != len(total_specs): | |
raise ValueError( | |
f"Mismatch between species count ({len(lines[0].split(' '))}) and stoichiometry count ({len(total_specs)})" | |
) | |
for spec_str, stoic in zip(lines[0].split(" "), total_specs, strict=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
smact/tests/test_utils.py (1)
Line range hint
115-132
: Enhance test robustness and maintainabilityWhilst the test implementation follows good practices with cleanup, there are several improvements that could make it more robust:
- Use setUp/tearDown to ensure cleanup even if the test fails
- Add error handling for API failures
- Use class constants for paths and configuration
- Add more specific assertions about the downloaded data structure
Consider this improved implementation:
+ def setUp(self): + self.test_data_dir = "data/binary/mp_data" + self.cleanup_dirs = [] + def tearDown(self): + for directory in self.cleanup_dirs: + if os.path.exists(directory): + shutil.rmtree(directory) @unittest.skipUnless( (os.environ.get("MP_API_KEY") or SETTINGS.get("PMG_MAPI_KEY")), "requires MP_API key to be set." ) def test_download_compounds_with_mp_api(self): - save_mp_dir = "data/binary/mp_data" + self.cleanup_dirs.append(self.test_data_dir) try: download_compounds_with_mp_api.download_mp_data( mp_api_key=os.environ.get("MP_API_KEY"), num_elements=2, max_stoich=1, - save_dir=save_mp_dir, + save_dir=self.test_data_dir, ) except Exception as e: self.fail(f"API download failed: {str(e)}") # Check if the data was downloaded - self.assertTrue(os.path.exists(save_mp_dir)) - self.assertTrue(len(os.listdir(save_mp_dir)) > 0) + self.assertTrue(os.path.exists(self.test_data_dir)) + downloaded_files = os.listdir(self.test_data_dir) + self.assertTrue(len(downloaded_files) > 0) + + # Verify file structure and content + for filename in downloaded_files: + filepath = os.path.join(self.test_data_dir, filename) + self.assertTrue(os.path.getsize(filepath) > 0) - - # Clean up - shutil.rmtree(save_mp_dir)
@unittest.skipUnless( | ||
(os.environ.get("MP_API_KEY") or SETTINGS.get("PMG_MAPI_KEY")), "requires MP_API key to be set." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
CI configuration needs attention for Materials Project API tests
The CI workflow is missing the required MP_API_KEY secret configuration, though the crystal_space dependencies are properly installed via pip install -e ".[crystal_space,dev]"
. To ensure proper test execution:
- Add the MP_API_KEY secret to the repository's GitHub Actions secrets
- Update the CI workflow to include the MP_API_KEY environment variable in the test step:
- name: Run tests and collect coverage
env:
MP_API_KEY: ${{ secrets.MP_API_KEY }}
run: python -m pytest --cov=smact --cov-report=xml -v
🔗 Analysis chain
Verify CI environment configuration
The test requires the Materials Project API key. Ensure that:
- The CI workflow has the MP_API_KEY secret configured
- The crystal_space optional dependencies are properly installed in CI
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if MP_API_KEY is configured in GitHub secrets
gh api repos/:owner/:repo/actions/secrets | jq '.secrets[] | select(.name=="MP_API_KEY")'
# Verify crystal_space dependencies in pyproject.toml
rg -A 5 '\[crystal_space\]' pyproject.toml
Length of output: 248
Script:
#!/bin/bash
# Check CI workflow files for API key configuration
fd -e yml -e yaml . .github/workflows/ --exec cat {}
# Check test dependencies in pyproject.toml
cat pyproject.toml
# Check if there are any environment setup scripts
fd -e sh -e bash . .github/workflows/ --exec cat {}
Length of output: 16636
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #331 +/- ##
==========================================
+ Coverage 73.56% 73.74% +0.17%
==========================================
Files 31 31
Lines 2531 2544 +13
==========================================
+ Hits 1862 1876 +14
+ Misses 669 668 -1 ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
Download optional dependencies
Description
Optional dependencies are required to test some of the functions in the package. This PR ensures that these dependencies are included in the GitHub Actions workflow
Type of change
How Has This Been Tested?
Test Configuration:
Reviewers
N/A
Checklist
Summary by CodeRabbit