Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download crystal space optional dependencies in ci #331

Merged
merged 4 commits into from
Nov 21, 2024
Merged

Conversation

AntObi
Copy link
Collaborator

@AntObi AntObi commented Nov 21, 2024

Download optional dependencies

Description

Optional dependencies are required to test some of the functions in the package. This PR ensures that these dependencies are included in the GitHub Actions workflow

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Test Configuration:

  • Python version: 3.10
  • Operating System: macOS

Reviewers

N/A

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Summary by CodeRabbit

  • Chores
    • Updated the CI workflow to include additional optional dependencies during the installation process.
  • New Features
    • Project version updated from 2.6 to 2.8, enhancing overall project capabilities.
    • Minimum Python version requirement increased to 3.10, ensuring compatibility with newer features.
    • Dependency specifications refined for improved performance and flexibility.
    • Added a new test method for verifying functionality with the Materials Project API.
  • Bug Fixes
    • Enhanced error handling in various functions to accommodate lists of differing lengths, improving robustness.
    • Adjusted handling of input in multiple methods to prevent errors related to mismatched lengths.

Copy link
Contributor

coderabbitai bot commented Nov 21, 2024

Walkthrough

The pull request modifies the CI workflow configuration in the .github/workflows/ci.yml file by updating the dependency installation command within the test job. The command has been changed from pip install -e . to pip install -e ".[crystal_space,dev]", which now includes additional optional dependencies. Additionally, the pyproject.toml file has been updated to reflect changes in project versioning, Python version requirements, and dependency specifications. Other files have been modified to enhance error handling and control flow by adding the strict=False argument to zip function calls across various functions.

Changes

File Change Summary
.github/workflows/ci.yml Updated dependency installation command to include optional extras crystal_space and dev.
pyproject.toml Updated project version to 2.8, minimum Python version to >=3.10, removed upper limit for pymatgen, and updated target version for ruff to py310.
smact/builder.py Added strict=False to zip calls in cubic_perovskite and wurtzite functions.
smact/data_loader.py Updated lookup_element_data function to use strict=False in zip for dictionary creation.
smact/dopant_prediction/doper.py Modified _merge_dicts method to include strict=False in zip function.
smact/screening.py Added strict=False to zip calls in multiple functions for enhanced error handling.
smact/structure_prediction/mutation.py Updated type annotations and added strict=False to zip calls in CationMutator methods.
smact/structure_prediction/structure.py Modified __parse_py_sites and from_poscar methods to use strict=False in zip.
smact/tests/test_doper.py Enhanced test_dopant_prediction method with strict=False in zip calls.
smact/tests/test_structure.py Updated multiple test methods to include strict=False in zip calls for better error handling.
smact/utils/composition.py Added strict=False to zip calls in comp_maker function for handling mismatched lengths.
smact/utils/crystal_space/generate_composition_with_smact.py Enhanced convert_formula and generate_composition_with_smact functions with strict=False in zip calls.

Possibly related PRs

  • Oxidation States 24 #329: The changes in this PR introduce new oxidation states sourced from the ICSD, which may relate to the modifications in the CI workflow and dependency management in the main PR, particularly if the new oxidation states require specific dependencies or configurations for testing and validation.

Suggested labels

enhancement, bug, dependencies, python, feature

Poem

In the garden where we play,
Dependencies bloom bright today.
With crystal space and dev in tow,
Our project’s ready, watch it grow!
Hops and skips, we cheer with glee,
For every change brings joy, you see! 🐇✨

Warning

Rate limit exceeded

@AntObi has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 16 minutes and 53 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 77338da and edc6a50.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
.github/workflows/ci.yml (1)

31-35: Consider adding pip caching to speed up CI.

To improve CI performance, consider adding pip caching. Also, ensure your package's Python version constraints match the CI matrix (3.10-3.12).

Add this before the pip install steps:

      - name: Install dependencies
        run: |
+         python -m pip cache dir
+     - uses: actions/cache@v3
+       with:
+         path: ~/.cache/pip
+         key: ${{ runner.os }}-pip-${{ hashFiles('**/setup.py') }}
+         restore-keys: |
+           ${{ runner.os }}-pip-
          python -m pip install --upgrade pip wheel setuptools
          pip install -e ".[crystal_space,dev]"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ace22a8 and ca934ae.

📒 Files selected for processing (1)
  • .github/workflows/ci.yml (1 hunks)
🔇 Additional comments (1)
.github/workflows/ci.yml (1)

34-34: LGTM! The dependency installation looks correct.

The syntax for installing optional dependencies is accurate. Including both crystal_space and dev extras ensures that all required dependencies for testing are available in the CI environment.

Let's verify that these extras are properly defined in the package setup:

✅ Verification successful

The crystal_space dependencies are properly defined and used in tests

The verification confirms that:

  • The crystal_space optional dependencies are correctly defined in pyproject.toml
  • The dependencies are actively used in smact/tests/test_utils.py
  • Both crystal_space and its dependency mp-api are referenced in the test suite
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the crystal_space and dev extras are defined in setup.py or pyproject.toml

# Check for extras in setup.py
echo "Checking setup.py for extras..."
rg -A 5 'extras_require.*=.*{' setup.py

# Check for extras in pyproject.toml
echo "Checking pyproject.toml for extras..."
rg -A 5 '\[project.optional-dependencies\]' pyproject.toml

# Check if crystal_space is imported in tests
echo "Verifying crystal_space usage in tests..."
rg -l "import.*crystal.*space" "tests/"

Length of output: 642


Script:

#!/bin/bash
# Let's fix the regex and search for extras in setup.py
echo "Checking setup.py for extras..."
rg "extras_require\s*=" setup.py -A 10

# Check for crystal_space dependencies
echo "Checking crystal_space dependencies..."
rg "crystal.?space" pyproject.toml

# Check test directory location
echo "Looking for test directory..."
fd -t d "test|tests"

Length of output: 369


Script:

#!/bin/bash
# Check crystal_space usage in the correct test directory
echo "Checking crystal_space usage in tests..."
rg -l "crystal.?space" "smact/tests/"

# Get more context around crystal_space dependencies
echo "Getting full crystal_space dependencies..."
rg -A 5 "crystal_space.*=" pyproject.toml

# Check if mp-api is used in tests
echo "Checking mp-api usage in tests..."
rg -l "mp.?api" "smact/tests/"

Length of output: 532

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ca934ae and 07c41a0.

📒 Files selected for processing (1)
  • pyproject.toml (4 hunks)
🔇 Additional comments (3)
pyproject.toml (3)

89-89: LGTM: Ruff target version update

The Ruff target version update to py310 correctly aligns with the new minimum Python requirement.


8-8: Verify if version bump follows semantic versioning

The version increase from 2.6 to 2.8 skips 2.7. Please ensure this aligns with semantic versioning principles and that the changelog reflects the significant changes warranting two minor version increments.


46-46: Consider maintaining an upper bound for pymatgen dependency

Removing the upper bound constraint (<2024.8.8) for pymatgen could lead to compatibility issues if breaking changes are introduced in future versions. Consider maintaining an upper bound to ensure stability.

@@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "SMACT"
version = "2.6"
version = "2.8"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove Python 3.9 classifier as it conflicts with requires-python

The requires-python = ">=3.10" specification conflicts with the Python 3.9 classifier. This could mislead users about version compatibility.

- "Programming Language :: Python :: 3.9",

Also applies to: 30-30, 24-24

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (8)
smact/builder.py (1)

Line range hint 46-83: Consider a more robust approach to structure validation

The addition of strict=False across multiple files appears to be addressing symptoms rather than the root cause. Given this PR's focus on optional dependencies for testing, consider:

  1. Implementing a validation layer for crystal structures
  2. Adding proper test fixtures with optional dependencies
  3. Creating structure-specific oxidation state validators

This would provide better error handling and maintain chemical validity whilst supporting the testing improvements mentioned in the PR objectives.

smact/utils/composition.py (1)

Line range hint 66-77: Consider adding validation and logging for mismatched sequences

While making the zip operations non-strict provides more flexibility, it might mask data quality issues. Consider:

  1. Adding input validation to log warnings when sequences are mismatched
  2. Implementing a validation step to ensure generated chemical formulas are valid
  3. Adding documentation to explain why strict=False is necessary and its implications
smact/tests/test_doper.py (1)

39-40: Consider adding a comment explaining the unequal lengths scenario

It would be helpful to add a brief comment explaining why strict=False is necessary here. This would help future maintainers understand that substitution lists can have different lengths and that this is expected behaviour.

+# Substitution lists may have different lengths depending on available dopants
 for n_atom, p_atom in zip(n_sub_list_cat, p_sub_list_cat, strict=False):

Also applies to: 43-44

smact/utils/crystal_space/generate_composition_with_smact.py (1)

35-37: Consider adding debug logging for length mismatches

Whilst adding strict=False makes the code more resilient, silently handling mismatched lengths between symbols and counts could mask potential issues. Consider adding debug logging when lengths differ to help identify unexpected scenarios.

+        if len(symbols) != len(counts):
+            logging.debug(f"Length mismatch in convert_formula: symbols={len(symbols)}, counts={len(counts)}")
         formula_dict = dict(zip(symbols, counts, strict=False))
smact/structure_prediction/mutation.py (1)

Line range hint 37-38: Type hints for alpha parameter look good

The addition of Callable[[str, str], float] | None type hint improves type safety and development-time error catching. Consider enhancing the docstring to include the type information:

-            alpha: A function to call to fill in missing lambda values.
+            alpha: Optional callable (str, str) -> float to fill in missing lambda values.
                 The function must take the two species' strings as
                 arguments, and return a floating point lambda
                 value.

Also applies to: 81-82

smact/dopant_prediction/doper.py (1)

Line range hint 94-101: Add input validation to ensure data integrity

Consider adding input validation to ensure all input lists have the expected length of 4 items before performing the merge operation. This would help catch potential issues early and make the code more robust.

Here's a suggested implementation:

     def _merge_dicts(self, keys, dopants_list, groupby_list):
+        expected_length = 4  # Number of dopant substitution types
+        if not all(len(lst) == expected_length for lst in [keys, dopants_list, groupby_list]):
+            raise ValueError(
+                f"All input lists must have length {expected_length}. "
+                f"Got lengths: keys={len(keys)}, dopants={len(dopants_list)}, "
+                f"groupby={len(groupby_list)}"
+            )
         merged_dict = dict()
-        for k, dopants, group in zip(keys, dopants_list, groupby_list, strict=False):
+        for k, dopants, group in zip(keys, dopants_list, groupby_list):
             merged_values = dict()
             merged_values["sorted"] = dopants
smact/tests/test_structure.py (1)

561-561: Consider adding tolerance parameter for probability comparisons

The addition of strict=False to the zip call in probability comparisons is appropriate. However, the test might be brittle due to direct floating-point comparisons.

Consider adding a tolerance parameter for probability comparisons:

def assertProbabilityAlmostEqual(self, p1, p2, places=7):
    """Assert that two probabilities are almost equal."""
    self.assertAlmostEqual(p1, p2, places=places)
smact/structure_prediction/structure.py (1)

Line range hint 218-421: Consider implementing a validation layer

The introduction of strict=False in multiple locations suggests a need for a more robust validation layer. Consider implementing a dedicated validation module to ensure data consistency across the codebase.

Recommendations:

  1. Create a dedicated validation module for structure data.
  2. Implement comprehensive validation checks for both pymatgen and POSCAR inputs.
  3. Add logging for cases where data mismatches occur.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 07c41a0 and 77338da.

📒 Files selected for processing (10)
  • smact/builder.py (2 hunks)
  • smact/data_loader.py (1 hunks)
  • smact/dopant_prediction/doper.py (1 hunks)
  • smact/screening.py (5 hunks)
  • smact/structure_prediction/mutation.py (2 hunks)
  • smact/structure_prediction/structure.py (2 hunks)
  • smact/tests/test_doper.py (1 hunks)
  • smact/tests/test_structure.py (5 hunks)
  • smact/utils/composition.py (2 hunks)
  • smact/utils/crystal_space/generate_composition_with_smact.py (2 hunks)
🔇 Additional comments (14)
smact/builder.py (1)

46-46: Verify crystal site and oxidation state mapping

Whilst adding strict=False prevents crashes when handling mismatched sequences, it could potentially mask structural issues if the number of crystal sites doesn't match the expected oxidation states for a cubic perovskite (which should have exactly 5 sites).

Let's verify the crystal site generation:

✅ Verification successful

Using strict=False is appropriate for this implementation

The implementation is correct and safe to use strict=False here because:

  1. The crystal structure is explicitly defined with a fixed basis of 3 positions using space group 221 (Pm-3m, cubic perovskite)
  2. The oxidation states list is hardcoded as [[2]] + [[4]] + [[-2]] * 3, which matches the expected ABO₃ perovskite structure
  3. The ASE crystal generator ensures the correct number of sites based on the specified space group and basis positions
  4. The strict=False parameter provides flexibility for handling supercell repetitions without breaking the oxidation state mapping
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for tests or documentation that verify the number of sites in cubic perovskites
rg -g '!*.pyc' -g '!*{.json,.txt}' "cubic_perovskite.*test|test.*cubic_perovskite" -A 5

Length of output: 573


Script:

#!/bin/bash
# Let's examine the implementation of cubic_perovskite to understand site generation
ast-grep --pattern 'def cubic_perovskite($$$)'

# Also check for any test files
fd "test.*\.py" --exec rg -l "cubic_perovskite"

Length of output: 123


Script:

#!/bin/bash
# Let's examine the test file and the builder implementation
cat smact/tests/test_core.py | rg -A 10 "cubic_perovskite"

# Also check the actual implementation
cat smact/builder.py | rg -A 20 "def cubic_perovskite"

Length of output: 1378


Script:

#!/bin/bash
# Let's see the complete implementation of cubic_perovskite to understand site generation
cat smact/builder.py | rg -B 5 -A 30 "def cubic_perovskite"

# Also check if there are any other tests that might verify the structure
rg -g "*.py" "crystal\(" -A 5

Length of output: 1789

smact/utils/composition.py (2)

66-69: Verify handling of mismatched element-amount pairs

The addition of strict=False allows the zip operation to continue even when the number of elements doesn't match the number of amounts. This could lead to truncated chemical formulas. Please verify this is the intended behaviour.

✅ Verification successful

Let me gather more context about the test cases and mock data to understand the expected behavior.


Based on the test cases and implementation, I can now provide a final response:

The strict=False parameter is correctly implemented for handling valid chemical formulas

The test cases demonstrate that the comp_maker function correctly handles both 2-tuple and 3-tuple inputs, producing valid chemical formulas like "FeO" and "Fe3O4". The non-strict zipping is intentional and necessary because:

  • For 2-tuple inputs (elements and amounts), the test case mock_filter_output[1] confirms correct handling of ("Fe", "O"), (1, 1)
  • The function successfully creates Pymatgen Composition objects that match expected chemical formulas
  • All test assertions pass, validating both simple (FeO) and complex (Li10Ge(PS6)2) formulas
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for test cases that verify the handling of mismatched sequences
rg -l "test.*comp_maker" | xargs rg "assert.*comp_maker.*" -A 5

# Look for actual usage patterns to understand typical input shapes
ast-grep --pattern 'comp_maker($$$)'

Length of output: 530


Script:

#!/bin/bash
# Check the test setup and mock data
rg -A 10 "mock_filter_output" smact/tests/test_utils.py

# Look at the actual implementation of comp_maker
rg -A 20 "def comp_maker" smact/utils/composition.py

# Check if there are any related issues or PRs that discuss this change
rg -i "strict.*zip" smact/

Length of output: 2638


77-77: Validate handling of mismatched element-oxidation-amount triplets

The strict=False parameter allows zipping to continue with partial data when element, oxidation state, and amount sequences have different lengths. This could potentially create invalid chemical species.

✅ Verification successful

Let me gather more information about the usage and validation of these triplets.


Let me check the test setup and mock data to understand how mismatched triplets are handled in tests.


Using strict=False for zip() is intentional and safe in this context

The use of strict=False in the comp_maker function is actually appropriate and well-tested. The test cases show that the function correctly handles both two-tuple and three-tuple inputs from smact_filter, where the tuples are guaranteed to have matching lengths by the smact_filter implementation. The strict=False parameter is used defensively to maintain compatibility with different output formats, as evidenced by the test cases:

  • Two-tuple format: (("Fe", "O"), (1, 1))
  • Three-tuple format: (("Fe", "Fe", "O"), (2, 3, -2), (1, 2, 4))

The test suite validates that both formats are handled correctly and produce valid chemical compositions.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for examples where these triplets are constructed
rg "smact_filter" -A 5 | rg -v "test"

# Look for any existing error handling around these operations
ast-grep --pattern 'try {
  $$$
  zip($$$)
  $$$
} catch ($$$) {
  $$$
}'

Length of output: 13166


Script:

#!/bin/bash
# Look for test cases that verify handling of mismatched triplets
rg "test.*comp_maker|test.*formula_maker" -A 10

# Check for any validation of input lengths before zipping
ast-grep --pattern 'if len($x) != len($y)'

# Look for any error handling around composition creation
rg "raise.*Error.*comp" -A 2

Length of output: 1816


Script:

#!/bin/bash
# Look at the test setup and mock data
rg "mock_filter_output.*=" -B 5 -A 5

# Check the actual implementation of smact_filter to see how it constructs these triplets
ast-grep --pattern 'def smact_filter($$$) {
  $$$
}'

# Look for any validation in the composition creation
rg "def comp_maker" -A 20

Length of output: 7483

smact/tests/test_doper.py (1)

39-40: LGTM! Good use of strict=False parameter

The addition of strict=False to the zip operations makes the test more resilient by explicitly handling cases where the substitution lists have different lengths. This is a sensible approach since the test only needs to verify charge relationships for available pairs.

Also applies to: 43-44

smact/utils/crystal_space/generate_composition_with_smact.py (1)

123-126: Verify test coverage for mismatched lengths

The addition of strict=False suggests that mismatched lengths between res[0] and res[2] are possible. Please ensure there are test cases covering these scenarios.

smact/structure_prediction/mutation.py (1)

10-10: LGTM! Import changes look good

The addition of Callable type and the empty line after imports improve code organisation and type safety whilst following PEP 8 guidelines.

Also applies to: 19-19

smact/dopant_prediction/doper.py (1)

95-95: Consider removing strict=False from zip operation

The strict=False parameter allows the zip operation to silently truncate when input lists have different lengths. However, in this context, all input lists (keys, dopants_list, groupby_list) should always contain exactly 4 items as they represent different types of dopant substitutions. Silently truncating could mask potential bugs.

Let's verify the list lengths at the call site:

smact/screening.py (1)

250-250: Same optimisation needed as in eneg_states_test

This line has the same inefficiencies as discussed in the eneg_states_test function.

smact/tests/test_structure.py (4)

370-370: LGTM: Lambda interface test cases are well-structured

The addition of strict=False to the zip call in test_lambda_interface is appropriate as it allows for flexible testing of different permutations.


303-303: Verify test coverage for edge cases

The addition of strict=False to the zip call in test_db_interface suggests that test cases and expected results might have different lengths. This could indicate missing test coverage for certain scenarios.

✅ Verification successful

No issues with test coverage - strict=False is used correctly

The strict=False parameter in the zip call is used appropriately here. Looking at the test file, we can see that test_with_species_args and test_with_species_exp are explicitly defined lists with matching lengths, containing 8 test cases each. The test cases comprehensively cover various scenarios including:

  • Single species queries
  • Multiple species queries
  • Different ordering of species
  • Invalid species combinations
  • Complete composition matches

The strict=False parameter is used defensively as a good programming practice, rather than indicating any mismatch in test coverage.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check the test coverage for get_with_species method
ast-grep --pattern 'def get_with_species($_, $_) {
  $$$
}'

Length of output: 61


Script:

#!/bin/bash
# Let's first check the test file content to understand the test cases
cat smact/tests/test_structure.py

Length of output: 21393


312-313: Consider validating MP data integrity

The addition of strict=False to the zip call handling Materials Project (MP) data suggests potential mismatches between MP IDs and structures. This could indicate data integrity issues.

Consider adding validation to ensure MP data integrity:

def test_mp_data_integrity(self):
    """Test MP data integrity."""
    for data in mp_data:
        self.assertIn("material_id", data)
        self.assertIn("structure", data)

88-89: Consider adding test cases for unequal length scenarios

The addition of strict=False to the zip calls in assertStructAlmostEqual makes the comparison more lenient. Whilst this change prevents potential ValueError exceptions, it might mask issues where structures have different numbers of sites.

Consider adding explicit test cases that verify the behaviour when structures have different numbers of sites:

def test_assertStructAlmostEqual_unequal_sites(self):
    """Test comparison of structures with different numbers of sites."""
    s1 = SmactStructure(*self._gen_empty_structure([("Fe", 2, 1)]))
    s2 = SmactStructure(*self._gen_empty_structure([("Fe", 2, 2)]))
    with self.assertRaises(AssertionError):
        self.assertStructAlmostEqual(s1, s2)
✅ Verification successful

The addition of strict=False is appropriate and safe

The assertStructAlmostEqual method primarily compares the species and lattice_param attributes for exact equality, whilst the zip operations on sites are used to compare coordinates. The SmactStructure class maintains sites as a dictionary of species and their coordinates, ensuring that mismatched sites would already fail the species equality check before reaching the coordinate comparison. Therefore, the strict=False parameter is a safe addition that prevents potential ValueError exceptions without masking structural differences.

  • The species comparison in assertStructAlmostEqual ensures structural integrity: self.assertEqual(getattr(s1, cond), getattr(s2, cond))
  • The SmactStructure class maintains a well-defined structure through its dictionary-based site representation
  • Test cases using _gen_empty_structure demonstrate consistent species handling
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there are any test cases that exercise unequal length scenarios
rg -A 5 "assertStructAlmostEqual.*\(" smact/tests/test_structure.py

Length of output: 900


Script:

#!/bin/bash
# Let's check the implementation of assertStructAlmostEqual to understand how it handles site comparisons
ast-grep --pattern 'def assertStructAlmostEqual($$$)'

Length of output: 55


Script:

#!/bin/bash
# Let's also check if there are any test cases that explicitly test structures with different site counts
rg -B 2 -A 5 "test.*different.*sites|test.*unequal.*sites" smact/tests/test_structure.py

Length of output: 90


Script:

#!/bin/bash
# Let's examine the test data setup to understand the structure generation
rg "_gen_empty_structure" smact/tests/test_structure.py

Length of output: 770


Script:

#!/bin/bash
# Let's check the implementation of assertStructAlmostEqual with a different pattern
rg -A 10 "def assertStructAlmostEqual" smact/tests/test_structure.py

# Also check how SmactStructure handles sites
rg -A 5 "class SmactStructure" smact/

Length of output: 1043


Script:

#!/bin/bash
# Let's check the full implementation of SmactStructure class to understand site handling
rg -A 20 "class SmactStructure:" smact/structure_prediction/structure.py

# Also check if there are any existing validation for sites in the class
rg "def.*sites" smact/structure_prediction/structure.py

Length of output: 1237

smact/structure_prediction/structure.py (1)

218-218: Verify handling of mismatched species and stoichiometry counts

Adding strict=False to the zip operation could mask data inconsistencies if the number of species doesn't match their stoichiometry counts. This might lead to incomplete or incorrect structure representations.

Let's verify the potential impact:

✅ Verification successful

Let me gather more context about the implementation to understand how the species and stoichiometry data is handled.


Let me try one more approach to understand the implementation and usage of this method.


Let me try one final approach to understand the full context of the zip operation.


Adding strict=False to zip operation is safe in this context

The implementation shows that total_specs is directly derived from sites.values() using list comprehension, ensuring both iterables in the zip operation will always have matching lengths:

  1. total_specs = [len(val) for val in sites.values()] creates a list with the same length as sites.keys()
  2. The subsequent operations (GCD calculation and division) maintain this length
  3. The strict=False parameter is defensive programming that doesn't impact correctness
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential mismatches between species and stoichiometry counts
# in existing structure files and test files.

# Search for test files containing structure data
echo "Searching for test files with structure data..."
fd -e txt -e cif . tests/ -x cat {} | rg -A 5 "species.*stoichiometry"

# Search for potential structure parsing calls
echo "Searching for structure parsing calls..."
rg -A 3 "__parse_py_sites" 

Length of output: 1273


Script:

#!/bin/bash
# Check the implementation of __parse_py_sites method
ast-grep --pattern 'def __parse_py_sites($$$)'

# Look for test cases involving this method
rg -B 2 -A 10 "test.*parse.*sites" tests/

Length of output: 139


Script:

#!/bin/bash
# Get the full implementation of __parse_py_sites method
rg -A 30 "def __parse_py_sites" smact/structure_prediction/structure.py

# Look for any usage patterns of zip with sites and species
rg -B 2 -A 2 "zip\(.*sites.*species" smact/structure_prediction/structure.py

Length of output: 1423


Script:

#!/bin/bash
# Get more context around the specific line with the zip operation
rg -B 10 -A 10 "zip\(sites\.keys\(\), total_specs" smact/structure_prediction/structure.py

# Look for the total_specs variable definition
rg -B 5 "total_specs.*=" smact/structure_prediction/structure.py

Length of output: 1588

smact/data_loader.py (1)

444-444: Consider adding logging for data mismatches.

While adding strict=False makes the code more robust, silently ignoring mismatches between keys and values could hide data quality issues. Consider adding debug-level logging when such mismatches occur.

-            _element_data.update({items[0]: dict(list(zip(keys, clean_items, strict=False)))})
+            if len(keys) != len(clean_items):
+                logging.debug(f"Mismatch in data columns for element {items[0]}: expected {len(keys)} columns, got {len(clean_items)}")
+            _element_data.update({items[0]: dict(list(zip(keys, clean_items, strict=False)))})

Let's verify the potential for data mismatches:

@@ -80,6 +80,6 @@ def wurtzite(species, cell_par=None, repetitions=None):
sites_list = []
oxidation_states = [[1], [2], [3], [4], [-1], [-2], [-3], [-4]]

for site in zip(system.get_scaled_positions(), oxidation_states):
for site in zip(system.get_scaled_positions(), oxidation_states, strict=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Refactor oxidation state handling for wurtzite structures

The current implementation allows mismatched crystal sites and oxidation states through strict=False, whilst also providing more oxidation states than typically needed for wurtzite structures. This could lead to chemically invalid models.

Consider refactoring to:

  1. Accept specific oxidation states as parameters
  2. Validate the number of sites matches the structure
-def wurtzite(species, cell_par=None, repetitions=None):
+def wurtzite(species, oxidation_states=None, cell_par=None, repetitions=None):
     """
     Build a wurzite cell using the crystal function in ASE.
 
     Args:
     ----
         species (str): Element symbols
+        oxidation_states (list): List of oxidation states for the structure
         cell_par (list): Six floats/ints specifying 3 unit cell lengths and 3 unit cell angles.
         repetitions (list): Three floats specifying the expansion of the cell in x,y,z directions.
     """
     # ... existing crystal setup code ...
 
     sites_list = []
-    oxidation_states = [[1], [2], [3], [4], [-1], [-2], [-3], [-4]]
+    if oxidation_states is None:
+        # Default to common wurtzite oxidation states (e.g., for ZnO)
+        oxidation_states = [[2], [-2]]
+
+    scaled_positions = system.get_scaled_positions()
+    if len(scaled_positions) != len(oxidation_states):
+        raise ValueError(
+            f"Number of crystal sites ({len(scaled_positions)}) "
+            f"doesn't match oxidation states ({len(oxidation_states)})"
+        )
 
-    for site in zip(system.get_scaled_positions(), oxidation_states, strict=False):
+    for site in zip(scaled_positions, oxidation_states):
         sites_list.append(Site(site[0], site[1]))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for site in zip(system.get_scaled_positions(), oxidation_states, strict=False):
def wurtzite(species, oxidation_states=None, cell_par=None, repetitions=None):
"""
Build a wurzite cell using the crystal function in ASE.
Args:
----
species (str): Element symbols
oxidation_states (list): List of oxidation states for the structure
cell_par (list): Six floats/ints specifying 3 unit cell lengths and 3 unit cell angles.
repetitions (list): Three floats specifying the expansion of the cell in x,y,z directions.
"""
# ... existing crystal setup code ...
sites_list = []
if oxidation_states is None:
# Default to common wurtzite oxidation states (e.g., for ZnO)
oxidation_states = [[2], [-2]]
scaled_positions = system.get_scaled_positions()
if len(scaled_positions) != len(oxidation_states):
raise ValueError(
f"Number of crystal sites ({len(scaled_positions)}) "
f"doesn't match oxidation states ({len(oxidation_states)})"
)
for site in zip(scaled_positions, oxidation_states):
sites_list.append(Site(site[0], site[1]))

@@ -321,10 +321,10 @@ def ml_rep_generator(

ML_rep = [0 for i in range(1, 103)]
if isinstance(composition[0], Element):
for element, stoich in zip(composition, stoichs):
for element, stoich in zip(composition, stoichs, strict=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Remove unnecessary strict=False parameter

The strict=False parameter is unnecessary here because:

  1. When stoichs is None, it's initialised with matching length to composition
  2. When stoichs is provided, it should match the length of composition for valid chemical representation
-        for element, stoich in zip(composition, stoichs, strict=False):
+        for element, stoich in zip(composition, stoichs):
-        for element, stoich in zip(composition, stoichs, strict=False):
+        for element, stoich in zip(composition, stoichs):

Also applies to: 327-327

@@ -281,7 +281,7 @@

"""
min_cation_eneg, max_anion_eneg = 10, 0
for ox_state, eneg in zip(ox_states, enegs):
for ox_state, eneg in zip(ox_states, enegs, strict=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Maintain input validation in alternate implementation

Similar to the other electronegativity test functions, the oxidation states and electronegativities must have matching lengths for valid comparison. The strict=False parameter could mask potential input errors.

-    for ox_state, eneg in zip(ox_states, enegs, strict=False):
+    for ox_state, eneg in zip(ox_states, enegs):
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for ox_state, eneg in zip(ox_states, enegs, strict=False):
for ox_state, eneg in zip(ox_states, enegs):

@@ -205,7 +205,7 @@
cations, otherwise False

"""
for (ox1, eneg1), (ox2, eneg2) in combinations(list(zip(ox_states, enegs)), 2):
for (ox1, eneg1), (ox2, eneg2) in combinations(list(zip(ox_states, enegs, strict=False)), 2):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Optimise memory usage and maintain input validation

The current implementation has two potential improvements:

  1. Remove the unnecessary list conversion as combinations can work directly with the zip iterator
  2. Remove strict=False as the oxidation states and electronegativities must have matching lengths for valid comparison
-    for (ox1, eneg1), (ox2, eneg2) in combinations(list(zip(ox_states, enegs, strict=False)), 2):
+    for (ox1, eneg1), (ox2, eneg2) in combinations(zip(ox_states, enegs), 2):
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for (ox1, eneg1), (ox2, eneg2) in combinations(list(zip(ox_states, enegs, strict=False)), 2):
for (ox1, eneg1), (ox2, eneg2) in combinations(zip(ox_states, enegs), 2):

@@ -103,7 +103,7 @@
return len(symbols) == len(set(symbols))
else:
anions, cations = [], []
for state, symbol in zip(oxidation_states, symbols):
for state, symbol in zip(oxidation_states, symbols, strict=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

⚠️ Potential issue

Strict zipping should be maintained for input validation

The _no_repeats function is a critical validation function used within the eneg_states_test_alternate method to ensure there are no unwanted repetitions of ions. Since:

  • There are no test cases that explicitly rely on mismatched lengths
  • The function is an internal validator that checks for ion repetitions
  • The input lists (oxidation_states and symbols) should always have matching lengths by design

Using strict=False could mask potential bugs where the input lists have mismatched lengths, which would indicate a problem in the calling code. It's safer to maintain strict validation to catch such issues early.

🔗 Analysis chain

Consider maintaining strict zipping for input validation

Adding strict=False to the zip function could mask potential issues where oxidation_states and symbols lists should have matching lengths. Since this is an internal validation function, it would be safer to maintain strict checking to catch any mismatched inputs early.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there are any test cases that rely on mismatched lengths
rg -A 5 "_no_repeats" "tests/"

Length of output: 75


Script:

#!/bin/bash
# Check for test files in the repository
fd "test.*\.py$"

# Check the implementation of _no_repeats function
ast-grep --pattern $'def _no_repeats($$$) {
  $$$
}'

Length of output: 179


Script:

#!/bin/bash
# Search for _no_repeats usage in test files
rg -A 5 "_no_repeats" "smact/tests/test_core.py" "smact/tests/test_doper.py" "smact/tests/test_structure.py" "smact/tests/test_utils.py"

# Check the implementation context of _no_repeats
rg -B 5 -A 10 "_no_repeats" "smact/screening.py"

Length of output: 1047

@@ -418,7 +418,7 @@ def from_poscar(poscar: str):
total_specs = [int(x / hcf) for x in total_specs]

species = []
for spec_str, stoic in zip(lines[0].split(" "), total_specs):
for spec_str, stoic in zip(lines[0].split(" "), total_specs, strict=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add input validation for POSCAR format

The addition of strict=False to the zip operation could allow processing of malformed POSCAR files. Consider adding explicit validation of the input format to ensure data consistency.

Apply this diff to add validation:

@@ -421,6 +421,10 @@
+        # Validate POSCAR format
+        if len(lines[0].split(" ")) != len(total_specs):
+            raise ValueError(
+                f"Mismatch between species count ({len(lines[0].split(' '))}) and stoichiometry count ({len(total_specs)})"
+            )
         for spec_str, stoic in zip(lines[0].split(" "), total_specs, strict=False):
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for spec_str, stoic in zip(lines[0].split(" "), total_specs, strict=False):
# Validate POSCAR format
if len(lines[0].split(" ")) != len(total_specs):
raise ValueError(
f"Mismatch between species count ({len(lines[0].split(' '))}) and stoichiometry count ({len(total_specs)})"
)
for spec_str, stoic in zip(lines[0].split(" "), total_specs, strict=False):

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
smact/tests/test_utils.py (1)

Line range hint 115-132: Enhance test robustness and maintainability

Whilst the test implementation follows good practices with cleanup, there are several improvements that could make it more robust:

  1. Use setUp/tearDown to ensure cleanup even if the test fails
  2. Add error handling for API failures
  3. Use class constants for paths and configuration
  4. Add more specific assertions about the downloaded data structure

Consider this improved implementation:

+ def setUp(self):
+     self.test_data_dir = "data/binary/mp_data"
+     self.cleanup_dirs = []

+ def tearDown(self):
+     for directory in self.cleanup_dirs:
+         if os.path.exists(directory):
+             shutil.rmtree(directory)

  @unittest.skipUnless(
      (os.environ.get("MP_API_KEY") or SETTINGS.get("PMG_MAPI_KEY")), "requires MP_API key to be set."
  )
  def test_download_compounds_with_mp_api(self):
-     save_mp_dir = "data/binary/mp_data"
+     self.cleanup_dirs.append(self.test_data_dir)
      
      try:
          download_compounds_with_mp_api.download_mp_data(
              mp_api_key=os.environ.get("MP_API_KEY"),
              num_elements=2,
              max_stoich=1,
-             save_dir=save_mp_dir,
+             save_dir=self.test_data_dir,
          )
      except Exception as e:
          self.fail(f"API download failed: {str(e)}")

      # Check if the data was downloaded
-     self.assertTrue(os.path.exists(save_mp_dir))
-     self.assertTrue(len(os.listdir(save_mp_dir)) > 0)
+     self.assertTrue(os.path.exists(self.test_data_dir))
+     downloaded_files = os.listdir(self.test_data_dir)
+     self.assertTrue(len(downloaded_files) > 0)
+     
+     # Verify file structure and content
+     for filename in downloaded_files:
+         filepath = os.path.join(self.test_data_dir, filename)
+         self.assertTrue(os.path.getsize(filepath) > 0)
-
-     # Clean up
-     shutil.rmtree(save_mp_dir)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 77338da and edc6a50.

📒 Files selected for processing (1)
  • smact/tests/test_utils.py (1 hunks)

Comment on lines +115 to +117
@unittest.skipUnless(
(os.environ.get("MP_API_KEY") or SETTINGS.get("PMG_MAPI_KEY")), "requires MP_API key to be set."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

CI configuration needs attention for Materials Project API tests

The CI workflow is missing the required MP_API_KEY secret configuration, though the crystal_space dependencies are properly installed via pip install -e ".[crystal_space,dev]". To ensure proper test execution:

  • Add the MP_API_KEY secret to the repository's GitHub Actions secrets
  • Update the CI workflow to include the MP_API_KEY environment variable in the test step:
- name: Run tests and collect coverage
  env:
    MP_API_KEY: ${{ secrets.MP_API_KEY }}
  run: python -m pytest --cov=smact --cov-report=xml -v
🔗 Analysis chain

Verify CI environment configuration

The test requires the Materials Project API key. Ensure that:

  1. The CI workflow has the MP_API_KEY secret configured
  2. The crystal_space optional dependencies are properly installed in CI
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if MP_API_KEY is configured in GitHub secrets
gh api repos/:owner/:repo/actions/secrets | jq '.secrets[] | select(.name=="MP_API_KEY")'

# Verify crystal_space dependencies in pyproject.toml
rg -A 5 '\[crystal_space\]' pyproject.toml

Length of output: 248


Script:

#!/bin/bash
# Check CI workflow files for API key configuration
fd -e yml -e yaml . .github/workflows/ --exec cat {}

# Check test dependencies in pyproject.toml
cat pyproject.toml

# Check if there are any environment setup scripts
fd -e sh -e bash . .github/workflows/ --exec cat {}

Length of output: 16636

Copy link

codecov bot commented Nov 21, 2024

Codecov Report

Attention: Patch coverage is 88.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 73.74%. Comparing base (bef9c5f) to head (edc6a50).
Report is 9 commits behind head on master.

Files with missing lines Patch % Lines
smact/builder.py 50.00% 1 Missing ⚠️
smact/structure_prediction/mutation.py 50.00% 1 Missing ⚠️
...s/crystal_space/generate_composition_with_smact.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #331      +/-   ##
==========================================
+ Coverage   73.56%   73.74%   +0.17%     
==========================================
  Files          31       31              
  Lines        2531     2544      +13     
==========================================
+ Hits         1862     1876      +14     
+ Misses        669      668       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@AntObi AntObi merged commit 1bac2b1 into master Nov 21, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant