Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated oxidation.py and its tests #366

Open
wants to merge 9 commits into
base: develop
Choose a base branch
from

Conversation

KingaMas
Copy link
Contributor

@KingaMas KingaMas commented Jan 14, 2025

Pull Request Template

Description

Updated get_species_occurences_df by adding a consensus_threshold. Helps to avoid non-physical species. Default is 3.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • [ x ] New feature (non-breaking change which adds functionality)

Reviewers

@AntObi

Checklist

  • [ x] My code follows the style guidelines of this project
  • [ x] I have performed a self-review of my own code
  • [ x] I have commented my code, particularly in hard-to-understand areas
  • [ x] I have made corresponding changes to the documentation
  • [ x] My changes generate no new warnings
  • [ x] I have added tests that prove my fix is effective or that my feature works
  • [ x] New and existing unit tests pass locally with my changes
  • [ x] Any dependent changes have been merged and published in downstream modules
  • [x ] I have checked my code and corrected any misspellings

Summary by CodeRabbit

Release Notes

  • Chores

    • Updated pre-commit hook configuration to run codespell during pre-commit phase.
    • Improved code formatting in composition generation function.
  • New Features

    • Enhanced species occurrence filtering with a new consensus threshold parameter.
    • Added element column to species occurrences DataFrame.
    • Introduced a new commonality parameter for filtering species based on occurrence proportions.
    • Added two new text files containing comprehensive lists of oxidation states for various chemical elements.

Copy link
Contributor

coderabbitai bot commented Jan 14, 2025

Walkthrough

This pull request encompasses changes across several files, primarily focusing on the configuration of pre-commit hooks and modifications to the ICSD24OxStatesFilter class. The .pre-commit-config.yaml file is updated to run the codespell hook during the pre-commit phase. The smact/utils/oxidation.py file sees significant changes to method signatures and filtering logic, introducing new parameters for species filtering. Additionally, new text files are added to provide oxidation state data, and minor formatting improvements are made in the generate_composition_with_smact.py file.

Changes

File Change Summary
.pre-commit-config.yaml Modified codespell hook stages from [commit, commit-msg] to [pre-commit, commit-msg]
smact/utils/oxidation.py - Updated filter, get_species_list, get_species_occurrences_df, and write methods to replace threshold with consensus and add commonality parameter.
- Adjusted filtering logic and return DataFrame structure.
smact/utils/crystal_space/generate_composition_with_smact.py Reformatted multiprocessing.Pool instantiation and partial function call for improved readability
smact/tests/files/oxidation_states_icsd24_consensus.txt New file added with a list of oxidation states filtered for low commonality.
smact/tests/files/oxidation_states_icsd24_consensus_w_0.txt New file added with a list of oxidation states including zero values.
smact/tests/test_utils.py Updated test methods to reflect changes in method signatures and file paths for oxidation states.

Possibly related PRs

  • Add oxidation states option to SMACT validity function #282: The changes in this PR introduce a new feature allowing users to select specific oxidation states in the smact_validity function, which relates to the changes in the main PR that modify how oxidation states are handled in the codebase.
  • Develop branch updates in preparation for new release #308: This PR includes significant modifications to the smact_validity function, particularly in how oxidation states are selected, which aligns with the changes in the main PR regarding the handling of oxidation states.
  • Structure prediction fixes #336: This PR updates the default oxidation states used in various functions, which is directly relevant to the changes made in the main PR regarding the handling of oxidation states.

Suggested labels

tests, feature, bug

Poem

🐰 Codespell hops through lines of code,
Pre-commit hooks now take a new road.
Oxidation states refined with care,
Formatting dance beyond compare!
A rabbit's leap of coding flair! 🔍

✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
smact/utils/oxidation.py (1)

100-101: Add validation for consensus_threshold parameter.

Consider adding validation to ensure the threshold is non-negative, as negative values wouldn't make sense in this context.

 species_occurrences_df = self.ox_states_df[
+    (consensus_threshold >= 0)
+    and
     (self.ox_states_df.results_count >= consensus_threshold)
 ].reset_index(drop=True)
get_species_occurences_df2.ipynb (1)

17-19: Add missing parameter documentation.

The docstring is missing documentation for the threshold parameter.

     Args:
+        threshold (int): Minimum number of occurrences required for inclusion. Default is 3.
         include_one_oxidation_state (bool): Include oxidation states +1 and -1 in the species or include as + and - signs. Default is False.
         sort_by_occurrences (bool): Sort the species list by occurrences. Default is True.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a92e123 and 78c3049.

📒 Files selected for processing (3)
  • .pre-commit-config.yaml (1 hunks)
  • get_species_occurences_df2.ipynb (1 hunks)
  • smact/utils/oxidation.py (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: SMACT CI
get_species_occurences_df2.ipynb

[error] 2-7: Notebook contains execution count metadata that should be stripped. The nbstripout pre-commit hook failed and modified the file to remove execution counts.

🔇 Additional comments (3)
smact/utils/oxidation.py (2)

85-87: LGTM! Parameter addition looks good.

The new consensus_threshold parameter is well-documented and its default value of 3 seems reasonable for filtering out spurious data.

Also applies to: 94-94


Line range hint 85-101: Add tests for the new consensus_threshold parameter.

Please add unit tests to verify:

  1. Default threshold behaviour
  2. Custom threshold filtering
  3. Edge cases (0, very large values)
  4. Invalid inputs (negative values)

Would you like me to help generate comprehensive test cases for the new parameter?

.pre-commit-config.yaml (1)

55-55: LGTM! Good improvement to the pre-commit workflow.

Moving the codespell check to the pre-commit stage will help catch spelling issues earlier in the commit process.

get_species_occurences_df2.ipynb Outdated Show resolved Hide resolved
get_species_occurences_df2.ipynb Outdated Show resolved Hide resolved
Kinga Oliwia Mastej and others added 2 commits January 14, 2025 19:45
Copy link
Collaborator

@AntObi AntObi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests need to be updated to reflect the changes.
See my additional review comments

smact/utils/oxidation.py Outdated Show resolved Hide resolved
Comment on lines -107 to +109
species_occurrences_df = species_occurrences_df[["species", "results_count"]]
species_occurrences_df = species_occurrences_df[["element", "species", "results_count"]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to accept the change, but can this change also be documented in the Pull Request text. Would also be helpful to know why the addition of the element column is needed.

Additionally, the tests need to be updated to reflect the changes made to the code to ensure that they pass.

def test_oxidation_states_filter_species_occurrences(self):
species_occurrences_df = self.ox_filter.get_species_occurrences_df()
self.assertIsInstance(species_occurrences_df, pd.DataFrame)
self.assertEqual(
species_occurrences_df.columns.tolist(),
["species", "results_count"],
)
self.assertEqual(species_occurrences_df.shape, (490, 2))
self.assertEqual(species_occurrences_df.iloc[0]["species"], "O2-")
self.assertEqual(species_occurrences_df.iloc[0]["results_count"], 116910)

I'd say that consensus_threshold should be relabeled to threshold since there are functions which also take a threshold argument in this Python file. So it's better to have consistency. On the point of consistency, all thresholds should either be >= or >, rather than having a mix of both.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
smact/utils/oxidation.py (1)

Line range hint 130-154: Reduce code duplication by reusing filter method logic.

The filtering logic and species proportion calculation are duplicated from the filter method. Consider reusing the existing implementation.

-if not include_zero:
-    species_occurrences_df = self.ox_states_df[
-        self.ox_states_df["oxidation_state"] != 0
-    ].reset_index(drop=True)
-else:
-    species_occurrences_df = self.ox_states_df
-species_occurrences_df = species_occurrences_df[
-    (species_occurrences_df.results_count >= consensus)
-].reset_index(drop=True)
+# Reuse filter method with commonality="low" to maintain all proportions
+species_occurrences_df = self.ox_states_df[
+    self.ox_states_df["results_count"].isin(
+        self.filter(consensus=consensus, include_zero=include_zero, commonality="low")["results_count"]
+    )
+].reset_index(drop=True)
🧹 Nitpick comments (1)
smact/utils/oxidation.py (1)

196-198: Enhance metadata in output file.

Consider adding more detailed metadata about the filtering parameters used.

 f.write(
-    f"#\n# Oxidation state set\n# Source: ICSD (2024), filtered for {commonality} commonality of reports\n#\n"
+    f"#\n# Oxidation state set\n"
+    f"# Source: ICSD (2024)\n"
+    f"# Filtering parameters:\n"
+    f"#   - Minimum consensus: {consensus} reports\n"
+    f"#   - Commonality threshold: {commonality} "
+    f"({commonality_map[commonality]}% minimum proportion)\n#\n"
 )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ced5460 and 88f4c08.

📒 Files selected for processing (1)
  • smact/utils/oxidation.py (5 hunks)
🔇 Additional comments (3)
smact/utils/oxidation.py (3)

24-26: LGTM! Clean data loading implementation.

The DataFrame loading implementation is correct and follows pandas best practices.


79-83: LGTM! Consistent parameter updates.

The changes to get_species_list maintain consistency with the filter method updates.

Also applies to: 96-96


28-30: Consider renaming 'consensus' to 'threshold' for consistency.

As mentioned in previous reviews, other functions in this file use the term 'threshold'. For consistency, consider renaming the 'consensus' parameter to 'threshold'.

Comment on lines 53 to 58
element_totals = filtered_df.groupby("element")["results_count"].transform(
"sum"
)
filtered_df["species_proportion (%)"] = (
filtered_df["results_count"] / element_totals * 100
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for zero element totals in proportion calculation.

The species proportion calculation could raise a division by zero error if all results are filtered out for an element.

 element_totals = filtered_df.groupby("element")["results_count"].transform(
     "sum"
 )
+if (element_totals == 0).any():
+    raise ValueError(
+        "No valid results remain for some elements after filtering. "
+        "Consider adjusting the consensus threshold."
+    )
 filtered_df["species_proportion (%)"] = (
     filtered_df["results_count"] / element_totals * 100
 )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
element_totals = filtered_df.groupby("element")["results_count"].transform(
"sum"
)
filtered_df["species_proportion (%)"] = (
filtered_df["results_count"] / element_totals * 100
)
element_totals = filtered_df.groupby("element")["results_count"].transform(
"sum"
)
if (element_totals == 0).any():
raise ValueError(
"No valid results remain for some elements after filtering. "
"Consider adjusting the consensus threshold."
)
filtered_df["species_proportion (%)"] = (
filtered_df["results_count"] / element_totals * 100
)

Comment on lines 41 to 42
commonality_map = {"low": 0, "medium": 10, "high": 50}
commonality_threshold = commonality_map.get(commonality)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add input validation for commonality parameter.

The commonality parameter should be validated against allowed values to prevent silent failures with invalid inputs.

 commonality_map = {"low": 0, "medium": 10, "high": 50}
+if commonality not in commonality_map:
+    raise ValueError(
+        f"Invalid commonality value: {commonality}. "
+        f"Must be one of: {', '.join(commonality_map.keys())}"
+    )
 commonality_threshold = commonality_map.get(commonality)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
commonality_map = {"low": 0, "medium": 10, "high": 50}
commonality_threshold = commonality_map.get(commonality)
commonality_map = {"low": 0, "medium": 10, "high": 50}
if commonality not in commonality_map:
raise ValueError(
f"Invalid commonality value: {commonality}. "
f"Must be one of: {', '.join(commonality_map.keys())}"
)
commonality_threshold = commonality_map.get(commonality)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 88f4c08 and 3a3c1d0.

📒 Files selected for processing (1)
  • smact/utils/oxidation.py (5 hunks)
🔇 Additional comments (5)
smact/utils/oxidation.py (5)

58-75: LGTM! Clear documentation and proper parameter propagation

The method signature changes and documentation updates are well-implemented.


Line range hint 135-169: LGTM! Consistent parameter updates and clear documentation

The method changes maintain consistency with other methods and provide clear documentation of the new functionality.


179-179: LGTM! Consistent threshold comparison

The change from > to >= aligns with the consensus threshold usage throughout the class.


37-38: ⚠️ Potential issue

Add input validation for the commonality parameter

The commonality parameter should be validated to prevent silent failures with invalid inputs.

 commonality_map = {"low": 0, "medium": 10, "high": 50}
+if commonality not in commonality_map:
+    raise ValueError(
+        f"Invalid commonality value: {commonality}. "
+        f"Must be one of: {', '.join(commonality_map.keys())}"
+    )
 commonality_threshold = commonality_map.get(commonality)

Likely invalid or redundant comment.


46-48: ⚠️ Potential issue

Add error handling for zero element totals

The species proportion calculation could raise a division by zero error if all results are filtered out for an element.

 element_totals = filtered_df.groupby("element")["results_count"].transform("sum")
+if (element_totals == 0).any():
+    raise ValueError(
+        "No valid results remain for some elements after filtering. "
+        "Consider adjusting the consensus threshold."
+    )
 filtered_df["species_proportion (%)"] = filtered_df["results_count"] / element_totals * 100

Likely invalid or redundant comment.

Comment on lines 92 to +96
self,
consensus: int = 3,
include_one_oxidation_state: bool = False,
sort_by_occurrences: bool = True,
include_zero: bool = False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add commonality parameter for consistency

This method should include the commonality parameter for consistency with other methods in the class.

 def get_species_occurrences_df(
     self,
     consensus: int = 3,
     include_one_oxidation_state: bool = False,
     sort_by_occurrences: bool = True,
-    include_zero: bool = False,
+    include_zero: bool = False,
+    commonality: str = "low",
 ):

Committable suggestion skipped: line range outside the PR's diff.

@@ -142,7 +164,9 @@ def write(
if not filename.endswith(".txt"):
filename += ".txt"
with open(filename, "w") as f:
f.write(f"#\n# Oxidation state set\n# Source: ICSD (2024), filtered for > {threshold} reports\n#\n")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest that users should have the option to either write out lists based on either the threshold/consensus or commonality. This should preserve backwards compatibility, as well as provide flexibility to users to choose how to write the oxidation states lists


Returns:
pd.DataFrame: The filtered oxidation states list as a DataFrame.
"""
commonality_map = {"low": 0, "medium": 10, "high": 50}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest that there's additional argument to the filter function for commonality_map which can take your default arguments.
This provides flexibility to the user if they want to define their own thresholds

@KingaMas KingaMas changed the title Updated get_species_occurences_df by adding a consensus_threshold Updated oxidation.py and its tests Jan 17, 2025
@KingaMas
Copy link
Contributor Author

Thanks @AntObi ! I've introduced your suggestions and updated the tests accordingly

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
smact/utils/oxidation.py (1)

55-61: ⚠️ Potential issue

Add error handling for zero element totals.

The species proportion calculation could raise a division by zero error if all results are filtered out for an element.

         element_totals = filtered_df.groupby("element")["results_count"].transform("sum")
+        if (element_totals == 0).any():
+            raise ValueError(
+                "No valid results remain for some elements after filtering. "
+                "Consider adjusting the consensus threshold."
+            )
         filtered_df["species_proportion (%)"] = filtered_df["results_count"] / element_totals * 100
🧹 Nitpick comments (5)
smact/utils/oxidation.py (2)

26-41: LGTM! Consider enhancing the docstring.

The parameter renaming and addition of commonality filtering improve the API. The docstring is comprehensive but could be enhanced by documenting the return value's columns.

         Returns:
-            pd.DataFrame: The filtered oxidation states list as a DataFrame.
+            pd.DataFrame: The filtered oxidation states list as a DataFrame with columns:
+                - element (str): Chemical element symbol
+                - oxidation_state (str): Space-separated oxidation states

179-181: Enhance the file header with consensus threshold information.

The header should mention both the commonality level and consensus threshold for complete filtering information.

             f.write(
-                f"#\n# Oxidation state set\n# Source: ICSD (2024), filtered for {commonality} commonality of reports\n#\n"
+                f"#\n# Oxidation state set\n# Source: ICSD (2024), filtered for {commonality} commonality of reports (consensus ≥ {consensus})\n#\n"
             )
smact/tests/test_utils.py (3)

181-181: Enhance test coverage for commonality parameter.

The test should verify the behaviour with different commonality values.

         threshold = 10
-        filtered_df = self.ox_filter.filter(consensus=threshold)
+        filtered_df = self.ox_filter.filter(consensus=threshold)
+        filtered_df_medium = self.ox_filter.filter(consensus=threshold, commonality="medium")
+        filtered_df_high = self.ox_filter.filter(consensus=threshold, commonality="high")
 
         self.assertIsInstance(filtered_df, pd.DataFrame)
+        self.assertIsInstance(filtered_df_medium, pd.DataFrame)
+        self.assertIsInstance(filtered_df_high, pd.DataFrame)
+        self.assertGreater(len(filtered_df), len(filtered_df_medium))
+        self.assertGreater(len(filtered_df_medium), len(filtered_df_high))

187-227: Enhance file content verification.

The tests could be more thorough in verifying the structure and content of the written files.

         with open(f"{filename}.txt") as f:
             content = f.read()
         # Check if the comment is included in the file
         self.assertIn(comment, content)
+        # Verify file structure
+        lines = content.split("\n")
+        self.assertTrue(any(line.startswith("# Source: ICSD (2024)") for line in lines))
+        self.assertTrue(all(len(line.split()) <= 2 for line in lines if not line.startswith("#") and line))
+        # Verify oxidation states format
+        for line in lines:
+            if line and not line.startswith("#"):
+                parts = line.split()
+                if len(parts) == 2:
+                    self.assertTrue(all(x.isdigit() or x == "-" for x in parts[1].split()))

229-261: Add tests for custom commonality thresholds.

The tests should verify that custom commonality thresholds work correctly.

         # Test with a specific consensus threshold
         species_list_threshold = self.ox_filter.get_species_list(consensus=5)
         self.assertIsInstance(species_list_threshold, list)
         self.assertGreater(len(species_list_threshold), 0)
+
+        # Test with custom commonality threshold
+        species_list_custom = self.ox_filter.get_species_list(
+            consensus=3,
+            commonality_thresholds={"low": 0, "medium": 15, "high": 60}
+        )
+        self.assertIsInstance(species_list_custom, list)
+        self.assertGreater(len(species_list_custom), 0)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a3c1d0 and 5844fbd.

📒 Files selected for processing (4)
  • smact/tests/files/oxidation_states_icsd24_consensus.txt (1 hunks)
  • smact/tests/files/oxidation_states_icsd24_consensus_w_0.txt (1 hunks)
  • smact/tests/test_utils.py (2 hunks)
  • smact/utils/oxidation.py (5 hunks)
✅ Files skipped from review due to trivial changes (2)
  • smact/tests/files/oxidation_states_icsd24_consensus_w_0.txt
  • smact/tests/files/oxidation_states_icsd24_consensus.txt
🔇 Additional comments (4)
smact/utils/oxidation.py (3)

42-50: Make commonality thresholds configurable.

As suggested in a previous review, the commonality thresholds should be configurable to provide more flexibility to users.

     def filter(
         self,
         consensus: int = 3,
         include_zero: bool = False,
         commonality: str | float = "low",
+        commonality_thresholds: dict[str, float] | None = None,
     ):
-        commonality_map = {"low": 0, "medium": 10, "high": 50}
+        default_map = {"low": 0, "medium": 10, "high": 50}
+        commonality_map = commonality_thresholds or default_map

Line range hint 105-139: Add commonality parameter for consistency.

This method should include the commonality parameter for consistency with other methods in the class.

     def get_species_occurrences_df(
         self,
         consensus: int = 3,
         include_one_oxidation_state: bool = False,
         sort_by_occurrences: bool = True,
         include_zero: bool = False,
+        commonality: str = "low",
     ):

191-191: LGTM! Consistent threshold comparison.

The change to use >= makes the filtering logic consistent across all methods.

smact/tests/test_utils.py (1)

166-167: LGTM! Test paths updated correctly.

The test file paths have been appropriately updated to reflect the consensus-based filtering approach.

Copy link

codecov bot commented Jan 23, 2025

Codecov Report

Attention: Patch coverage is 93.84615% with 4 lines in your changes missing coverage. Please review.

Project coverage is 75.58%. Comparing base (a92e123) to head (5844fbd).

Files with missing lines Patch % Lines
smact/utils/oxidation.py 81.81% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #366      +/-   ##
==========================================
+ Coverage   75.47%   75.58%   +0.11%     
==========================================
  Files          31       31              
  Lines        2642     2679      +37     
==========================================
+ Hits         1994     2025      +31     
- Misses        648      654       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@KingaMas KingaMas requested a review from AntObi January 23, 2025 17:55
@AntObi AntObi changed the base branch from master to develop January 24, 2025 19:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants