-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Crystal Space and Oxidation Utilities to the Documentation #337
Conversation
WalkthroughThe pull request introduces several changes across multiple documentation files and Python modules. Key modifications include updates to the Sphinx configuration in Changes
Possibly related PRs
Suggested labels
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🔇 Additional comments (2).readthedocs.yaml (2)
The addition of ✅ Verification successfulBoth requirements files serve distinct purposes and should be kept separate The analysis of both files shows they serve different purposes with minimal overlap:
The only shared dependencies are basic utilities (attrs, typing-extensions, etc.) which are common across Python projects and don't pose version conflicts. The addition of
🏁 Scripts executedThe following scripts were executed for the analysis: Script: #!/bin/bash
# Description: Check for duplicate dependencies and potential conflicts
# Compare requirements files for duplicates
echo "Checking for duplicate packages:"
comm -12 <(sort requirements.txt) <(sort docs/requirements.txt)
# Display both files for manual review
echo -e "\nContents of requirements.txt:"
cat requirements.txt
echo -e "\nContents of docs/requirements.txt:"
cat docs/requirements.txt
Length of output: 11487
The inclusion of additional requirements might impact build time and memory usage. Consider monitoring the ReadTheDocs build logs after this change. Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (10)
requirements.txt (1)
Line range hint
1-375
: Consider documenting direct dependencies separatelyThe requirements.txt file contains both direct and transitive dependencies. Consider maintaining a separate requirements-direct.txt file listing only the direct dependencies of the project for better maintainability.
Create a new file
requirements-direct.txt
with only the direct dependencies (those marked with "via SMACT (setup.py)"), then generate requirements.txt from it:+ # Create requirements-direct.txt with: + ase==3.23.0 + dash==2.18.2 + elementembeddings==0.6.1 + # ... other direct dependencies + + # Then generate requirements.txt using: + pip-compile --extra=strict requirements-direct.txt -o requirements.txtdocs/examples/doper.ipynb (3)
14-15
: Consider using proper markdown heading instead of bold textFor better document structure and navigation, consider using a proper markdown heading (e.g.,
## Output Format
) instead of bold text. This would maintain consistent heading hierarchy and improve the notebook's table of contents generation.-**Output Format** +## Output Format
Line range hint
1-24
: Consider adding error handling examplesThe documentation would benefit from examples demonstrating:
- How to handle invalid ionic species
- Error cases when using incompatible oxidation states
- Edge cases with unusual dopant combinations
This would help users understand the module's behaviour in non-ideal scenarios.
Line range hint
8-9
: Enhance the alternative metrics documentationThe skipspecies metric explanation could be improved by:
- Adding a brief comparison of accuracy between traditional and skipspecies approaches
- Including performance considerations
- Providing guidance on when to use each metric
This would help users make informed decisions about which metric to use for their specific use case.
smact/utils/crystal_space/download_compounds_with_mp_api.py (2)
Line range hint
16-24
: Add type hints to improve code clarity.Consider adding type hints to the function signature for better code maintainability and IDE support.
def download_mp_data( - mp_api_key: str | None = None, - num_elements: int = 2, - max_stoich: int = 8, - save_dir: str = "data/binary/mp_api", - request_interval: float = 0.1, + mp_api_key: str | None = None, + num_elements: int = 2, + max_stoich: int = 8, + save_dir: str = "data/binary/mp_api", + request_interval: float = 0.1, +) -> None:
Line range hint
89-89
: Consider implementing exponential backoff for API requests.The current implementation uses a fixed interval between requests. Consider implementing exponential backoff to better handle potential API rate limits.
- time.sleep(request_interval) + # Implement exponential backoff + retry_count = 0 + while retry_count < 3: + try: + time.sleep(request_interval * (2 ** retry_count)) + break + except Exception as e: + retry_count += 1 + if retry_count == 3: + raise esmact/utils/crystal_space/generate_composition_with_smact.py (2)
Line range hint
77-86
: Consider using context manager for multiprocessing pools.Using a context manager with multiprocessing pools ensures proper cleanup of resources.
- pool = multiprocessing.Pool(processes=multiprocessing.cpu_count() if num_processes is None else num_processes) - compounds = list( - tqdm( - pool.imap_unordered( - partial( - convert_formula, - num_elements=num_elements, - max_stoich=max_stoich, - ), - combinations, - ), - total=len(combinations), - ) - ) - pool.close() - pool.join() + with multiprocessing.Pool(processes=multiprocessing.cpu_count() if num_processes is None else num_processes) as pool: + compounds = list( + tqdm( + pool.imap_unordered( + partial( + convert_formula, + num_elements=num_elements, + max_stoich=max_stoich, + ), + combinations, + ), + total=len(combinations), + ) + )
Line range hint
115-115
: Remove redundantstrict=False
parameter in zip().The
strict
parameter is redundant in Python 3.10+ as it's the default behaviour.- symbols_stoich = zip(res[0], res[2], strict=False) + symbols_stoich = zip(res[0], res[2])docs/tutorials/crystal_space_visualisation.ipynb (2)
Line range hint
32-38
: Consider optimising the embedding computation for large datasets.The current implementation processes embeddings one at a time, which could be inefficient for large datasets. Consider vectorising the computation or processing in batches.
- embeddings = [] - for f in tqdm(formula): - try: - compositional_embedding = CompositionalEmbedding(f, embedding=embedding) - embeddings.append(compositional_embedding.feature_vector(stats=stats)) - except Exception as e: - embeddings.append(np.full(embedding_dim, np.nan)) + def process_batch(formulas, batch_size=100): + for i in range(0, len(formulas), batch_size): + batch = formulas[i:i + batch_size] + yield [ + CompositionalEmbedding(f, embedding=embedding).feature_vector(stats=stats) + if f else np.full(embedding_dim, np.nan) + for f in batch + ] + + embeddings = np.vstack(list(process_batch(formula)))Also applies to: 39-40, 41-42
Line range hint
148-149
: Consider adding error handling for specific exceptions.The dimension reduction implementation could benefit from more specific error handling to provide better feedback to users.
try: reducer = PCA(n_components=n_components, **kwargs) elif reducer == "tsne": reducer = TSNE(n_components=n_components, **kwargs) elif reducer == "umap": reducer = UMAP(n_components=n_components, **kwargs) else: raise ValueError("reducer must be one of ['pca', 'tsne', 'umap']") + try: + reduced_embeddings = reducer.fit_transform(embeddings.values) + except ValueError as e: + raise ValueError(f"Failed to reduce dimensions: {str(e)}") + except MemoryError: + raise MemoryError("Insufficient memory for dimension reduction. Consider reducing the dataset size.")Also applies to: 150-151, 152-153, 154-155, 156-157, 158-159
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (16)
docs/conf.py
(3 hunks)docs/examples.rst
(0 hunks)docs/examples/doper.ipynb
(1 hunks)docs/smact.utils.crystal_space.download_compounds_with_mp_api.rst
(1 hunks)docs/smact.utils.crystal_space.generate_composition_with_smact.rst
(1 hunks)docs/smact.utils.crystal_space.plot_embedding.rst
(1 hunks)docs/smact.utils.crystal_space.rst
(1 hunks)docs/smact.utils.oxidation.rst
(1 hunks)docs/smact.utils.rst
(1 hunks)docs/tutorials/crystal_space.ipynb
(2 hunks)docs/tutorials/crystal_space_visualisation.ipynb
(1 hunks)docs/tutorials/oxidation_states.ipynb
(1 hunks)requirements.txt
(1 hunks)smact/utils/crystal_space/download_compounds_with_mp_api.py
(1 hunks)smact/utils/crystal_space/generate_composition_with_smact.py
(3 hunks)smact/utils/oxidation.py
(1 hunks)
💤 Files with no reviewable changes (1)
- docs/examples.rst
✅ Files skipped from review due to trivial changes (7)
- docs/smact.utils.oxidation.rst
- docs/smact.utils.crystal_space.rst
- docs/smact.utils.crystal_space.download_compounds_with_mp_api.rst
- docs/smact.utils.crystal_space.plot_embedding.rst
- docs/smact.utils.crystal_space.generate_composition_with_smact.rst
- docs/smact.utils.rst
- docs/tutorials/oxidation_states.ipynb
🔇 Additional comments (10)
requirements.txt (1)
224-231
: Verify security advisories for pymatgen
The pymatgen package is a critical dependency. Given its recent version (2024.11.13), we should verify there are no known security issues.
✅ Verification successful
pymatgen 2024.11.13 is secure and up-to-date
The version 2024.11.13 of pymatgen being used is secure as it's well above the patched versions for all known vulnerabilities:
- Critical vulnerability (arbitrary code execution) was patched in 2024.2.20
- Moderate vulnerability (ReDoS) was patched in versions after 2022.9.21
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for security advisories for pymatgen
# Check PyPI for latest versions and potential security issues
curl -s https://pypi.org/pypi/pymatgen/json | jq '.info.version'
# Check for security advisories
gh api graphql -f query='
{
securityVulnerabilities(first: 5, ecosystem: PIP, package: "pymatgen") {
nodes {
advisory {
summary
severity
publishedAt
}
vulnerableVersionRange
firstPatchedVersion {
identifier
}
}
}
}'
Length of output: 949
docs/examples/doper.ipynb (1)
Line range hint 1-1
: Verify integration with Crystal Space functionality
Given that this PR aims to document Crystal Space functions, consider adding examples or references showing how the Doper module integrates with Crystal Space functionality, if applicable.
smact/utils/crystal_space/download_compounds_with_mp_api.py (1)
13-13
: LGTM! Import statement has been simplified.
The import statement has been updated to use a more concise path whilst maintaining the same functionality.
smact/utils/crystal_space/generate_composition_with_smact.py (2)
12-12
: LGTM! Import statement has been simplified.
The import statement has been updated to use a more concise path whilst maintaining the same functionality.
21-21
: LGTM! Return type hints added.
The addition of return type hints improves code clarity and IDE support.
Also applies to: 47-47
docs/tutorials/crystal_space.ipynb (1)
47-47
: LGTM! Documentation hyperlinks updated.
The hyperlinks to source files have been updated to reflect the new file structure, improving navigation and documentation clarity.
Also applies to: 130-130
docs/tutorials/crystal_space_visualisation.ipynb (1)
14-14
: LGTM: Prerequisite reference updated correctly.
The prerequisite reference has been updated from crystal_space_with_smact.ipynb
to crystal_space.ipynb
, which aligns with the PR objectives.
docs/conf.py (3)
48-48
: LGTM: Configuration updated to use modern Sphinx notebook execution setting.
The change from jupyter_execute_notebooks
to nb_execution_mode
aligns with current Sphinx-notebook practices.
421-423
: LGTM: Mock modules added for documentation build.
The addition of mp_api
, mp_api.client
, and emmet
to MOCK_MODULES
is appropriate for handling external dependencies during documentation builds.
157-157
: Verify the impact of commenting out static path.
While commenting out html_static_path
is valid, ensure that no static assets (CSS, images, etc.) are affected.
✅ Verification successful
Commenting out html_static_path
has no impact on the codebase
The verification shows that:
- No
_static
directory exists in the repository - The only references to static assets in documentation are images in
dev_docs/design_docs/StructurePredictor.md
, which use a different path (../images/
) - The only occurrence of
static_path
is the commented line itself inconf.py
Therefore, commenting out the unused html_static_path
configuration is safe and won't affect any static assets.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any static assets that might be affected
# Test: Look for references to static assets in the documentation
rg -g "*.rst" -g "*.md" -g "*.ipynb" "_static"
Length of output: 46
Script:
#!/bin/bash
# Check if _static directory exists and list its contents
fd -t d "_static"
fd -t f -p "_static"
# Check for any static file references in HTML templates or Python files
rg -g "*.html" -g "*.py" "_static"
# Check for any other common static asset patterns in docs
rg -g "*.rst" -g "*.md" -g "*.ipynb" -e "\.\./\S*\.(png|jpg|jpeg|gif|css|js)" -e "\.\./static" -e "\./_static"
# Check sphinx configuration for other static path configurations
rg -g "*.py" "static_path"
Length of output: 474
@@ -112,15 +111,15 @@ def get_species_occurrences_df( | |||
|
|||
def write( | |||
self, | |||
filename: str | os.PathLike, | |||
filename: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider maintaining support for PathLike objects
The change to restrict the filename
parameter to only strings reduces API flexibility. Python's path handling typically accepts both string and PathLike objects for better interoperability.
Apply this diff to restore PathLike support:
- filename: str,
+ filename: str | os.PathLike,
And update the docstring:
- filename (str): The filename to write the filtered oxidation states list to.
+ filename (str | os.PathLike): The filename to write the filtered oxidation states list to.
Also applies to: 122-122
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #337 +/- ##
==========================================
- Coverage 77.45% 77.44% -0.01%
==========================================
Files 31 31
Lines 2590 2589 -1
==========================================
- Hits 2006 2005 -1
Misses 584 584 ☔ View full report in Codecov by Sentry. |
Documentation updates
Description
This PR addresses the following:
Type of change
How Has This Been Tested?
Test Configuration:
Reviewers
N/A
Checklist
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation
Chores
requirements.txt
file, adding new dependencies and updating existing ones to ensure compatibility and performance..readthedocs.yaml
file to include dependencies fromrequirements.txt
.