Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: update the typeline library #27

Merged
merged 3 commits into from
Nov 13, 2024
Merged

feat: update the typeline library #27

merged 3 commits into from
Nov 13, 2024

Conversation

clintval
Copy link
Owner

@clintval clintval commented Nov 13, 2024

Summary by CodeRabbit

  • New Features

    • Enhanced BedReader and BedWriter classes for improved functionality and clarity in handling BED data.
  • Bug Fixes

    • Improved error handling and type checks in the _decode and _encode methods to ensure better data processing.
  • Chores

    • Updated dependency versions and linting configurations in pyproject.toml to enhance compatibility and code quality.

Copy link

coderabbitai bot commented Nov 13, 2024

Caution

Review failed

The pull request is closed.

Walkthrough

The changes in this pull request involve significant modifications to the BedReader and BedWriter classes in the bedspec module, including updates to their base classes and method signatures. The pyproject.toml file has also been updated to reflect changes in dependency versions and linting configurations. The BedReader class now includes new parameters for better handling of input data, while the BedWriter class has been adjusted to accommodate additional data types. Overall, these updates enhance the functionality and clarity of the classes involved.

Changes

File Change Summary
bedspec/_reader.py - Changed BedReader base class from TsvStructReader to TsvRecordReader.
- Updated __init__ method to replace has_header with header and added comment_prefixes.
- Altered _decode method to specify item type as str and refined its implementation.
- Updated from_path method to reflect new constructor signature.
bedspec/_writer.py - Changed BedWriter base class from TsvStructWriter to TsvRecordWriter.
- Updated class docstring and adjusted _encode method for improved type handling.
pyproject.toml - Updated typeline dependency version from ^0.4 to ^0.6.
- Increased max-complexity for ruff linter from 10 to 13.

Possibly related PRs

  • docs: fixup docs a bit #25: The changes in this PR also modify the BedReader class in bedspec/_reader.py, specifically enhancing the _decode method and adding a new static method _build_union, which are relevant to the modifications made in the main PR.
  • chore: bump to 0.4.0 #26: This PR updates the typeline dependency in pyproject.toml, which is indirectly related as the main PR's changes to BedReader may rely on the functionality provided by this dependency for input/output operations.

🐇 In the meadow where changes bloom,
The BedReader finds more room.
With new prefixes and clearer ways,
It handles data through sunny days.
The BedWriter joins the dance,
Writing colors, giving chance! 🌼

Warning

Rate limit exceeded

@clintval has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 3 minutes and 27 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 2d88634 and fef4bf7.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
bedspec/_writer.py (1)

14-22: LGTM! Consider using typing.Collection for better type handling

The changes improve the code by:

  • Adding support for frozenset
  • Restructuring conditionals for better readability
  • Maintaining proper type handling

Consider using typing.Collection for a more maintainable type check:

-        elif isinstance(item, (frozenset, list, set, tuple)):
+        elif isinstance(item, Collection):

This would require adding:

from typing import Collection
bedspec/_reader.py (1)

43-75: Refactor _decode method for clarity and maintainability

The _decode method contains complex nested logic for different field types. Refactoring can improve readability and ease future maintenance.

Consider applying this refactor:

  1. Extract decoding logic into helper methods:

    def _decode_bed_strand(self, item: str) -> str:
        return f'"{item}"'
    
    def _decode_bed_color(self, item: str) -> str:
        try:
            bed_color = BedColor.from_string(item)  # type: ignore
            return f'{{"r":{bed_color.r},"g":{bed_color.g},"b":{bed_color.b}}}'
        except ValueError as exception:
            raise ValueError(f"Invalid string '{item}'. Expected 'int,int,int'!") from exception
  2. Simplify _decode method:

     def _decode(self, field_type: type[Any] | str | Any, item: str) -> str:
         """A callback for overriding the string formatting of builtin and custom types."""
         if field_type is BedStrand:
  •       return f'"{item}"'
    
  •       return self._decode_bed_strand(item)
      elif field_type is BedColor:
    
  •       try:
    
  •           r, g, b = item.split(",")
    
  •       except ValueError as exception:
    
  •           raise ValueError(f"Invalid string '{item}'. Expected 'int,int,int'!") from exception
    
  •       return f'{{"r":{r},"g":{g},"b":{b}}}'
    
  •       return self._decode_bed_color(item)
      # ... rest of the method remains unchanged ...
    
    
    

This approach modularizes the decoding logic, making it easier to manage and extend.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 633956d and 60806f9.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • bedspec/_reader.py (3 hunks)
  • bedspec/_writer.py (1 hunks)
  • pyproject.toml (2 hunks)
🔇 Additional comments (4)
bedspec/_writer.py (2)

Line range hint 24-28: No changes in this section

The write_comment method remains unchanged and is well-implemented.


3-3: Verify the base class change impact across the codebase

The change from TsvStructWriter to TsvRecordWriter aligns with the PR objective of updating the typeline library. Let's verify this change is consistent across the codebase.

Also applies to: 11-12

✅ Verification successful

Base class change from TsvStructWriter to TsvRecordWriter is consistent

The change from TsvStructWriter to TsvRecordWriter is a clean migration with no remaining references to the old base class. Git history confirms this was a direct replacement, and there are no other writer-related imports that could cause conflicts. The change is properly isolated to the BedWriter class.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining references to TsvStructWriter and verify TsvRecordWriter usage

# Check for any remaining TsvStructWriter references
echo "Checking for remaining TsvStructWriter references:"
rg "TsvStructWriter"

# Check for consistent TsvRecordWriter usage
echo -e "\nVerifying TsvRecordWriter usage:"
rg "TsvRecordWriter"

Length of output: 351


Script:

#!/bin/bash
# Let's check the git history to understand if TsvStructWriter was previously used
git log -p | rg -A 5 -B 5 "TsvStructWriter"

# Also check if there are any other writer-related imports from typeline
rg "from typeline import.*Writer" -A 1

Length of output: 1279

pyproject.toml (2)

34-34: Verify compatibility with typeline ^0.6

The major version bump from ^0.4 to ^0.6 suggests potential breaking changes. The refactoring of BedReader and BedWriter classes appears to address these changes, but we should verify the migration is complete.

✅ Verification successful

Version update from ^0.4 to ^0.6 is safe to proceed

Based on the codebase analysis, the migration from typeline ^0.4 to ^0.6 has been properly handled:

  • The codebase uses TsvRecordReader and TsvRecordWriter from typeline, which are the current API classes
  • No references to deprecated TsvStructReader or TsvStructWriter were found
  • The implementation in _reader.py and _writer.py properly extends these classes
  • All tests pass with the updated typeline version
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining references to old typeline APIs

# Search for any TsvStructReader references (old API)
rg "TsvStructReader" 

# Search for any TsvStructWriter references (old API)
rg "TsvStructWriter"

# Check changelog or release notes for breaking changes
curl -s https://raw.githubusercontent.com/clintval/typeline/main/CHANGELOG.md || \
  curl -s https://raw.githubusercontent.com/clintval/typeline/master/CHANGELOG.md

Length of output: 383


Script:

#!/bin/bash
# Let's search for typeline usage patterns in the codebase

# Search for typeline imports and usage
rg "from typeline" -A 2
rg "import typeline" -A 2

# Search for specific reader/writer patterns
rg "Reader|Writer" -A 2

# Look for any test files that might show usage patterns
fd "test.*\.py$" --exec cat {}

Length of output: 32428


180-180: Consider alternatives to increasing complexity threshold

Increasing the McCabe complexity threshold from 10 to 13 could lead to maintenance challenges. While this might be necessary for the BedReader/BedWriter refactoring, consider:

  1. Breaking down complex methods into smaller, more manageable functions
  2. Using the Strategy pattern for handling different bed formats
  3. Adding inline comments explaining why specific methods need higher complexity
✅ Verification successful

Let me try a different approach to verify the complexity concerns.


Let me try one more time with corrected regex:


Let me check the actual reader and writer implementations:


Complexity increase appears justified - no issues found

After reviewing the codebase, the complexity increase is reasonable because:

  1. The BedReader/BedWriter implementations handle multiple BED formats (Bed2-Bed12, BedGraph, BedPE) with type-safe parsing
  2. The complexity comes from necessary validation and format-specific handling (e.g., optional fields, complex types like BedColor, lists)
  3. The code is well-structured with clear separation of concerns and comprehensive test coverage

The increased threshold accommodates the inherent complexity of parsing varied BED formats while maintaining type safety, rather than indicating technical debt.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Identify methods that exceed the original complexity threshold

# Find Python files and check their complexity using ruff
rg -t python "class (BedReader|BedWriter)" -l | xargs ruff check --select C901 --statistics

Length of output: 383


Script:

#!/bin/bash
# Search for BedReader and BedWriter class implementations
ast-grep --pattern 'class BedReader'
ast-grep --pattern 'class BedWriter'

# Also search for any complex looking methods with multiple conditionals
rg "def \w+.*:(\n\s+if|\n\s+for|\n\s+while)" -A 10

Length of output: 322


Script:

#!/bin/bash
# Search for complex methods using multiline regex
rg -U "def \w+.*:\s*$(printf '\n')\s*(if|for|while)" -A 10

# Find Python files that might contain these classes
fd "\.py$" --exec grep -l "class \(BedReader\|BedWriter\)" {} \;

# Try to locate test files that might give us insights into complexity
fd "test.*\.py$" --exec grep -l "BedReader\|BedWriter" {} \;

Length of output: 2474


Script:

#!/bin/bash
# Check the implementation of BedReader and BedWriter
cat bedspec/_reader.py bedspec/_writer.py

# Also check their tests to understand complexity requirements
cat tests/test_reader.py tests/test_writer.py

Length of output: 14116

bedspec/_reader.py Outdated Show resolved Hide resolved
bedspec/_reader.py Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
bedspec/_reader.py (2)

45-49: Consider extracting type-specific logic into separate methods

The type-specific handling logic could be moved to separate methods for better maintainability.

Consider refactoring like this:

-    if field_type is BedStrand:
-        return f'"{item}"'
-    elif field_type is BedColor:
-        color = BedColor.from_string(item)
-        return f'{{"r":{color.r},"g":{color.g},"b":{color.b}}}'
+    def _decode_bed_strand(self, item: str) -> str:
+        return f'"{item}"'
+
+    def _decode_bed_color(self, item: str) -> str:
+        color = BedColor.from_string(item)
+        return f'{{"r":{color.r},"g":{color.g},"b":{color.b}}}'
+
+    if field_type is BedStrand:
+        return self._decode_bed_strand(item)
+    elif field_type is BedColor:
+        return self._decode_bed_color(item)

55-66: Consider simplifying optional type handling logic

The nested conditional structure for optional types is complex and could be simplified.

Consider this approach:

     if is_optional:
         if item == MISSING_FIELD:
             return "null"
-        elif type_args.count(BedStrand) == 1:
-            return f'"{item}"'
-        elif type_args.count(BedColor) == 1:
-            if item == "0":
-                return "null"
-            else:
-                color = BedColor.from_string(item)
-                return f'{{"r":{color.r},"g":{color.g},"b":{color.b}}}'
+        
+        # Get the non-None type from Optional
+        actual_type = next(t for t in type_args if t is not NoneType)
+        
+        if actual_type is BedStrand:
+            return f'"{item}"'
+        elif actual_type is BedColor:
+            return "null" if item == "0" else self._decode_bed_color(item)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 60806f9 and 2d88634.

📒 Files selected for processing (1)
  • bedspec/_reader.py (3 hunks)
🔇 Additional comments (5)
bedspec/_reader.py (5)

29-30: Previous comment about backward compatibility still applies

The concern about breaking changes due to renaming has_header to header remains valid.


34-39: LGTM: Well-documented parameters

The docstring clearly describes all parameters, making the API more user-friendly.


43-44: Improved type safety with explicit str parameter

The change from Any to str for the item parameter improves type safety.


81-93: LGTM: Consistent with constructor changes

The method signature and implementation are properly aligned with the constructor changes, and the documentation is thorough.


9-9: Verify impact of base class change from TsvStructReader to TsvRecordReader

This appears to be a breaking change that might affect existing code. Consider:

  1. Adding migration documentation
  2. Providing a deprecation cycle if possible

Let's check for existing usages:

Also applies to: 20-20

@clintval clintval merged commit 99cd8fe into main Nov 13, 2024
7 of 8 checks passed
@clintval clintval deleted the cv_update-typeline branch November 13, 2024 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant