Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate authors #11

Merged
merged 3 commits into from
Jan 1, 2025
Merged

Duplicate authors #11

merged 3 commits into from
Jan 1, 2025

Conversation

mmaelicke
Copy link
Member

@mmaelicke mmaelicke commented Jan 1, 2025

Summary by CodeRabbit

  • New Features

    • Enhanced author management with new search and retrieval capabilities
    • Added ability to retrieve authors by name or ID
    • Introduced more flexible author creation with duplicate handling options
    • New API endpoints for author management
  • Improvements

    • Updated entry and author creation processes
    • Improved search functionality for authors
    • Added more granular control over author duplicates

Copy link
Contributor

coderabbitai bot commented Jan 1, 2025

Walkthrough

This pull request introduces enhanced author management capabilities in the MetaCatalog API. The changes span multiple files in the metacatalog_api package, focusing on improving author-related functionality. Key modifications include adding new functions for retrieving and creating authors, updating function signatures to support more flexible author searches, and introducing parameters to control author duplication handling during entry creation.

Changes

File Change Summary
metacatalog_api/core.py - Added name parameter to authors function
- Introduced new author function for flexible author retrieval
- Added add_author function with no_duplicates option
- Updated add_entry with author_duplicates parameter
metacatalog_api/db.py - Added get_authors_by_name function
- Added get_author_by_name function
- Added create_or_get_author function
- Added add_author function
- Updated add_entry with author_duplicates parameter
metacatalog_api/router/api/create.py - Updated add_entry with author_duplicates parameter
- Added new add_author endpoint
metacatalog_api/router/api/read.py - Added get_author function
- Added get_author_by_name function
metacatalog_api/sql/authors_by_name.sql - Introduced CTE for flexible author name searching

Sequence Diagram

sequenceDiagram
    participant Client
    participant Router
    participant Core
    participant Database
    
    Client->>Router: Request to add/retrieve author
    Router->>Core: Call core method
    Core->>Database: Query or create author
    Database-->>Core: Return author details
    Core-->>Router: Return processed result
    Router-->>Client: Respond with author information
Loading

Possibly Related PRs

  • Improve search #9: Similar modifications to the add_entry function, focusing on author duplication handling

Poem

🐰 Hop, hop, through the data's maze,
Authors dancing in code's bright rays,
Duplicates tamed, names now clear,
MetaCatalog's magic drawing near!
A rabbit's tale of API delight 🔍


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@mmaelicke mmaelicke merged commit fcb5ffd into main Jan 1, 2025
1 of 2 checks passed
@mmaelicke mmaelicke deleted the duplicate_authors branch January 1, 2025 09:26
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (7)
metacatalog_api/db.py (3)

238-246: Raise a more specific exception.
In get_author_by_name, a RuntimeError is raised when multiple authors are found. Consider raising a more domain-specific exception like ValueError or a custom exception to improve error clarity and unify your codebase’s exception handling.

-        raise RuntimeError(f"More than one author found for name {name}")
+        raise ValueError(f"More than one author found for name '{name}'")

247-260: Potential naming confusion for local variable author.
Inside create_or_get_author, the incoming parameter and the local variable author are both used, but then author is redeclared with the return value of session.exec(...). This might be confusing and cause potential shadowing. Consider renaming one of them to avoid confusion.


376-377: Use descriptive condition comment.
When not author_duplicates, you rely on create_or_get_author. Consider clarifying in a comment or docstring that False means “avoid creating exact duplicates.”

-    elif not author_duplicates:
+    # If duplicates are not allowed, try to get or create the same author
     author = create_or_get_author(session, payload.author)
metacatalog_api/router/api/read.py (1)

74-83: Validate endpoint response.
get_author_by_name correctly checks if at least one of id, name, or search is provided; otherwise, it raises an HTTP 400. This check is good for request validation. Also, consider returning a 404 if an author is not found by name or search.

metacatalog_api/core.py (3)

116-123: Consider separate specialized queries.
When authors(...) is called, you branch logic based on id, entry_id, etc. The approach is flexible but can become cumbersome if many conditions grow. Evaluate specialized functions or data classes to keep this code straightforward.


131-144: Use ternary assignment to streamline logic.
As hinted by static analysis, this block sets author = None if no authors are found, else assigns the first. A ternary expression might reduce code lines but is optional.

-            if len(authors) == 0:
-                author = None
-            else:
-                author = authors[0]
+            author = None if len(authors) == 0 else authors[0]
🧰 Tools
🪛 Ruff (0.8.2)

139-142: Use ternary operator author = None if len(authors) == 0 else authors[0] instead of if-else-block

Replace if-else-block with author = None if len(authors) == 0 else authors[0]

(SIM108)


164-172: Document the no_duplicates parameter.
In add_author, clarify the difference in behavior between no_duplicates=True and no_duplicates=False in docstrings or method documentation for a better developer experience.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00290c7 and fcb5ffd.

📒 Files selected for processing (5)
  • metacatalog_api/core.py (2 hunks)
  • metacatalog_api/db.py (4 hunks)
  • metacatalog_api/router/api/create.py (1 hunks)
  • metacatalog_api/router/api/read.py (2 hunks)
  • metacatalog_api/sql/authors_by_name.sql (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
metacatalog_api/core.py

139-142: Use ternary operator author = None if len(authors) == 0 else authors[0] instead of if-else-block

Replace if-else-block with author = None if len(authors) == 0 else authors[0]

(SIM108)

🔇 Additional comments (11)
metacatalog_api/db.py (5)

222-222: No changes identified.
This blank line does not introduce any changes or side effects.


223-237: Implement wildcard searches consistently.
This new function get_authors_by_name correctly handles wildcard searches for both partial matches (using %) and explicit wildcard usage (using *). The approach is good for flexible lookups. Ensure that related endpoints and UI forms properly escape or sanitize wildcard inputs to avoid potential SQLlike injection or unintended pattern expansions.


262-269: Check for existing author conflicts.
The add_author function directly adds the author without verifying potential duplicates when no_duplicates is False. This is intentional per your design, but be mindful of potential user confusion or unintended duplication.


365-365: Leverage typing in function signature.
The new parameter author_duplicates: bool = False clarifies usage. Make sure relevant docstrings or openapi schemas reflect this parameter’s impact on how authors are processed.


389-390: Apply the same logic to co-authors.
You are applying consistent logic to co-authors by calling create_or_get_author when duplicates are not allowed. This ensures uniform handling. Confirm that co-authors also benefit from the same business rules as primary authors.

metacatalog_api/router/api/create.py (2)

9-10: Ensure error handling for missing fields.
When add_entry is called, ensure payload is validated so that required fields (e.g., author, title) are present or defaulted. This helps guard against partial or invalid data.


19-26: Clarify duplication logic.
The add_author endpoint uses no_duplicates: bool = True by default. Ensure the endpoint documentation highlights how setting no_duplicates to False can lead to multiple nearly identical author records.

metacatalog_api/router/api/read.py (1)

62-63: No meaningful content to review.
These lines are blank.

metacatalog_api/core.py (3)

124-125: Performance note with wildcard usage.
get_authors_by_name is invoked here if name is given. Repeated wildcard usage may degrade performance for large datasets if the underlying SQL query does not use indexes effectively. Monitor performance at scale.


173-173: Add docstring for author duplication.
add_entry now includes author_duplicates. An explicit docstring or function comment clarifying what “duplicates” means in context would improve maintainability.


176-176: Handle potential errors from db.add_datasource.
If payload.datasource is partially defined or references an invalid type, an exception could propagate. Wrap or check input validity for smoother user experience and robust error handling.

Comment on lines +1 to +12
WITH filtered_persons AS (
SELECT * FROM persons
WHERE is_organisation=true AND
organisation_name LIKE :prompt OR organisation_abbrev LIKE :prompt
UNION
SELECT * FROM persons
WHERE is_organisation=false AND
first_name || ' ' || last_name LIKE :prompt
)
SELECT * FROM filtered_persons
{limit} {offset}
;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Ensure correct operator precedence.
In your WHERE clauses, ensure parentheses are used for disjunctions. Without them, organisation_name LIKE :prompt OR organisation_abbrev LIKE :prompt might not combine with the is_organisation=true condition as intended.

-    WHERE is_organisation=true AND 
-    organisation_name LIKE :prompt OR organisation_abbrev LIKE :prompt
+    WHERE is_organisation=true
+      AND (organisation_name LIKE :prompt OR organisation_abbrev LIKE :prompt)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
WITH filtered_persons AS (
SELECT * FROM persons
WHERE is_organisation=true AND
organisation_name LIKE :prompt OR organisation_abbrev LIKE :prompt
UNION
SELECT * FROM persons
WHERE is_organisation=false AND
first_name || ' ' || last_name LIKE :prompt
)
SELECT * FROM filtered_persons
{limit} {offset}
;
WITH filtered_persons AS (
SELECT * FROM persons
WHERE is_organisation=true
AND (organisation_name LIKE :prompt OR organisation_abbrev LIKE :prompt)
UNION
SELECT * FROM persons
WHERE is_organisation=false AND
first_name || ' ' || last_name LIKE :prompt
)
SELECT * FROM filtered_persons
{limit} {offset}
;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant