Duplicate authors #11
Walkthrough
This pull request introduces enhanced author management capabilities in the MetaCatalog API. The changes span multiple files in the
Changes
Sequence Diagram
sequenceDiagram
participant Client
participant Router
participant Core
participant Database
Client->>Router: Request to add/retrieve author
Router->>Core: Call core method
Core->>Database: Query or create author
Database-->>Core: Return author details
Core-->>Router: Return processed result
Router-->>Client: Respond with author information
Actionable comments posted: 1
🧹 Nitpick comments (7)
metacatalog_api/db.py (3)
238-246: Raise a more specific exception.
In get_author_by_name, a RuntimeError is raised when multiple authors are found. Consider raising a more domain-specific exception like ValueError or a custom exception to improve error clarity and unify your codebase’s exception handling.
- raise RuntimeError(f"More than one author found for name {name}")
+ raise ValueError(f"More than one author found for name '{name}'")
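A custom exception, as suggested above, could look roughly like the sketch below. The class name AuthorNotUniqueError and the helper pick_single_author are hypothetical and do not exist in the PR; authors are modelled as plain dicts rather than the project's model classes.

```python
from typing import List, Optional


class AuthorNotUniqueError(LookupError):
    """Hypothetical exception: a name lookup matched more than one author."""

    def __init__(self, name: str, count: int):
        super().__init__(f"More than one author found for name '{name}' ({count} matches)")
        self.name = name
        self.count = count


def pick_single_author(name: str, matches: List[dict]) -> Optional[dict]:
    """Return the only match, None if there is no match, or raise if ambiguous."""
    if len(matches) > 1:
        raise AuthorNotUniqueError(name, len(matches))
    return matches[0] if matches else None
```

Callers can then catch AuthorNotUniqueError specifically (for example to map it to an HTTP error) without also swallowing unrelated RuntimeError or ValueError instances.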
247-260: Potential naming confusion for local variable author.
Inside create_or_get_author, the incoming parameter and the local variable author are both used, but then author is redeclared with the return value of session.exec(...). This might be confusing and cause potential shadowing. Consider renaming one of them to avoid confusion.
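One way to avoid the rebinding is to give the database row its own name. The sketch below is an illustration only: it uses sqlite3 and a simplified persons table (id, first_name, last_name) instead of the project's session.exec(...) and models, and the name existing_row is an assumption.

```python
import sqlite3


def create_or_get_author(conn: sqlite3.Connection, author: dict) -> dict:
    """Return the stored author matching `author`, inserting it if absent.

    The payload keeps the name `author`; the fetched row is bound to
    `existing_row`, so the parameter is never rebound.
    """
    existing_row = conn.execute(
        "SELECT id, first_name, last_name FROM persons "
        "WHERE first_name = ? AND last_name = ?",
        (author["first_name"], author["last_name"]),
    ).fetchone()
    if existing_row is not None:
        return dict(zip(("id", "first_name", "last_name"), existing_row))

    cursor = conn.execute(
        "INSERT INTO persons (first_name, last_name) VALUES (?, ?)",
        (author["first_name"], author["last_name"]),
    )
    return {
        "id": cursor.lastrowid,
        "first_name": author["first_name"],
        "last_name": author["last_name"],
    }


# Usage: the second call returns the row created by the first one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persons (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
first = create_or_get_author(conn, {"first_name": "Ada", "last_name": "Lovelace"})
second = create_or_get_author(conn, {"first_name": "Ada", "last_name": "Lovelace"})
```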
376-377: Use descriptive condition comment.
When not author_duplicates, you rely on create_or_get_author. Consider clarifying in a comment or docstring that False means “avoid creating exact duplicates.”
- elif not author_duplicates:
+ # If duplicates are not allowed, try to get or create the same author
author = create_or_get_author(session, payload.author)
metacatalog_api/router/api/read.py (1)
74-83: Validate endpoint response.
get_author_by_name correctly checks if at least one of id, name, or search is provided; otherwise, it raises an HTTP 400. This check is good for request validation. Also, consider returning a 404 if an author is not found by name or search.
metacatalog_api/core.py (3)
116-123: Consider separate specialized queries.
When authors(...) is called, you branch logic based on id, entry_id, etc. The approach is flexible but can become cumbersome as the number of conditions grows. Evaluate specialized functions or data classes to keep this code straightforward.
131-144: Use ternary assignment to streamline logic.
As hinted by static analysis, this block sets author = None if no authors are found, else assigns the first. A ternary expression might reduce code lines but is optional.
- if len(authors) == 0:
-     author = None
- else:
-     author = authors[0]
+ author = None if len(authors) == 0 else authors[0]
🧰 Tools
🪛 Ruff (0.8.2)
139-142: Use ternary operator author = None if len(authors) == 0 else authors[0] instead of if-else-block
Replace if-else-block with author = None if len(authors) == 0 else authors[0]
(SIM108)
164-172: Document the no_duplicates parameter.
In add_author, clarify the difference in behavior between no_duplicates=True and no_duplicates=False in docstrings or method documentation for a better developer experience.
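A docstring along these lines might cover it. This is a sketch under assumptions: the store is a plain list and authors are dicts, which is not how the PR persists data, and "duplicate" is taken to mean a record with identical fields.

```python
def add_author(store: list, author: dict, no_duplicates: bool = True) -> dict:
    """Add an author to `store` and return the stored record.

    Parameters
    ----------
    no_duplicates : bool
        True (default): if a record with identical fields already exists,
        return that record instead of adding a second copy.
        False: always append a new record, even when an identical one
        is already stored.
    """
    if no_duplicates:
        for existing in store:
            if existing == author:
                return existing
    record = dict(author)  # copy so later changes to the payload do not leak in
    store.append(record)
    return record
```

Calling add_author twice with the same payload leaves one record in the store by default, and two when the second call passes no_duplicates=False.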
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
metacatalog_api/core.py (2 hunks)
metacatalog_api/db.py (4 hunks)
metacatalog_api/router/api/create.py (1 hunk)
metacatalog_api/router/api/read.py (2 hunks)
metacatalog_api/sql/authors_by_name.sql (1 hunk)
🔇 Additional comments (11)
metacatalog_api/db.py (5)
222-222: No changes identified.
This blank line does not introduce any changes or side effects.
223-237: Implement wildcard searches consistently.
This new function get_authors_by_name correctly handles wildcard searches for both partial matches (using %) and explicit wildcard usage (using *). The approach is good for flexible lookups. Ensure that related endpoints and UI forms properly escape or sanitize wildcard inputs to avoid potential SQL-like injection or unintended pattern expansions.
262-269: Check for existing author conflicts.
The add_author function directly adds the author without verifying potential duplicates when no_duplicates is False. This is intentional per your design, but be mindful of potential user confusion or unintended duplication.
365-365: Leverage typing in function signature.
The new parameter author_duplicates: bool = False clarifies usage. Make sure relevant docstrings or OpenAPI schemas reflect this parameter’s impact on how authors are processed.
389-390: Apply the same logic to co-authors.
You are applying consistent logic to co-authors by calling create_or_get_author when duplicates are not allowed. This ensures uniform handling. Confirm that co-authors also benefit from the same business rules as primary authors.
metacatalog_api/router/api/create.py (2)
9-10: Ensure error handling for missing fields.
When add_entry is called, ensure payload is validated so that required fields (e.g., author, title) are present or defaulted. This helps guard against partial or invalid data.
19-26: Clarify duplication logic.
The add_author endpoint uses no_duplicates: bool = True by default. Ensure the endpoint documentation highlights how setting no_duplicates to False can lead to multiple nearly identical author records.
metacatalog_api/router/api/read.py (1)
62-63: No meaningful content to review.
These lines are blank.
metacatalog_api/core.py (3)
124-125: Performance note with wildcard usage.
get_authors_by_name is invoked here if name is given. Repeated wildcard usage may degrade performance for large datasets if the underlying SQL query does not use indexes effectively. Monitor performance at scale.
173-173: Add docstring for author duplication.
add_entry now includes author_duplicates. An explicit docstring or function comment clarifying what “duplicates” means in context would improve maintainability.
176-176: Handle potential errors from db.add_datasource.
If payload.datasource is partially defined or references an invalid type, an exception could propagate. Wrap or check input validity for smoother user experience and robust error handling.
WITH filtered_persons AS (
    SELECT * FROM persons
    WHERE is_organisation=true AND
    organisation_name LIKE :prompt OR organisation_abbrev LIKE :prompt
    UNION
    SELECT * FROM persons
    WHERE is_organisation=false AND
    first_name || ' ' || last_name LIKE :prompt
)
SELECT * FROM filtered_persons
{limit} {offset}
;
Ensure correct operator precedence.
In your WHERE clauses, ensure parentheses are used for disjunctions. AND binds more tightly than OR, so without parentheses the first condition is evaluated as (is_organisation=true AND organisation_name LIKE :prompt) OR organisation_abbrev LIKE :prompt, and the abbreviation match is no longer restricted by is_organisation=true.
- WHERE is_organisation=true AND
- organisation_name LIKE :prompt OR organisation_abbrev LIKE :prompt
+ WHERE is_organisation=true
+ AND (organisation_name LIKE :prompt OR organisation_abbrev LIKE :prompt)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
WITH filtered_persons AS (
    SELECT * FROM persons
    WHERE is_organisation=true
    AND (organisation_name LIKE :prompt OR organisation_abbrev LIKE :prompt)
    UNION
    SELECT * FROM persons
    WHERE is_organisation=false AND
    first_name || ' ' || last_name LIKE :prompt
)
SELECT * FROM filtered_persons
{limit} {offset}
;
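The precedence issue can be reproduced with a small, self-contained check. The sketch below uses sqlite3 as a stand-in for the project's database, a reduced persons table, and contrived rows (a person row that happens to carry an organisation abbreviation); none of this data comes from the PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persons ("
    "is_organisation INTEGER, organisation_name TEXT, organisation_abbrev TEXT, "
    "first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "ACME Institute", "ACME", None, None),  # a real organisation
        (0, None, "ACME", "Jane", "Doe"),           # contrived: person with an abbreviation
    ],
)
params = {"prompt": "ACME%"}

# Without parentheses: parsed as (is_organisation AND name LIKE) OR abbrev LIKE.
unparenthesized = conn.execute(
    "SELECT count(*) FROM persons "
    "WHERE is_organisation = 1 AND "
    "organisation_name LIKE :prompt OR organisation_abbrev LIKE :prompt",
    params,
).fetchone()[0]

# With parentheses: both LIKE checks are restricted to organisations.
parenthesized = conn.execute(
    "SELECT count(*) FROM persons "
    "WHERE is_organisation = 1 "
    "AND (organisation_name LIKE :prompt OR organisation_abbrev LIKE :prompt)",
    params,
).fetchone()[0]
```

Here the unparenthesized query also counts the person row, because its abbreviation matches regardless of is_organisation, while the parenthesized query counts only the organisation.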