Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Path updates #12

Merged
merged 8 commits into from
Jan 8, 2025
Merged

Path updates #12

merged 8 commits into from
Jan 8, 2025

Conversation

mmaelicke
Copy link
Member

@mmaelicke mmaelicke commented Jan 7, 2025

@AlexDo1 this PR includes the updates to manage Groups through the API.

It can already also add new groups in place, with a new Entry, without duplicating. You can append a list like

entry.groups = [2, 3, dict(title="New Group", description="New group added in place", type="Label")]

this will add associations for the new entry to the pre-existing groups of id 2 and 3 and create a new 'Label'-type group called 'New Group'.

Currently we don't have very useful group types beside the composite, as 'Label' is not very descriptive. You may suggest new types and also additional fields on a group beside a title and a description. There will also be a database migration changing the type of title and description from character varying to text, I would include your suggestions into that migration.

If you can't think of improvements right away, I can also merge and do the migration in an extra PR.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for managing entry groups.
    • Introduced new API endpoints for retrieving and creating groups.
    • Enhanced author management with new functionalities.
  • Improvements

    • Expanded database interaction layer with new query methods for authors and groups.
    • Updated data models to support more flexible entry and group relationships.
    • Added more granular filtering options for retrieving entries, authors, and groups.
  • API Enhancements

    • New endpoints for licenses, group types, and group-related operations.
    • Improved error handling for API requests.

Copy link
Contributor

coderabbitai bot commented Jan 7, 2025

Walkthrough

This pull request introduces comprehensive enhancements to the MetaCatalog API, focusing on expanding group management capabilities. The changes span multiple files in the metacatalog_api package, adding new functions for retrieving, creating, and managing entry groups. The modifications include updates to core API functions, database interaction methods, data models, and API router endpoints. The primary goal is to provide more robust group-related functionality, allowing users to create, retrieve, and associate entries with groups more effectively.

Changes

File Change Summary
metacatalog_api/core.py Added groups(), group_types(), add_entry(), add_group() functions; modified variables() and add_datasource() functions.
metacatalog_api/db.py Added multiple database functions for group and author management, including get_authors_by_name(), get_groups(), get_group(), add_group(), group_entries(), and others. Removed get_datasource_by_id.
metacatalog_api/models.py Introduced new group-related models like EntryGroupBase, EntryGroup, EntryGroupCreate, and removed id fields from TemporalScaleBase, SpatialScaleBase. Added relationships in EntryTable.
metacatalog_api/router/api/create.py Updated /author endpoint to /authors; added /groups endpoint for creating new groups.
metacatalog_api/router/api/read.py Added endpoints for retrieving licenses, authors, variables, group types, and groups; refined error handling in get_variables().
metacatalog_api/sql/maintain/defaults.sql Added new entries for 'Dataset' and 'Site' in the entrygroup_types table.
metacatalog_api/sql/maintain/install.sql Changed name and description columns in entrygroup_types table to TEXT.
metacatalog_api/sql/migrate/migration_3.sql Modified entrygroup_types table to enforce non-null constraints on name and description, added new entries.

Possibly Related PRs

  • App blueprints #10: Introduces a new router for creating groups and entries, which aligns closely with the group management functionality added in this PR.

Poem

🐰 Hop, hop, hooray for groups galore!
Our MetaCatalog's now got so much more
Entries dancing, grouped with glee
Metadata's magic, just wait and see!
A rabbit's dream of organization supreme 🌈


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (6)
metacatalog_api/router/api/read.py (4)

102-102: Enhance exception chaining by using from e

In the get_variable function, consider using exception chaining to preserve the original traceback and provide more context by adding from e when raising the HTTPException.

Apply this diff to enhance exception chaining:

 except Exception as e:
-    raise HTTPException(status_code=404, detail=str(e))
+    raise HTTPException(status_code=404, detail=str(e)) from e
🧰 Tools
🪛 Ruff (0.8.2)

102-102: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


111-111: Enhance exception chaining by using from e

In the get_group_types function, consider using exception chaining to preserve the original traceback and provide more context by adding from e when raising the HTTPException.

Apply this diff to enhance exception chaining:

 except Exception as e:
-    raise HTTPException(status_code=404, detail=str(e))
+    raise HTTPException(status_code=404, detail=str(e)) from e
🧰 Tools
🪛 Ruff (0.8.2)

111-111: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


121-121: Enhance exception chaining by using from e

In the get_groups function, consider using exception chaining to preserve the original traceback and provide more context by adding from e when raising the HTTPException.

Apply this diff to enhance exception chaining:

 except Exception as e:
-    raise HTTPException(status_code=500, detail=str(e))
+    raise HTTPException(status_code=500, detail=str(e)) from e
🧰 Tools
🪛 Ruff (0.8.2)

121-121: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


132-132: Enhance exception chaining by using from e

In the get_group function, consider using exception chaining to preserve the original traceback and provide more context by adding from e when raising the HTTPException.

Apply this diff to enhance exception chaining:

 except Exception as e:
-    raise HTTPException(status_code=500, detail=str(e))
+    raise HTTPException(status_code=500, detail=str(e)) from e
🧰 Tools
🪛 Ruff (0.8.2)

132-132: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

metacatalog_api/models.py (1)

362-362: Avoid using type as a field name to prevent shadowing

Using type as a field name may shadow the built-in type function in Python, potentially leading to confusion or bugs. Consider renaming the field to group_type or a similar descriptive name.

Apply this diff to rename the field:

 class EntryGroupCreate(EntryGroupBase):
-    type: str
+    group_type: str
metacatalog_api/router/api/create.py (1)

28-29: Use plural noun in endpoint path for consistency

For consistency with RESTful API design and existing endpoints (e.g., /entries, /authors), consider changing the endpoint path from /group to /groups.

Apply this diff to update the endpoint path:

-@create_router.post('/group')
+@create_router.post('/groups')
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fcb5ffd and b5bcf50.

📒 Files selected for processing (5)
  • metacatalog_api/core.py (5 hunks)
  • metacatalog_api/db.py (6 hunks)
  • metacatalog_api/models.py (3 hunks)
  • metacatalog_api/router/api/create.py (1 hunks)
  • metacatalog_api/router/api/read.py (3 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
metacatalog_api/router/api/read.py

102-102: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


111-111: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


121-121: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


132-132: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

metacatalog_api/db.py

540-540: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (9)
metacatalog_api/models.py (1)

310-312: Verify compatibility of base classes for temporal_scale and spatial_scale

Changing temporal_scale and spatial_scale to use base classes in DatasourceCreate might impact validation and serialization. Ensure that this change is compatible with how these fields are used elsewhere in the application.

metacatalog_api/core.py (4)

106-114: LGTM! Well-structured group retrieval function.

The implementation provides flexible group retrieval with proper parameter handling and pagination support.


116-119: LGTM! Clean implementation.

The function provides a straightforward way to retrieve group types.


200-203: LGTM! Group association handling is well implemented.

The code correctly handles group associations during entry creation, supporting both existing groups by ID and new group creation.


214-223: LGTM! Clean group creation implementation.

The function provides a clear interface for creating new groups with optional entry associations.

metacatalog_api/db.py (4)

494-496: LGTM! Clean implementation.

The function provides a straightforward way to retrieve and map group types.


499-519: LGTM! Well-structured group retrieval implementation.

The function provides comprehensive filtering options with proper pagination support and optional metadata inclusion.


521-538: LGTM! Robust single group retrieval implementation.

The function includes proper parameter validation and flexible metadata handling.


563-584: LGTM! Robust group-entry association implementation.

The function handles both existing and new groups, with proper type checking and flexible input support.

Comment on lines 91 to 93
except Exception as e:
return HTTPException(status_code=404, detail=str(e))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Incorrect use of return instead of raise for HTTPException

In the get_variables function, the except clause uses return HTTPException(...), but to properly send an HTTP error response, you should raise the HTTPException.

Apply this diff to fix the issue:

 except Exception as e:
-    return HTTPException(status_code=404, detail=str(e))
+    raise HTTPException(status_code=404, detail=str(e)) from e
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except Exception as e:
return HTTPException(status_code=404, detail=str(e))
except Exception as e:
raise HTTPException(status_code=404, detail=str(e)) from e

Comment on lines +540 to +561
def add_group(session: Session, title: str, description: str, type: str, entry_ids: list[int] = []) -> models.EntryGroup:
types = get_grouptypes(session=session)
grouptype = next(filter(lambda t: t.name.lower() == type.lower(), types), None)
if grouptype is None:
raise ValueError(f"The type {type} is not valid. Maybe you misspelled? Supported types: [{','.join([t.title for t in types])}]")

group = models.EntryGroupTable(
title=title,
description=description,
type_id=grouptype.id
)

session.add(group)
session.commit()
session.refresh(group)

if len(entry_ids) > 0:
for entry_id in entry_ids:
session.add(models.NMGroupsEntries(entry_id=entry_id, group_id=group.id))
session.commit()

return models.EntryGroup.model_validate(group)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix mutable default argument.

Using mutable default arguments can lead to unexpected behavior when the function is called multiple times.

Apply this diff to fix the issue:

-def add_group(session: Session, title: str, description: str, type: str, entry_ids: list[int] = []) -> models.EntryGroup:
+def add_group(session: Session, title: str, description: str, type: str, entry_ids: list[int] | None = None) -> models.EntryGroup:
     types = get_grouptypes(session=session)
     grouptype = next(filter(lambda t: t.name.lower() == type.lower(), types), None)
     if grouptype is None:
         raise ValueError(f"The type {type} is not valid. Maybe you misspelled? Supported types: [{','.join([t.title for t in types])}]")
     
     group = models.EntryGroupTable(
         title=title,
         description=description,
         type_id=grouptype.id
     )

     session.add(group)
     session.commit()
     session.refresh(group)

-    if len(entry_ids) > 0:
+    if entry_ids:
         for entry_id in entry_ids:
             session.add(models.NMGroupsEntries(entry_id=entry_id, group_id=group.id))
         session.commit()

     return models.EntryGroup.model_validate(group)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def add_group(session: Session, title: str, description: str, type: str, entry_ids: list[int] = []) -> models.EntryGroup:
types = get_grouptypes(session=session)
grouptype = next(filter(lambda t: t.name.lower() == type.lower(), types), None)
if grouptype is None:
raise ValueError(f"The type {type} is not valid. Maybe you misspelled? Supported types: [{','.join([t.title for t in types])}]")
group = models.EntryGroupTable(
title=title,
description=description,
type_id=grouptype.id
)
session.add(group)
session.commit()
session.refresh(group)
if len(entry_ids) > 0:
for entry_id in entry_ids:
session.add(models.NMGroupsEntries(entry_id=entry_id, group_id=group.id))
session.commit()
return models.EntryGroup.model_validate(group)
def add_group(session: Session, title: str, description: str, type: str, entry_ids: list[int] | None = None) -> models.EntryGroup:
types = get_grouptypes(session=session)
grouptype = next(filter(lambda t: t.name.lower() == type.lower(), types), None)
if grouptype is None:
raise ValueError(f"The type {type} is not valid. Maybe you misspelled? Supported types: [{','.join([t.title for t in types])}]")
group = models.EntryGroupTable(
title=title,
description=description,
type_id=grouptype.id
)
session.add(group)
session.commit()
session.refresh(group)
if entry_ids:
for entry_id in entry_ids:
session.add(models.NMGroupsEntries(entry_id=entry_id, group_id=group.id))
session.commit()
return models.EntryGroup.model_validate(group)
🧰 Tools
🪛 Ruff (0.8.2)

540-540: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
metacatalog_api/router/api/read.py (2)

88-94: Consider using different status codes for different error scenarios.

The current implementation uses HTTP 404 for all exceptions, which might not be appropriate. Consider differentiating between:

  • 404 for "not found" scenarios
  • 400 for invalid parameters
  • 500 for internal server errors

Example implementation:

 def get_variables(only_available: bool = False, offset: int = None, limit: int = None):
     try:
         variables = core.variables(only_available=only_available, offset=offset, limit=limit)
     except ValueError as e:
-        raise HTTPException(status_code=404, detail=str(e)) from e
+        raise HTTPException(status_code=400, detail=str(e)) from e
+    except core.NotFoundError as e:
+        raise HTTPException(status_code=404, detail=str(e)) from e
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=str(e)) from e

128-128: Add type annotation for group_id parameter.

The group_id parameter is missing its type annotation.

-def get_group(group_id) -> models.EntryGroupWithMetadata:
+def get_group(group_id: int) -> models.EntryGroupWithMetadata:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5bcf50 and 9e8631b.

📒 Files selected for processing (2)
  • metacatalog_api/router/api/create.py (1 hunks)
  • metacatalog_api/router/api/read.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • metacatalog_api/router/api/create.py
🧰 Additional context used
🪛 Ruff (0.8.2)
metacatalog_api/router/api/read.py

102-102: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (1)
metacatalog_api/router/api/read.py (1)

6-6: LGTM!

The new import is correctly placed and necessary for type hints in the new group-related endpoints.

Comment on lines +98 to +103
def get_variable(id: int):
try:
variable = core.variables(id=id)
except Exception as e:
raise HTTPException(status_code=404, detail=str(e))
return variable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add exception chaining and improve error handling.

The error handling needs improvement:

  1. Add exception chaining with from e
  2. Differentiate between error types
 def get_variable(id: int):
     try:
         variable = core.variables(id=id)
+        if variable is None:
+            raise HTTPException(status_code=404, detail=f"Variable with id {id} not found")
+        return variable
     except Exception as e:
-        raise HTTPException(status_code=404, detail=str(e))
+        raise HTTPException(status_code=500, detail=str(e)) from e
-    return variable
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def get_variable(id: int):
try:
variable = core.variables(id=id)
except Exception as e:
raise HTTPException(status_code=404, detail=str(e))
return variable
def get_variable(id: int):
try:
variable = core.variables(id=id)
if variable is None:
raise HTTPException(status_code=404, detail=f"Variable with id {id} not found")
return variable
except Exception as e:
raise HTTPException(status_code=500, detail=str(e)) from e
🧰 Tools
🪛 Ruff (0.8.2)

102-102: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Comment on lines 117 to 137
def get_groups(title: str = None, description: str = None, type: str = None, limit: int = None, offset: int = None) -> list[models.EntryGroup]:
try:
groups = core.groups(title=title, description=description, type=type, with_metadata=False, limit=limit, offset=offset)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e)) from e

return groups


@read_router.get('/groups/{group_id}')
@read_router.get('/groups/{group_id}.json')
def get_group(group_id) -> models.EntryGroupWithMetadata:
try:
group = core.groups(id=group_id, with_metadata=True)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e)) from e

if group is None:
raise HTTPException(status_code=404, detail=f"Group of id {group_id} was not found.")

return group
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Standardize error handling across group endpoints.

The error handling differs between get_groups and get_group:

  1. get_groups uses 500 for all errors
  2. get_group has an additional None check
  3. Both should handle common error cases consistently

Standardize the implementation:

 def get_groups(title: str = None, description: str = None, type: str = None, limit: int = None, offset: int = None) -> list[models.EntryGroup]:
     try:
         groups = core.groups(title=title, description=description, type=type, with_metadata=False, limit=limit, offset=offset)
+        if not groups:
+            return []
     except Exception as e:
         raise HTTPException(status_code=500, detail=str(e)) from e
     
     return groups

 def get_group(group_id) -> models.EntryGroupWithMetadata:
     try:
         group = core.groups(id=group_id, with_metadata=True)
+        if group is None:
+            raise HTTPException(status_code=404, detail=f"Group with id {group_id} not found")
+        return group
     except Exception as e:
         raise HTTPException(status_code=500, detail=str(e)) from e
-
-    if group is None:
-        raise HTTPException(status_code=404, detail=f"Group of id {group_id} was not found.")
-
-    return group
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def get_groups(title: str = None, description: str = None, type: str = None, limit: int = None, offset: int = None) -> list[models.EntryGroup]:
try:
groups = core.groups(title=title, description=description, type=type, with_metadata=False, limit=limit, offset=offset)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e)) from e
return groups
@read_router.get('/groups/{group_id}')
@read_router.get('/groups/{group_id}.json')
def get_group(group_id) -> models.EntryGroupWithMetadata:
try:
group = core.groups(id=group_id, with_metadata=True)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e)) from e
if group is None:
raise HTTPException(status_code=404, detail=f"Group of id {group_id} was not found.")
return group
def get_groups(title: str = None, description: str = None, type: str = None, limit: int = None, offset: int = None) -> list[models.EntryGroup]:
try:
groups = core.groups(title=title, description=description, type=type, with_metadata=False, limit=limit, offset=offset)
if not groups:
return []
except Exception as e:
raise HTTPException(status_code=500, detail=str(e)) from e
return groups
@read_router.get('/groups/{group_id}')
@read_router.get('/groups/{group_id}.json')
def get_group(group_id) -> models.EntryGroupWithMetadata:
try:
group = core.groups(id=group_id, with_metadata=True)
if group is None:
raise HTTPException(status_code=404, detail=f"Group with id {group_id} not found")
return group
except Exception as e:
raise HTTPException(status_code=500, detail=str(e)) from e

@AlexDo1
Copy link
Collaborator

AlexDo1 commented Jan 8, 2025

Very nice that you can create new entrygroups by just passing a dictionary!

I have a few suggestions for entry group types:

I think a group type "Dataset" would be helpful and descriptive. The titles for such groups could be "CAMELS-DEv1.0.0", "HYRAS", "Caravan-DE", ...
So in many cases grouping variables / Entries from the same Dataset.
I guess this would currently be a Composite, but I think I find the type "Dataset" more descriptive, especially in our field.

I could also imagine something like "MeasuringSite" or "Station" or "Site", where we could group variables measured at the same location / site. This could for example be a DWD station, a CAMELS-DE station, an Eddy Covariance tower, ...
These stations would in many cases be part of a entry group "Dataset".

This model of a dataset with entities of stations would also already fit a lot of the datasets we have on the server and also on vforwater, so maybe it is a good idea for metacatlog-api and not only for the camels metacatalog-api.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
metacatalog_api/db.py (1)

494-520: Consider adding caching for group types.

The implementation looks good, but since group types are relatively static, consider caching them to improve performance.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9e8631b and 16330f1.

📒 Files selected for processing (3)
  • metacatalog_api/core.py (5 hunks)
  • metacatalog_api/db.py (6 hunks)
  • metacatalog_api/router/api/read.py (3 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
metacatalog_api/router/api/read.py

102-102: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

metacatalog_api/db.py

542-542: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (15)
metacatalog_api/router/api/read.py (6)

88-94: LGTM!

The function signature update and error handling improvements look good.


96-103: Add exception chaining and improve error handling.

The error handling needs improvement:

  1. Add exception chaining with from e
  2. Differentiate between error types

Apply this diff to fix the issue:

 def get_variable(id: int):
     try:
         variable = core.variables(id=id)
+        if variable is None:
+            raise HTTPException(status_code=404, detail=f"Variable with id {id} not found")
+        return variable
     except Exception as e:
-        raise HTTPException(status_code=404, detail=str(e))
+        raise HTTPException(status_code=500, detail=str(e)) from e
-    return variable
🧰 Tools
🪛 Ruff (0.8.2)

102-102: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


105-112: LGTM!

The function implementation and error handling look good.


115-123: Standardize error handling across group endpoints.

The error handling needs improvement:

  1. Add None check for empty results
  2. Use consistent error codes

Apply this diff to fix the issue:

 def get_groups(title: str = None, description: str = None, type: str = None, limit: int = None, offset: int = None) -> list[models.EntryGroup]:
     try:
         groups = core.groups(title=title, description=description, type=type, with_metadata=False, limit=limit, offset=offset)
+        if not groups:
+            return []
         return groups
     except Exception as e:
         raise HTTPException(status_code=500, detail=str(e)) from e

126-133: LGTM!

The function implementation and error handling look good.


136-147: LGTM!

The function implementation, error handling, and None check look good.

metacatalog_api/core.py (3)

106-113: LGTM!

The function implementation and session management look good.


116-118: LGTM!

The function implementation is clean and focused.


214-223: LGTM!

The function implementation, type validation, and entry association look good.

metacatalog_api/db.py (6)

Line range hint 223-248: LGTM!

The functions correctly implement name lookup with proper wildcard handling and strict matching support.


Line range hint 249-274: LGTM!

The functions correctly implement author creation with proper session management.


Line range hint 275-289: LGTM!

The function correctly retrieves both primary and co-authors with proper ordering.


523-539: LGTM!

The function correctly implements group retrieval with proper input validation.


542-563: Fix mutable default argument.

Using mutable default arguments can lead to unexpected behavior when the function is called multiple times.

Apply this diff to fix the issue:

-def add_group(session: Session, title: str, description: str, type: str, entry_ids: list[int] = []) -> models.EntryGroup:
+def add_group(session: Session, title: str, description: str, type: str, entry_ids: list[int] | None = None) -> models.EntryGroup:
     types = get_grouptypes(session=session)
     grouptype = next(filter(lambda t: t.name.lower() == type.lower(), types), None)
     if grouptype is None:
         raise ValueError(f"The type {type} is not valid. Maybe you misspelled? Supported types: [{','.join([t.title for t in types])}]")
     
     group = models.EntryGroupTable(
         title=title,
         description=description,
         type_id=grouptype.id
     )

     session.add(group)
     session.commit()
     session.refresh(group)

-    if len(entry_ids) > 0:
+    if entry_ids:
         for entry_id in entry_ids:
             session.add(models.NMGroupsEntries(entry_id=entry_id, group_id=group.id))
         session.commit()

     return models.EntryGroup.model_validate(group)
🧰 Tools
🪛 Ruff (0.8.2)

542-542: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


565-586: LGTM!

The function correctly implements entry-group association with proper handling of different input types.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
metacatalog_api/db.py (1)

Line range hint 249-265: Fix logical error in SQL condition.

The SQL condition uses the bitwise operator & instead of the logical operator AND, which will not work as intended.

Apply this diff to fix the condition:

     sql = select(models.PersonTable).where(
-        models.PersonTable.first_name == author.first_name &
-        models.PersonTable.last_name == author.last_name &
-        models.PersonTable.organisation_name == author.organisation_name &
-        models.PersonTable.organisation_abbrev == author.organisation_abbrev
+        (models.PersonTable.first_name == author.first_name) &
+        (models.PersonTable.last_name == author.last_name) &
+        (models.PersonTable.organisation_name == author.organisation_name) &
+        (models.PersonTable.organisation_abbrev == author.organisation_abbrev)
     )
🧹 Nitpick comments (5)
metacatalog_api/sql/migrate/migration_3.sql (1)

7-7: Fix typo in Site description

There's a typo in the word "sufficiently".

-INSERT INTO {schema}.entrygroup_types VALUES (6, 'Site', 'A Site groups related entries, which have been collected at the same site. This can be used if the location or spatial-scale does not suffiently group the entries together.');
+INSERT INTO {schema}.entrygroup_types VALUES (6, 'Site', 'A Site groups related entries, which have been collected at the same site. This can be used if the location or spatial-scale does not sufficiently group the entries together.');
metacatalog_api/sql/maintain/defaults.sql (2)

3435-3435: Fix typo in Site description

There's a typo in the word "sufficiently".

-INSERT INTO {schema}.entrygroup_types VALUES (6, 'Site', 'A Site groups related entries, which have been collected at the same site. This can be used if the location or spatial-scale does not suffiently group the entries together.');
+INSERT INTO {schema}.entrygroup_types VALUES (6, 'Site', 'A Site groups related entries, which have been collected at the same site. This can be used if the location or spatial-scale does not sufficiently group the entries together.');

3434-3435: Consider clarifying the relationship between group types

The distinctions between 'Label', 'Site', and spatial grouping could be clearer:

  1. When should users choose 'Site' vs. using spatial coordinates/scale?
  2. What's the difference between using 'Label' vs. 'Site' for location-based grouping?

Consider enhancing the descriptions to better guide users in choosing the appropriate type.

metacatalog_api/db.py (2)

499-520: Consider splitting the function for better type safety.

The function returns different types based on the with_metadata parameter, which can lead to type confusion and runtime errors. Consider creating separate functions for clarity:

  • get_groups
  • get_groups_with_metadata
-def get_groups(session: Session, title: str = None, description: str = None, type: str = None, entry_id: int = None, limit: int = None, offset: int = None, with_metadata: bool = False) -> list[models.EntryGroup] | list[models.EntryGroupWithMetadata]:
+def get_groups(session: Session, title: str = None, description: str = None, type: str = None, entry_id: int = None, limit: int = None, offset: int = None) -> list[models.EntryGroup]:
     # ... existing implementation ...
-    if with_metadata:
-        return [models.EntryGroupWithMetadata.model_validate(g) for g in groups]
-    else:
-        return [models.EntryGroup.model_validate(g) for g in groups]
+    return [models.EntryGroup.model_validate(g) for g in groups]

+def get_groups_with_metadata(session: Session, title: str = None, description: str = None, type: str = None, entry_id: int = None, limit: int = None, offset: int = None) -> list[models.EntryGroupWithMetadata]:
+    groups = get_groups(session, title, description, type, entry_id, limit, offset)
+    return [models.EntryGroupWithMetadata.model_validate(g) for g in groups]

494-496: Consider adding new group types based on user feedback.

As suggested in the PR comments by AlexDo1, consider adding these new group types to better describe the data:

  • "Dataset" for grouping variables from the same dataset (e.g., CAMELS-DEv1.0.0, HYRAS)
  • "MeasuringSite"/"Station"/"Site" for grouping variables from the same location

Would you like me to help create a migration script to add these new group types?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16330f1 and fdeed5b.

📒 Files selected for processing (4)
  • metacatalog_api/db.py (7 hunks)
  • metacatalog_api/sql/maintain/defaults.sql (1 hunks)
  • metacatalog_api/sql/maintain/install.sql (1 hunks)
  • metacatalog_api/sql/migrate/migration_3.sql (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
metacatalog_api/db.py

542-542: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (8)
metacatalog_api/sql/migrate/migration_3.sql (2)

6-6: LGTM! Dataset group type aligns with user requirements

The Dataset group type effectively addresses the need to group related but self-contained entries, as requested in the PR feedback.


2-3: Verify existing data before migration

The ALTER TABLE statements add NOT NULL constraints. Ensure there are no NULL values in existing records before applying the migration to prevent failures.

metacatalog_api/sql/maintain/install.sql (1)

412-413: LGTM! Schema changes are consistent

The column type changes in the installation script match the migration script, ensuring consistency between fresh installations and upgrades.

metacatalog_api/sql/maintain/defaults.sql (2)

3434-3434: LGTM! Dataset type addition aligns with requirements.

The new 'Dataset' type effectively addresses the use case mentioned in PR comments for grouping variables from datasets like "CAMELS-DEv1.0.0," "HYRAS," and "Caravan-DE". The description clearly distinguishes it from composites.


3434-3435: Verify database migration impact

Since these are new group types being added alongside a planned migration for changing column types to TEXT, we should ensure the migration handles existing data properly.

✅ Verification successful

Migration implementation is safe and well-structured

The migration correctly handles the column type changes before adding new group types, and the foreign key constraints are properly configured to prevent cascading effects. The VARCHAR to TEXT conversion is safe and won't affect existing data.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there are any existing groups that might be affected
rg -A 5 "ALTER TABLE.*entrygroup_types" 

# Look for any constraints or foreign keys that need consideration
rg -A 10 "REFERENCES.*entrygroup_types"

Length of output: 1749

metacatalog_api/db.py (3)

Line range hint 239-249: LGTM! Well-implemented name search with strict mode.

The function handles wildcards appropriately and provides clear behavior control through the strict parameter.


542-563: Fix mutable default argument.

Using mutable default arguments can lead to unexpected behavior when the function is called multiple times.

🧰 Tools
🪛 Ruff (0.8.2)

542-542: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


13-13: Verify database migration plan for version 3.

The DB_VERSION increment suggests schema changes. As mentioned in the PR objectives about changing the data type of title and description fields from character varying to text, ensure that:

  1. Migration scripts are provided
  2. Rollback procedures are documented
  3. Data preservation during migration is addressed
✅ Verification successful

Migration plan for version 3 is properly implemented

The migration script (migration_3.sql) correctly implements:

  • Safe column type changes from varchar to TEXT
  • New entrygroup types for enhanced group management
  • Implicit data preservation (TEXT type is compatible with existing varchar data)

The changes are reversible through standard ALTER TABLE statements if rollback is needed.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for migration scripts
echo "Searching for migration scripts..."
fd -e sql -e py "migration|upgrade|rollback" "sql/maintain"

Length of output: 238


Script:

#!/bin/bash
# Find all SQL files
echo "=== SQL Files ==="
fd -e sql

# Find Python files with migration-related content
echo -e "\n=== Migration-related Python Files ==="
rg -l "(?i)(migrate|upgrade|downgrade|rollback|alembic|version)" -g "*.py"

# Look for common migration tool configs
echo -e "\n=== Migration Configs ==="
fd "alembic.ini|migrations.json|db_version"

Length of output: 909


Script:

#!/bin/bash
# Display content of migration_3.sql
echo "=== Content of migration_3.sql ==="
cat metacatalog_api/sql/migrate/migration_3.sql

Length of output: 846

Comment on lines +565 to +586
def group_entries(session: Session, group: models.EntryGroupBase | int, entry_ids: list[int | models.EntryBase]) -> models.EntryGroup:
# get the group
if isinstance(group, int):
group_id = group
elif hasattr(group, 'id') and group.id is not None:
group_id = group.id
else:
new_group = get_group(session=session, title=group.title)
if new_group is None:
group = add_group(session=session, title=group.title, description=group.description, type=group.type)
else:
group = new_group
group_id = group.id

for entry_id in entry_ids:
if not isinstance(entry_id, int):
entry_id = entry_id.id
session.add(models.NMGroupsEntries(group_id=group_id, entry_id=entry_id))
session.commit()

group = get_group(session=session, id=group_id)
return models.EntryGroup.model_validate(group)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add input validation for entry_ids.

The function should validate that the entry_ids exist in the database before creating the associations to prevent orphaned relationships.

Add this validation before the loop:

     for entry_id in entry_ids:
         if not isinstance(entry_id, int):
             entry_id = entry_id.id 
+        # Verify entry exists
+        entry = session.get(models.EntryTable, entry_id)
+        if entry is None:
+            raise ValueError(f"Entry with id {entry_id} not found")
         session.add(models.NMGroupsEntries(group_id=group_id, entry_id=entry_id))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def group_entries(session: Session, group: models.EntryGroupBase | int, entry_ids: list[int | models.EntryBase]) -> models.EntryGroup:
# get the group
if isinstance(group, int):
group_id = group
elif hasattr(group, 'id') and group.id is not None:
group_id = group.id
else:
new_group = get_group(session=session, title=group.title)
if new_group is None:
group = add_group(session=session, title=group.title, description=group.description, type=group.type)
else:
group = new_group
group_id = group.id
for entry_id in entry_ids:
if not isinstance(entry_id, int):
entry_id = entry_id.id
session.add(models.NMGroupsEntries(group_id=group_id, entry_id=entry_id))
session.commit()
group = get_group(session=session, id=group_id)
return models.EntryGroup.model_validate(group)
def group_entries(session: Session, group: models.EntryGroupBase | int, entry_ids: list[int | models.EntryBase]) -> models.EntryGroup:
# get the group
if isinstance(group, int):
group_id = group
elif hasattr(group, 'id') and group.id is not None:
group_id = group.id
else:
new_group = get_group(session=session, title=group.title)
if new_group is None:
group = add_group(session=session, title=group.title, description=group.description, type=group.type)
else:
group = new_group
group_id = group.id
for entry_id in entry_ids:
if not isinstance(entry_id, int):
entry_id = entry_id.id
# Verify entry exists
entry = session.get(models.EntryTable, entry_id)
if entry is None:
raise ValueError(f"Entry with id {entry_id} not found")
session.add(models.NMGroupsEntries(group_id=group_id, entry_id=entry_id))
session.commit()
group = get_group(session=session, id=group_id)
return models.EntryGroup.model_validate(group)

@mmaelicke mmaelicke merged commit fdeed5b into main Jan 8, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants