Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved monitoring of leased connections #15

Merged
merged 8 commits into from
Nov 30, 2024

Conversation

renatomassaro
Copy link
Owner

@renatomassaro renatomassaro commented Nov 24, 2024

Any process can fetch a connection. When they do, this connection is locked to this process, which is expected to release it once he's done. What happens if the process dies in the meantime? If it crashes? If it forgets to release?

That's where this PR comes in. If a process terminates without releasing the connection, we'll get notified and we'll clean up their mess.

If the caller process terminated normally (without an exception) and yet the connection was not released, emit a warning: it is likely the application forgot to commit or rollback their transaction.

Finally, in another commit I should handle the scenario where a caller process goes on to live forever with a locked connection. That might be okay, but only if explicitly requested by the caller process. I'm thinking timeout intervals with lease renewals.

EDIT: The above is now handled, except for "lease renewals". That's not needed now and it may never be needed. AFAIK not even Ecto has a similar feature. If the caller needs a big timeout, just set it from the start.

Summary by CodeRabbit

  • New Features

    • Introduced a timeout mechanism for connection requests to enhance connection management.
    • Added a new function to notify the repository upon release, improving transaction handling.
    • Introduced a new module for managing repository configuration, including a struct for required fields.
  • Bug Fixes

    • Enhanced error handling for connection management and transaction rollbacks.
  • Tests

    • Expanded tests for connection management to cover new timeout scenarios and state validations after connection releases.
    • Updated assertions in existing tests to validate additional connection state fields.
    • Introduced a new function to block processes indefinitely for more reliable testing.
  • Documentation

    • Added documentation for the new block_forever function to clarify its purpose in tests.

Any process can fetch a connection. When they do, this connection is
locked to this process, which is expected to release it once he's done.
What happens if the process dies in the meantime? If it crashes? If it
forgets to release?

That's where this monitoring comes in. If a process terminates without
releasing the connection, we'll get notified and we'll clean up their
mess.

If the caller process terminated normally (without an exception) and yet
the connection was not released, emit a warning: it is likely the
application forgot to commit or rollback their transaction.
@renatomassaro renatomassaro force-pushed the improved-monitoring-of-leased-connections branch from a587ec4 to 528af16 Compare November 24, 2024 15:10
Now we automatically enqueue the requests; there is no polling
Now a request / caller / leasee can no longer hold the connection open
indefinitely -- unless explicitly stated.
@renatomassaro renatomassaro force-pushed the improved-monitoring-of-leased-connections branch from cdbff15 to f3c6f7c Compare November 27, 2024 10:27
This allows the Repo to perform internal clean-up in order to be ready
to serve new requests.
@renatomassaro renatomassaro force-pushed the improved-monitoring-of-leased-connections branch from f3c6f7c to 94733e4 Compare November 27, 2024 10:34
The implicit "if manager absent then it is nil" logic was causing bugs
and it's always better to be explicit
Copy link

coderabbitai bot commented Nov 28, 2024

Walkthrough

The changes in this pull request enhance the functionality and error handling of the Feeb.DB.Repo and Feeb.DB.Repo.Manager modules. Key modifications include the addition of a notify_release/1 function for managing release notifications, updates to the initialization functions to accommodate a new manager_pid argument, and improvements to connection management logic with timeout mechanisms. Tests have been expanded to validate these changes, ensuring robustness in connection handling and transaction management across various scenarios.

Changes

File Change Summary
lib/feeb/db/repo.ex - Added method: notify_release/1
- Updated method signatures for start_link/1 and init/1 to include manager_pid
- Enhanced handle_call/3 to manage :mgt_connection_released messages and rollback transactions if active.
lib/feeb/db/repo/manager.ex - Added @default_repo_timeout constant
- Updated fetch_or_create_connection to accept caller and opts
- Enhanced handle_info to manage timeouts and dead processes.
- Updated connection management functions to track caller processes and timeouts.
test/db/repo/manager/registry_test.exs - Expanded assertions in fetch_or_create/2 test to include new fields in manager state.
test/db/repo/manager_test.exs - Added tests for connection release and timeout scenarios, ensuring robustness in connection management.
test/db/repo_test.exs - Updated initialization of Repo to include nil in test cases.
- Added tests for handling connection release scenarios.
test/support/db.ex - Updated ensure_migrated/2 to include nil in Repo.start_link/1 call.
test/support/db/prop.ex - Added alias for Feeb.DB for concise references in the code.
test/support/utils.ex - Added new function block_forever to keep spawned processes active during tests.
test/db/schema_test.exs - Updated assertions for repo_config to check against %DB.Repo.RepoConfig{} struct instead of a plain map.
test/support/db/schemas/friend.ex - Updated get_repo_config/2 to require %DB.Repo.RepoConfig{} struct as second argument.
lib/feeb/db/repo/manager/repo_entry.ex - Introduced new module RepoEntry for managing repository entries with a focus on their busy state.
lib/feeb/db/repo/repo_config.ex - Introduced new module RepoConfig to manage repository configuration with required fields.

Poem

In the burrow deep, where the code does flow,
A rabbit hops in, with changes to show.
With connections that timeout, and releases that sing,
The Repo now dances, with each little thing.
So let’s celebrate, with a hop and a cheer,
For the code is now better, let’s all give a cheer! 🐇✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Outside diff range and nitpick comments (14)
test/support/db.ex (1)

Line range hint 52-71: Consider updating documentation to reflect new connection monitoring

The existing documentation thoroughly explains the current behavior regarding locked connections and test isolation. Given this PR's improvements to connection monitoring, consider adding a note about how the new connection monitoring features might affect or benefit the test infrastructure.

test/db/repo/manager/registry_test.exs (1)

21-43: Reduce duplication in connection state assertions

The assertions for write_1, read_1, and read_2 are identical. Consider extracting the expected state to a helper function or setup block to make the tests more maintainable.

Example approach:

+ defp expected_connection_state do
+   %{
+     pid: nil,
+     busy?: false,
+     caller_pid: nil,
+     monitor_ref: nil,
+     timer_ref: nil
+   }
+ end

  test "creates a previously non-existent Manager", %{shard_id: shard_id} do
    # ...
-   assert manager_state.write_1 == %{
-     pid: nil,
-     busy?: false,
-     caller_pid: nil,
-     monitor_ref: nil,
-     timer_ref: nil
-   }
+   assert manager_state.write_1 == expected_connection_state()
-   assert manager_state.read_1 == %{
-     pid: nil,
-     busy?: false,
-     caller_pid: nil,
-     monitor_ref: nil,
-     timer_ref: nil
-   }
+   assert manager_state.read_1 == expected_connection_state()
-   assert manager_state.read_2 == %{
-     pid: nil,
-     busy?: false,
-     caller_pid: nil,
-     monitor_ref: nil,
-     timer_ref: nil
-   }
+   assert manager_state.read_2 == expected_connection_state()
test/db/repo_test.exs (2)

12-12: LGTM! Consider adding documentation for the new parameter.

The addition of manager_pid parameter (set to nil) in the initialization tuples is consistent with the PR objectives. However, it would be helpful to document why this parameter is typically set to nil in tests.

Consider adding a comment explaining that nil indicates no manager validation is needed for these test cases.

Also applies to: 26-26, 51-51, 77-77


96-99: LGTM! Good addition of SQLite-level transaction verification.

The new assertions effectively verify the actual transaction state at the SQLite level, which strengthens the tests.

There's a typo in the comment on line 109: "colurse" should be "course".

Also applies to: 109-112

test/db/repo/manager_test.exs (4)

333-334: Ensure proper cleanup after transaction begins

In the test where a transaction is begun but not completed, ensure that the transaction is appropriately rolled back or cleaned up after the test to prevent side effects on other tests.


274-277: Consider asserting on monitor reference for completeness

After fetching the connection and before killing request_pid, you might want to assert that a monitor reference exists to confirm that the process is being monitored.

assert state_before.write_1.monitor_ref

307-310: Ensure the queue length reflects the number of waiting processes

In the assertion assert :queue.len(state_before.write_queue) == 1, confirm that the queue length accurately reflects the number of processes waiting for a connection. This adds clarity to the test and ensures that the state is as expected.


330-332: Handle potential timing issues in timeout test

The timeout value of 25 milliseconds may cause flakiness in test environments with varying performance. Consider increasing the timeout slightly or using a more robust method to trigger the timeout condition reliably.

lib/feeb/db/repo.ex (3)

20-21: Use named parameters in start_link/1 for improved readability

The function start_link/1 currently matches a tuple with anonymous elements {_, _, _, _, _}, which can reduce code clarity and maintainability. Using named variables enhances readability and makes the code easier to understand.

Consider refactoring the function signature as follows:

-def start_link({_, _, _, _, _} = args),
+def start_link({context, shard_id, path, mode, manager_pid} = args),

30-31: Enhance documentation for notify_release/1

The docstring for notify_release/1 can be improved for clarity and completeness. Providing more precise documentation helps other developers understand the purpose and usage of the function.

Consider updating the documentation:

 @doc """
-  Used by the Repo.Manager to notify once the Repo has been released. Useful to resetting internal
-  counters, transaction_id etc.
+  Notifies the `Repo` that it has been released by the `Repo.Manager`.
+  Resets internal counters and the `transaction_id`.
  """
 def notify_release(pid),

396-405: Simplify assert_release_signal_from_manager!/2 function clauses

The multiple clauses of assert_release_signal_from_manager!/2 can be consolidated for better readability and maintainability.

Combine the clauses and handle the test environment condition within the function:

-defp assert_release_signal_from_manager!(manager_pid, manager_pid), do: :ok

-if @env == :test do
-  defp assert_release_signal_from_manager!(nil, _caller_pid), do: :ok
-end

-defp assert_release_signal_from_manager!(manager_pid, other_pid) do
-  "Repo can only be released by its Manager (#{inspect(manager_pid)}, got #{inspect(other_pid)})"
-  |> raise()
-end
+defp assert_release_signal_from_manager!(expected_manager_pid, caller_pid) do
+  cond do
+    expected_manager_pid == caller_pid ->
+      :ok
+    @env == :test and is_nil(expected_manager_pid) ->
+      :ok
+    true ->
+      raise "Repo can only be released by its Manager (#{inspect(expected_manager_pid)}, got #{inspect(caller_pid)})"
+  end
+end
lib/feeb/db/repo/manager.ex (3)

81-87: Consider using a struct for connection entries to improve maintainability

The current implementation initializes each connection entry with similar keys. Converting each connection entry to a struct would enhance code readability, reduce duplication, and simplify future modifications.


288-288: Resolve the review comment regarding linking GenServers

There's a comment questioning whether to link both GenServers:

# REVIEW: Do I really want to link both genservers?

Please decide whether it's appropriate to link the Repo process to the Manager process and update the code accordingly. Unresolved comments can be confusing and should be addressed before merging.


360-360: Adjust log level for dead caller information

In the process_enqueued_callers/2 function, when a caller has died while waiting for a connection, an info-level log is generated:

Logger.info("Caller #{inspect(caller_pid)} waiting for a connection has died; skipping it")

Consider changing the log level to :debug, as this may be a normal occurrence and might not require attention at the :info level.

Apply this diff to adjust the log level:

- Logger.info("Caller #{inspect(caller_pid)} waiting for a connection has died; skipping it")
+ Logger.debug("Caller #{inspect(caller_pid)} waiting for a connection has died; skipping it")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0e9f398 and 72d82b8.

📒 Files selected for processing (8)
  • lib/feeb/db/repo.ex (4 hunks)
  • lib/feeb/db/repo/manager.ex (10 hunks)
  • test/db/repo/manager/registry_test.exs (1 hunks)
  • test/db/repo/manager_test.exs (3 hunks)
  • test/db/repo_test.exs (7 hunks)
  • test/support/db.ex (1 hunks)
  • test/support/db/prop.ex (2 hunks)
  • test/support/utils.ex (1 hunks)
🔇 Additional comments (11)
test/support/utils.ex (2)

27-30: Documentation clearly explains the purpose and usage

The documentation effectively communicates the function's purpose and its role in preventing test flakiness.


31-35: Verify cleanup in test teardown

While the implementation is correct for keeping processes alive during tests, ensure that any processes using block_forever/0 are properly terminated in test teardown to prevent resource leaks.

Consider adding a helper function for safer process management in tests:

def spawn_blocking_process(fun) do
  pid = spawn_and_wait(fun)
  ExUnit.Callbacks.on_exit(fn ->
    Process.exit(pid, :kill)
  end)
  pid
end

This would ensure automatic cleanup while maintaining the same blocking behavior.

✅ Verification successful

No cleanup needed - test process management is appropriate

The block_forever usage is actually part of a test that deliberately spawns a process to verify error handling behavior. The process is intentionally spawned to trigger a specific error condition, and the test properly:

  1. Uses Process.flag(:trap_exit, true) to handle the process termination
  2. Verifies the error condition with assert_receive {:EXIT, ^repo, ...}
  3. The spawned process is automatically cleaned up when the repo process exits

The process blocking is an intentional part of the test's error handling verification, and the test properly manages and verifies process termination. No additional cleanup is necessary in this case.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if there are tests using block_forever without proper cleanup

# Look for test files using block_forever
echo "Checking for tests using block_forever..."
rg -l "block_forever" test/**/*_test.exs | while read -r file; do
  echo "Analyzing $file..."
  # Check if the file has proper cleanup mechanisms
  rg -A 10 "block_forever" "$file"
  # Look for cleanup patterns in the same file
  echo "Checking for cleanup patterns..."
  rg "on_exit|ExUnit.Callbacks|teardown" "$file"
done

Length of output: 986


Script:

#!/bin/bash
# Let's check for process management patterns around block_forever usage

# First, let's get more context around the block_forever usage
echo "Checking broader context of block_forever usage in repo_test.exs..."
rg -B 10 -A 10 "block_forever" test/db/repo_test.exs

# Let's also check for process management related setup/teardown in the test file
echo "\nChecking test setup and process management..."
rg "setup|describe|test" test/db/repo_test.exs

# Check if there are any process monitoring or cleanup patterns
echo "\nChecking process management patterns..."
rg "Process\.(monitor|exit|flag)|spawn(_link)?" test/db/repo_test.exs

Length of output: 2854

test/support/db/prop.ex (2)

3-3: LGTM: Clean alias addition

The new alias improves code readability by allowing direct references to DB functions.


25-29: Verify connection cleanup in test setup

Given the PR's focus on connection leasing and monitoring, we should verify that test connections are properly managed. The immediate commit after begin is fine, but let's ensure we're not leaving any connections in a leased state during test setup.

✅ Verification successful

Let me analyze the results and generate additional verification scripts to check for proper connection cleanup patterns.


Let me try one more verification to specifically look at the test support modules and their connection handling patterns.


Based on the verification results, I can now provide a final response:

Connection handling is properly managed in test setup

The test database setup in test/support/db/prop.ex correctly manages connections:

  • The DB.begin/commit sequence is properly paired
  • The SQLite connection is explicitly closed with SQLite.close(db_conn)
  • The connection is only used for initialization and not kept open
  • The special test transaction ID (-99) is used consistently
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for proper connection handling in test setup
# Look for connection cleanup patterns in test files

# Check for connection cleanup patterns in test files
echo "Checking for connection cleanup patterns in tests..."
rg -A 5 "DB\.(begin|commit|rollback)" "test/"

# Look for potential connection leaks in test setup
echo "Checking for potential connection leaks in test setup..."
ast-grep --pattern 'def setup do
  $$$
  DB.begin($$$)
  $$$
end'

Length of output: 31795


Script:

#!/bin/bash
# Look for connection cleanup patterns in test setup/teardown

# Check for cleanup in setup/teardown blocks
echo "Checking setup/teardown blocks for connection cleanup..."
rg -A 10 "setup do|setup_all do" "test/"

# Check for DB.commit/rollback in test helper modules
echo "Checking test helpers for connection cleanup..."
rg -A 5 "def (setup|teardown|after_setup|before_setup)" "test/support/"

# Look for on_exit cleanup patterns
echo "Checking for on_exit cleanup..."
rg "on_exit" "test/"

Length of output: 1226


Script:

#!/bin/bash
# Check test support modules for connection handling patterns

# Look at the specific test support file content
echo "Checking test support files..."
cat test/support/db/prop.ex

# Look for connection cleanup patterns in test support
echo "Checking for connection cleanup in test support..."
rg -B 5 -A 5 "DB\.(commit|rollback)" "test/support/"

# Check for any connection cleanup helpers
echo "Checking for cleanup helpers..."
rg -l "def cleanup|def reset|def clear" "test/support/"

Length of output: 2384

test/support/db.ex (2)

28-33: LGTM! Good connection management pattern

The function follows a robust pattern by explicitly closing the connection after migration, which aligns well with the PR's focus on proper connection management. The synchronous close via GenServer.call ensures the migration completes before proceeding.


30-30: Verify consistent usage of manager_pid parameter across test files

The addition of nil as the manager_pid parameter aligns with the PR's connection monitoring improvements. However, we should verify that this pattern is consistently applied across all test files that initialize the Repo.

✅ Verification successful

Consistent usage of manager_pid parameter confirmed

The verification shows that manager_pid parameter is consistently used across test files:

  • In test/support/db.ex: Uses nil for migration setup
  • In test/db/repo_test.exs: Uses self() for testing connection management

The different values (nil vs self()) are intentional and appropriate for their contexts:

  • Migration setup doesn't need connection management, hence nil
  • Repository tests verify connection management, hence self()
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for other Repo.start_link calls in test files to ensure consistent parameter usage
# Expected: All calls should include the manager_pid parameter

# Search for Repo.start_link calls in test files
rg "Repo\.start_link" test/ -A 2

Length of output: 490

test/db/repo_test.exs (1)

204-254: LGTM! Excellent test coverage for connection release scenarios.

The new test block comprehensively covers the connection release functionality:

  • Regular lifecycle verification
  • Automatic transaction rollback for abandoned connections
  • Manager validation for release signals

This aligns perfectly with the PR objectives of improving connection lease monitoring.

test/db/repo/manager_test.exs (2)

364-364: Check the initial value of timer_ref for consistency

When setting timer_ref to :no_timer, ensure that this value is consistently used throughout the codebase to represent the absence of a timer. Verify that there are no cases where nil or another sentinel value is used instead.


342-342: 🛠️ Refactor suggestion

Clarify the source of the {:EXIT, manager, :feebdb_repo_timeout} message

In the test, the process is trapping exits and expects to receive an {:EXIT, manager, :feebdb_repo_timeout} message. However, the manager process does not appear to be linked to the test process, so this message may not be received as expected. Ensure that the test process is properly linked to the manager or adjust the test to assert on the correct message.

Verify that the manager process is linked:

# Add this before fetching the connection
Process.link(manager)

Alternatively, if the intention is to receive a message when the leasee process exits due to timeout, consider monitoring the leasee process instead.

lib/feeb/db/repo.ex (1)

35-35: Ensure consistent parameter naming between start_link/1 and init/1

After refactoring start_link/1 to use named parameters, ensure that init/1 also uses the same parameter names for consistency and clarity.

lib/feeb/db/repo/manager.ex (1)

387-395: Verify handling of :infinity values in repository timeouts

The start_repo_timeout_timer/3 function checks if repo_timeout is an integer before setting up a timeout:

if is_integer(repo_timeout) do
  Process.send_after(self(), {:repo_timeout, key, caller_pid}, repo_timeout)
else
  :no_timer
end

Ensure that passing :infinity as a timeout value behaves as intended, and that the absence of a timer does not cause any unintended issues.

Run the following script to check for any usage of :infinity in repo_timeout and verify that it's handled correctly:

✅ Verification successful

Let me gather more information about the usage of timeouts in the codebase.


:infinity timeout handling is correctly implemented

The codebase shows proper handling of :infinity timeout values:

  • The test "allows for infinite timeout" explicitly verifies that timeout: :infinity works as expected and confirms no timer is created (assert manager_state.write_1.timer_ref == :no_timer)
  • The implementation in start_repo_timeout_timer/3 correctly handles non-integer values (including :infinity) by returning :no_timer
  • The behavior is documented, as seen in the codebase comments indicating that :infinity is a valid timeout value
  • The functionality is actively used in tests, particularly in stress tests where timeout: :infinity is needed for long-running operations
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all calls to `fetch_connection` that may pass `:infinity` as `:timeout` option.

# Search for `fetch_connection` calls with `:timeout` option.
rg 'fetch_connection\(.+,\s*.+,\s*\[.*:timeout\s*=>\s*:infinity.*\]\)' -A 5

# Verify that `start_repo_timeout_timer/3` properly handles `:infinity` values.
rg 'defp\s+start_repo_timeout_timer.*' -A 10

Length of output: 762


Script:

#!/bin/bash
# Search for any usage of :infinity in relation to timeouts
rg ':infinity.*timeout|timeout.*:infinity' -B 2 -A 2

# Search for @default_repo_timeout definition
rg '@default_repo_timeout.*' -B 2 -A 2

# Search for any tests related to repo timeouts
rg 'test.*repo.*timeout|test.*timeout.*repo' -B 2 -A 2

# Look for any documentation or specs related to timeout handling
ast-grep --pattern 'defp start_repo_timeout_timer($$$) :: $$$'

Length of output: 3729

assert manager_state.read_1 == %{pid: nil, busy?: false}
assert manager_state.read_2 == %{pid: nil, busy?: false}

# TODO: Assert against initial struct instead
Copy link

@coderabbitai coderabbitai bot Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Address the TODO by using struct-based assertions

Instead of manual map comparisons, consider using the actual struct type for more maintainable assertions.

Example approach:

- # TODO: Assert against initial struct instead
- assert manager_state.write_1 == %{
-   pid: nil,
-   busy?: false,
-   caller_pid: nil,
-   monitor_ref: nil,
-   timer_ref: nil
- }
+ initial_state = %Feeb.DB.Repo.Manager.ConnectionState{
+   pid: nil,
+   busy?: false,
+   caller_pid: nil,
+   monitor_ref: nil,
+   timer_ref: nil
+ }
+ assert manager_state.write_1 == initial_state

Committable suggestion skipped: line range outside the PR's diff.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as the comment suggests this is something I intend on doing, but now now... Fine, I'll do it now.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@renatomassaro, thank you for making the change!


If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

# It blows up if called from elsewhere
spawn(fn ->
GS.call(repo, {:mgt_connection_released})
block_forever()
Copy link

@coderabbitai coderabbitai bot Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix undefined function reference.

The block_forever() function is called but not defined in the test file.

Consider adding the function definition or using an alternative approach like:

-block_forever()
+Process.sleep(:infinity)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
block_forever()
Process.sleep(:infinity)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's defined elsewhere

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@renatomassaro, thank you for the clarification. Since block_forever/0 is defined elsewhere in the codebase, no changes are needed.


✏️ Learnings added
Learnt from: renatomassaro
PR: renatomassaro/FeebDB#15
File: test/db/repo_test.exs:247-247
Timestamp: 2024-11-28T09:52:22.579Z
Learning: The function `block_forever/0` is defined in `Test.Utils` and is used in tests to keep a process active indefinitely.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines +282 to +283
# Give ample time for the `handle_info({:DOWN, ...})` callback to run
:timer.sleep(10)
Copy link

@coderabbitai coderabbitai bot Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Use deterministic synchronization instead of :timer.sleep/1

Using :timer.sleep/10 to wait for asynchronous operations can lead to flaky tests, especially in environments with variable performance like CI servers. Consider using message passing or monitoring to ensure that the asynchronous operations have completed.

For example, you can monitor the process and wait for the :DOWN message:

# Replace this:
Process.exit(request_pid, :kill)
:timer.sleep(10)

# With this:
ref = Process.monitor(request_pid)
Process.exit(request_pid, :kill)
assert_receive {:DOWN, ^ref, :process, ^request_pid, _reason}

This ensures that the test proceeds only after request_pid has been terminated and the handle_info/2 callback has been invoked.

Also applies to: 312-313, 340-341, 315-316

Copy link
Owner Author

@renatomassaro renatomassaro Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but I don't see a way to sync deterministically without having a hack that affects the test readability.

For example, I could instead do a GenServer.call to the Repo.Manager: singe a GenServer is sequential, the call will only return after handle_info has executed -- but it would still have the race condition of the call arriving before the :DOWN signal, so we are back to square one.

Your suggestion of trapping exits doesn't apply -- or maybe it does, but not in the way you suggested: I can wait for the :DOWN message in the spawned pid and, once it arrives, I send a message to the test pid so it can proceed. That would work, but IMO it's not worth the readability downsides. I'll do that only if this test proves to flake. So far, it hasn't.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@renatomassaro Understood, thank you for the clarification. If the test starts to flake in the future, revisiting synchronization methods could be helpful. Let me know if you need any assistance later on.

Comment on lines +142 to +153
def handle_call({:mgt_connection_released}, {caller_pid, _}, state) do
# Make sure only the Repo.Manager can send mgt signals to the Repo.
assert_release_signal_from_manager!(state.manager_pid, caller_pid)

if not is_nil(state.transaction_id) do
Logger.info("Connection released forcibly; rolling back transaction #{state.transaction_id}")
:ok = SQLite.exec(state.conn, "ROLLBACK")
end

# Reset the GenServer state so we are ready to serve a new request
{:reply, :ok, %{state | transaction_id: nil}}
end
Copy link

@coderabbitai coderabbitai bot Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle potential race conditions when rolling back transactions on release

In handle_call/3 for {:mgt_connection_released}, there may be a race condition if a transaction is committed or rolled back just before the check for state.transaction_id. This could lead to an incorrect rollback or state inconsistency.

Consider synchronizing the access to state.transaction_id or using a more robust mechanism to manage transaction state.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there may be a race condition if a transaction is committed or rolled back just before the check for state.transaction_id

That's not true. If we are handling the :mgt_connection_released call, then the GenServer is unable to process any other messages at the same time.

If it processes a :commit message immediately after, it will fail because you can't COMMIT with transaction_id=nil.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@renatomassaro, thank you for the clarification. You're correct that the GenServer processes messages sequentially, so there's no race condition in this scenario. I appreciate the explanation, and I'll keep this in mind for future reviews.


✏️ Learnings added
Learnt from: renatomassaro
PR: renatomassaro/FeebDB#15
File: lib/feeb/db/repo.ex:142-153
Timestamp: 2024-11-28T09:59:48.393Z
Learning: In the `Feeb.DB.Repo` module, the GenServer processes messages sequentially, so there is no race condition within `handle_call/3` functions, and `state` mutations are safe from concurrent modifications.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines +144 to +145
assert_release_signal_from_manager!(state.manager_pid, caller_pid)

Copy link

@coderabbitai coderabbitai bot Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Clarify parameter names in assert_release_signal_from_manager!/2

The function assert_release_signal_from_manager!/2 uses the same name for both parameters in pattern matching, which can be confusing and potentially error-prone.

Refactor the function to use distinct parameter names:

-defp assert_release_signal_from_manager!(manager_pid, manager_pid), do: :ok
+defp assert_release_signal_from_manager!(expected_manager_pid, caller_pid) when expected_manager_pid == caller_pid, do: :ok
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
assert_release_signal_from_manager!(state.manager_pid, caller_pid)
assert_release_signal_from_manager!(state.manager_pid, caller_pid)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be confusing to someone who is new to Erlang/Elixir; otherwise this is considered idiomatic pattern matching.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@renatomassaro, understood. Thank you for the clarification. I'll keep in mind that using the same parameter name in pattern matching is considered idiomatic in Erlang/Elixir.


✏️ Learnings added
Learnt from: renatomassaro
PR: renatomassaro/FeebDB#15
File: lib/feeb/db/repo.ex:144-145
Timestamp: 2024-11-28T10:01:57.008Z
Learning: In `lib/feeb/db/repo.ex`, when working with Erlang/Elixir code, it's idiomatic to use the same parameter name in function pattern matching to assert that parameters are equal. For example, defining `defp assert_release_signal_from_manager!(manager_pid, manager_pid), do: :ok` is acceptable and preferred.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines 137 to 154
def handle_info({:repo_timeout, key, caller_pid}, state) do
case state[key] do
%{pid: repo_pid, busy?: true, caller_pid: ^caller_pid} ->
# Kill the caller
Process.exit(caller_pid, :feebdb_repo_timeout)

# Release the connection. It is supposed to always succeed
{:ok, ^key, new_state} = do_release_connection(state, repo_pid)
notify_enqueued_requests(new_state, key)

{:noreply, new_state}

%{caller_pid: _other_caller_or_nil} ->
# Despite the timeout, the connection is no longer used (or used by a different caller)
# Theoretically, this is a possible race condition. In this case, we just don't do anything
{:noreply, state}
end
end
Copy link

@coderabbitai coderabbitai bot Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Ensure safe termination of caller processes on repository timeout

The handle_info/2 callback forcibly terminates the caller process using Process.exit/2 when a repository timeout occurs. While this ensures that long-running processes are halted, it may lead to unintended side effects or affect process supervision trees.

Consider whether abruptly killing the caller process is the best approach. Alternatively, you might send a timeout message to the process or implement a graceful shutdown strategy.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's exactly what we want. The timeout is a contract, and if the process exceeded it it should die.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@renatomassaro, understood. I'll note that for future reviews.


✏️ Learnings added
Learnt from: renatomassaro
PR: renatomassaro/FeebDB#15
File: lib/feeb/db/repo/manager.ex:137-154
Timestamp: 2024-11-28T10:03:22.504Z
Learning: In the `Feeb.DB.Repo.Manager` module (`lib/feeb/db/repo/manager.ex`), it's acceptable to terminate caller processes using `Process.exit(caller_pid, :feebdb_repo_timeout)` in the `handle_info/2` callback when a repository timeout occurs, as the timeout is a contract and the process should die if it exceeds it.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

# connection is properly released
{key, %{pid: repo_pid, busy?: true, caller_pid: ^caller_pid}} ->
# Sanity check: by definition, the caller process is dead
false = Process.alive?(caller_pid)
Copy link

@coderabbitai coderabbitai bot Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Avoid assertive pattern matching when verifying process liveness

Using false = Process.alive?(caller_pid) will raise a MatchError if the process is unexpectedly alive. While the process is expected to be dead at this point, race conditions may occur.

Instead, consider using an if-statement to log a warning or take appropriate action if the process is still alive.

Apply this diff to handle this case more gracefully:

- false = Process.alive?(caller_pid)
+ if Process.alive?(caller_pid) do
+   Logger.warning("Caller process #{inspect(caller_pid)} is still alive during DOWN handler")
+   # Additional handling if necessary
+ end

Committable suggestion skipped: line range outside the PR's diff.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

race conditions may occur

I don't think a race condition is possible in this case, but I'm not 100% sure, which is why I want to have a "blow everything up" protection in case it does happen.

If after some time using this code I realize this is impossible, then I'll just remove this assertion. Furthermore, since we do a clean-up at the Repo itself, even if the caller is still alive and somehow still manages to performs requests, the request will fail at the Repo layer (because it had its state reset).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@renatomassaro, understood. Keeping the assertion as a safeguard makes sense in this case. Thank you for the clarification.

Comment on lines +249 to +250
:ok = Repo.notify_release(pid)

Copy link

@coderabbitai coderabbitai bot Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle potential errors from Repo.notify_release/1

Currently, the code assumes that Repo.notify_release(pid) will always return :ok. If this function fails or returns an error, it will cause a MatchError at runtime. To prevent unexpected crashes, consider handling potential errors.

Apply this diff to handle errors more gracefully:

- :ok = Repo.notify_release(pid)
+ case Repo.notify_release(pid) do
+   :ok ->
+     :ok
+   {:error, reason} ->
+     Logger.error("Failed to notify repo release: #{inspect(reason)}")
+     # Handle the error appropriately, such as retrying or proceeding without the notification
+ end

Committable suggestion skipped: line range outside the PR's diff.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is on purpose. If we are unable to release the Repo connection for any reason, kill everything and start fresh because something is wrong.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@renatomassaro Understood. Thank you for the clarification.


✏️ Learnings added
Learnt from: renatomassaro
PR: renatomassaro/FeebDB#15
File: lib/feeb/db/repo/manager.ex:249-250
Timestamp: 2024-11-28T10:08:23.535Z
Learning: In `lib/feeb/db/repo/manager.ex`, within the `do_release_connection/2` function, the line `:ok = Repo.notify_release(pid)` is intentionally designed to match `:ok`, ensuring that if `Repo.notify_release/1` fails, a `MatchError` is raised to trigger a crash and restart of the system. This is by design to handle critical errors, and handling potential errors from `Repo.notify_release/1` is not desired.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
test/db/schema_test.exs (1)

Line range hint 113-118: LGTM! Consider adding write mode test cases.

The change from plain map to DB.Repo.RepoConfig struct improves type safety and connection tracking, which aligns well with the PR's goal of better connection management. The struct's fields effectively capture essential connection metadata including context, shard, mode, and path.

Consider adding test cases that verify the repo_config virtual field with write mode connections to ensure complete test coverage of connection modes. Example:

test "virtual fields with write mode connections", %{shard_id: shard_id} do
  DB.begin(@context, shard_id, :write)
  
  friend = DB.one({:friends, :get_by_name}, "Ross")
  
  expected_repo_config = %DB.Repo.RepoConfig{
    context: @context,
    shard_id: shard_id,
    mode: :write,
    path: DB.Repo.get_path(@context, shard_id)
  }
  
  assert friend.repo_config == expected_repo_config
end
test/db/repo_test.exs (1)

204-255: LGTM: Comprehensive test coverage for connection release management

Excellent test coverage for the new connection release management functionality. The tests effectively verify:

  • Regular lifecycle behavior
  • Transaction rollback on forced release
  • Manager authorization checks

Consider adding these additional test scenarios:

  1. Concurrent release attempts from multiple processes
  2. Release during long-running transactions
  3. Release with pending queries

Would you like me to help implement these additional test scenarios?

lib/feeb/db/repo.ex (2)

31-36: Enhance documentation with usage examples.

While the documentation explains the purpose, it would be more helpful to include:

  • When exactly this function should be called
  • Example usage scenarios
  • Expected behavior when called during an active transaction

156-157: Consider adding success logging for connection releases.

Adding a log message when successfully resetting the state would help with monitoring and debugging connection lifecycle events.

 # Reset the GenServer state so we are ready to serve a new request
+Logger.info("Successfully reset connection state")
 {:reply, :ok, %{state | transaction_id: nil}}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 72d82b8 and 29b1dd5.

📒 Files selected for processing (6)
  • lib/feeb/db/repo.ex (4 hunks)
  • lib/feeb/db/repo/manager.ex (10 hunks)
  • test/db/repo/manager/registry_test.exs (2 hunks)
  • test/db/repo_test.exs (7 hunks)
  • test/db/schema_test.exs (1 hunks)
  • test/support/db/schemas/friend.ex (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/feeb/db/repo/manager.ex
  • test/db/repo/manager/registry_test.exs
🧰 Additional context used
📓 Learnings (2)
lib/feeb/db/repo.ex (1)
Learnt from: renatomassaro
PR: renatomassaro/FeebDB#15
File: lib/feeb/db/repo/manager.ex:249-250
Timestamp: 2024-11-28T10:08:23.649Z
Learning: In `lib/feeb/db/repo/manager.ex`, within the `do_release_connection/2` function, the line `:ok = Repo.notify_release(pid)` is intentionally designed to match `:ok`, ensuring that if `Repo.notify_release/1` fails, a `MatchError` is raised to trigger a crash and restart of the system. This is by design to handle critical errors, and handling potential errors from `Repo.notify_release/1` is not desired.
test/db/repo_test.exs (1)
Learnt from: renatomassaro
PR: renatomassaro/FeebDB#15
File: test/db/repo_test.exs:247-247
Timestamp: 2024-11-28T09:52:22.690Z
Learning: The function `block_forever/0` is defined in `Test.Utils` and is used in tests to keep a process active indefinitely.
🔇 Additional comments (8)
test/support/db/schemas/friend.ex (2)

3-3: LGTM! Clean alias addition

The new alias improves code readability and maintains consistency with the existing aliasing pattern.


22-22: Verify test coverage with the stricter type checking

The addition of pattern matching %DB.Repo.RepoConfig{} improves type safety but may affect existing tests.

Let's verify that all test cases pass the correct config type:

✅ Verification successful

Pattern matching on RepoConfig is properly tested

The verification shows that:

  1. The get_repo_config/2 function is used in test cases with the correct %DB.Repo.RepoConfig{} struct type in test/db/schema_test.exs
  2. The test properly initializes the struct with all required fields: context, shard_id, mode, and path
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential test failures due to incorrect repo config type usage

# Search for test cases using get_repo_config
rg -A 5 "get_repo_config" test/

# Search for RepoConfig struct usage in tests
rg -A 5 "DB\.Repo\.RepoConfig" test/

Length of output: 1456

test/db/repo_test.exs (2)

12-12: LGTM: Consistent initialization parameter updates

The addition of nil as the manager_pid parameter across all initialization calls is consistent and aligns with the new connection management functionality.

Also applies to: 26-26, 51-51, 77-77


96-99: LGTM: Enhanced transaction state verification

Good addition of explicit transaction state verification using direct SQLite calls. This improves test coverage by ensuring the actual database state matches the expected behavior.

Also applies to: 108-112

lib/feeb/db/repo.ex (4)

7-9: LGTM! Well-structured struct definition.

The struct definition properly enforces required fields while allowing transaction_id to be optional, which aligns with the connection management requirements.


Line range hint 40-57: LGTM! Robust initialization with proper error handling.

The initialization properly integrates the manager_pid into the state and maintains comprehensive error handling for database connection failures.


147-158: LGTM! Robust connection release handling.

The implementation correctly:

  • Validates the release signal source
  • Rolls back any active transaction
  • Resets the state for new requests

Note: The sequential message processing of GenServer ensures thread-safety here.


401-410: LGTM! Idiomatic implementation of release signal validation.

The implementation follows Erlang/Elixir best practices:

  • Uses pattern matching for validation
  • Provides clear error messages
  • Includes test-specific handling

@renatomassaro renatomassaro force-pushed the improved-monitoring-of-leased-connections branch from 29b1dd5 to 0a54299 Compare November 28, 2024 11:09
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (7)
lib/feeb/db/repo/repo_config.ex (1)

1-4: Add type specifications for struct fields.

The struct definition is clean and well-organized. Consider adding type specifications to improve code documentation and enable better tooling support.

+  @type context :: atom()
+  @type shard_id :: String.t()
+  @type mode :: :primary | :replica
+  @type path :: String.t()
+
+  @type t :: %__MODULE__{
+    context: context(),
+    shard_id: shard_id(),
+    mode: mode(),
+    path: path()
+  }

   @struct_keys [:context, :shard_id, :mode, :path]
   @enforce_keys @struct_keys
   defstruct @struct_keys
lib/feeb/db/repo/manager/repo_entry.ex (4)

1-3: LGTM! Consider adding type specs and field documentation.

The struct definition is well-structured with appropriate fields for tracking connection state. The use of @enforce_keys ensures the busy state is always set.

Consider adding:

 defmodule Feeb.DB.Repo.Manager.RepoEntry do
   @enforce_keys [:busy?]
+  @type t :: %__MODULE__{
+    pid: pid() | nil,           # Repository process ID
+    caller_pid: pid() | nil,    # Process ID of the connection lessee
+    timer_ref: reference() | nil,# Reference for lease timeout timer
+    monitor_ref: reference() | nil,# Reference for process monitor
+    busy?: boolean()            # Whether the connection is currently leased
+  }
   defstruct [:pid, :caller_pid, :timer_ref, :monitor_ref, busy?: false]

5-8: Enhance function documentation.

While the function is correctly implemented, the documentation could be more descriptive about the function's purpose and return value.

Consider expanding the documentation:

   @doc """
-  Creates an initial RepoEntry
+  Creates an initial RepoEntry representing an available connection slot.
+
+  Returns a %RepoEntry{} with busy?: false and all other fields set to nil.
   """

10-14: Add type validation and preserve existing fields.

The function correctly ensures state transition but could be improved in two ways:

  1. Add type validation for repo_pid
  2. Preserve any existing optional fields during the transition

Consider this implementation:

   @doc """
   Set a Repo pid to an initial RepoEntry
   """
+  def on_establish(%__MODULE__{pid: nil, busy?: false} = entry, repo_pid) when is_pid(repo_pid),
-  def on_establish(%__MODULE__{pid: nil, busy?: false}, repo_pid),
-    do: %__MODULE__{pid: repo_pid, busy?: false}
+    do: %{entry | pid: repo_pid}

1-43: Consider adding explicit error handling and resource cleanup.

While the module correctly implements the state machine for repository entries, consider these architectural improvements:

  1. Add explicit error handling for invalid state transitions
  2. Ensure proper cleanup of timer and monitor references when transitioning states
  3. Consider adding logging for state transitions to aid in debugging

This will help prevent resource leaks and make the system more maintainable.

lib/feeb/db/repo.ex (1)

156-157: Consider adding success logging for connection releases.

Adding a log message when successfully resetting the state would help with monitoring and debugging connection lifecycle events.

 # Reset the GenServer state so we are ready to serve a new request
+Logger.info("Successfully reset connection state")
 {:reply, :ok, %{state | transaction_id: nil}}
lib/feeb/db/repo/manager.ex (1)

291-302: Consider improving error handling in establish_connection

The error handling could be more specific about what went wrong during connection establishment.

Consider this improvement:

 case Repo.start_link({context, shard_id, db_path, mode, self()}) do
   {:ok, repo_pid} ->
     log(:info, "Established and fetched #{mode} connection", state)
     current_repo_spec = Map.fetch!(state, key)
     {:ok, repo_pid, Map.put(state, key, RepoEntry.on_establish(current_repo_spec, repo_pid))}
-  error ->
-    log(:error, "Error creating #{mode} connection: #{inspect(error)}", state)
+  {:error, reason} ->
+    log(:error, "Failed to create #{mode} connection: #{inspect(reason)}", state,
+      error_type: reason, mode: mode)
     {:error, state}
+  unexpected ->
+    log(:error, "Unexpected error creating #{mode} connection: #{inspect(unexpected)}", state,
+      error_type: :unexpected, mode: mode)
+    {:error, state}
 end
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 29b1dd5 and 0a54299.

📒 Files selected for processing (8)
  • lib/feeb/db/repo.ex (4 hunks)
  • lib/feeb/db/repo/manager.ex (10 hunks)
  • lib/feeb/db/repo/manager/repo_entry.ex (1 hunks)
  • lib/feeb/db/repo/repo_config.ex (1 hunks)
  • test/db/repo/manager/registry_test.exs (2 hunks)
  • test/db/repo_test.exs (7 hunks)
  • test/db/schema_test.exs (1 hunks)
  • test/support/db/schemas/friend.ex (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/db/repo/manager/registry_test.exs
  • test/db/repo_test.exs
  • test/db/schema_test.exs
  • test/support/db/schemas/friend.ex
🧰 Additional context used
📓 Learnings (2)
lib/feeb/db/repo.ex (2)
Learnt from: renatomassaro
PR: renatomassaro/FeebDB#15
File: lib/feeb/db/repo.ex:144-145
Timestamp: 2024-11-28T10:01:57.103Z
Learning: In `lib/feeb/db/repo.ex`, when working with Erlang/Elixir code, it's idiomatic to use the same parameter name in function pattern matching to assert that parameters are equal. For example, defining `defp assert_release_signal_from_manager!(manager_pid, manager_pid), do: :ok` is acceptable and preferred.
Learnt from: renatomassaro
PR: renatomassaro/FeebDB#15
File: lib/feeb/db/repo/manager.ex:249-250
Timestamp: 2024-11-28T10:08:23.649Z
Learning: In `lib/feeb/db/repo/manager.ex`, within the `do_release_connection/2` function, the line `:ok = Repo.notify_release(pid)` is intentionally designed to match `:ok`, ensuring that if `Repo.notify_release/1` fails, a `MatchError` is raised to trigger a crash and restart of the system. This is by design to handle critical errors, and handling potential errors from `Repo.notify_release/1` is not desired.
lib/feeb/db/repo/manager.ex (2)
Learnt from: renatomassaro
PR: renatomassaro/FeebDB#15
File: lib/feeb/db/repo/manager.ex:249-250
Timestamp: 2024-11-28T10:08:23.649Z
Learning: In `lib/feeb/db/repo/manager.ex`, within the `do_release_connection/2` function, the line `:ok = Repo.notify_release(pid)` is intentionally designed to match `:ok`, ensuring that if `Repo.notify_release/1` fails, a `MatchError` is raised to trigger a crash and restart of the system. This is by design to handle critical errors, and handling potential errors from `Repo.notify_release/1` is not desired.
Learnt from: renatomassaro
PR: renatomassaro/FeebDB#15
File: lib/feeb/db/repo/manager.ex:137-154
Timestamp: 2024-11-28T10:03:22.695Z
Learning: In the `Feeb.DB.Repo.Manager` module (`lib/feeb/db/repo/manager.ex`), it's acceptable to terminate caller processes using `Process.exit(caller_pid, :feebdb_repo_timeout)` in the `handle_info/2` callback when a repository timeout occurs, as the timeout is a contract and the process should die if it exceeds it.
🔇 Additional comments (11)
lib/feeb/db/repo/repo_config.ex (1)

1-17: Consider adding connection monitoring fields.

Given the PR's focus on monitoring leased connections, consider adding fields to track connection state and lease information.

Let's verify if similar fields exist in other modules:

lib/feeb/db/repo.ex (5)

7-9: LGTM! Well-structured struct definition.

The addition of manager_pid to the struct keys and its inclusion in @enforce_keys ensures proper initialization of the repository state.


31-36: LGTM! Well-documented public interface.

The notify_release/1 function provides a clear interface for the Manager to notify the Repo of connection releases, with good documentation explaining its purpose and effects.


Line range hint 40-57: LGTM! Proper state initialization.

The init/1 function correctly initializes the repository state with the manager_pid and properly constructs the repo_config from the complete state.


147-158: LGTM! Robust connection release handling.

The implementation correctly:

  • Validates the release signal source
  • Handles active transactions by rolling them back
  • Resets the state for new requests

The sequential message processing of GenServer ensures thread-safety.


401-410: LGTM! Idiomatic release signal validation.

The implementation uses idiomatic Erlang/Elixir pattern matching for parameter validation, with a clear error message for unauthorized release attempts. The special case for the test environment is properly isolated.

lib/feeb/db/repo/manager.ex (5)

43-61: LGTM! Well-structured module configuration

The module attributes and struct definitions are well-organized, with clear timeout thresholds and proper enforcement of struct keys.


141-204: LGTM! Robust timeout and process monitoring implementation

The implementation properly handles connection timeouts and process monitoring:

  • Terminates processes that exceed the repo timeout (intentional design)
  • Releases connections when processes die
  • Includes appropriate warning logs for normal process termination without connection release

247-257: LGTM! Thorough connection cleanup implementation

The connection release logic includes all necessary cleanup steps:

  • Stops process monitoring
  • Cancels timeout timer
  • Notifies repo of release (with intentional crash on failure)

Line range hint 380-401: LGTM! Well-structured timer management

The timer management functions are well-implemented with:

  • Clear separation between repo and queue timeouts
  • Proper handling of infinite timeouts (:no_timer)
  • Clean timer cancellation logic

294-294: Address the REVIEW comment regarding GenServer linking

The comment "Do I really want to link both genservers?" needs to be addressed. Consider the implications of linking:

  • Pros: Ensures both processes fail together, maintaining consistency
  • Cons: May cause cascading failures
✅ Verification successful

Let me gather more information about the supervision and linking strategy.


Let me check one more thing about the process linking and monitoring.


Keep the GenServer linking as it ensures proper cleanup and crash propagation

The current linking strategy is appropriate because:

  1. The Repo process is supervised by a DynamicSupervisor with :one_for_one strategy
  2. The Manager monitors client processes to handle cleanup on client crashes
  3. The link between Manager and Repo ensures database connections are properly closed if either process crashes

The REVIEW comment can be removed as the current implementation follows Elixir's best practices for process supervision and error handling.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there are any crash handling or supervision strategies in place
rg -A 5 "use Supervisor|use DynamicSupervisor"

Length of output: 426


Script:

#!/bin/bash
# Check the supervisor implementation and child spec
cat lib/feeb/db/repo/manager/supervisor.ex

# Check how the Repo process is started and supervised
ast-grep --pattern 'def start_link($$$)'

# Look for restart strategies
rg -A 5 "restart:|strategy:"

Length of output: 1900


Script:

#!/bin/bash
# Check for process monitoring and linking patterns
rg -A 5 "Process\.monitor|Process\.link"

# Check the Repo module implementation for crash handling
cat lib/feeb/db/repo.ex

Length of output: 15786

Comment on lines +6 to +16
@doc """
Builds the initial (and only) RepoConfig from the Repo state
"""
def from_state(state) do
%__MODULE__{
context: state.context,
shard_id: state.shard_id,
mode: state.mode,
path: state.path
}
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Enhance documentation and add input validation.

The function implementation is correct but could benefit from improved documentation and validation.

   @doc """
   Builds the initial (and only) RepoConfig from the Repo state
+
+  ## Parameters
+    * `state` - A map containing repository state with the following required fields:
+      * `:context` - The repository context
+      * `:shard_id` - The shard identifier
+      * `:mode` - The repository mode (:primary or :replica)
+      * `:path` - The repository path
+
+  ## Returns
+    * `%RepoConfig{}` - A new repository configuration struct
+
+  ## Examples
+      iex> state = %{context: :user, shard_id: "1", mode: :primary, path: "/db/1"}
+      iex> RepoConfig.from_state(state)
+      %RepoConfig{context: :user, shard_id: "1", mode: :primary, path: "/db/1"}
   """
+  @spec from_state(map()) :: t()
   def from_state(state) do
+    unless Enum.all?(@struct_keys, &Map.has_key?(state, &1)) do
+      raise ArgumentError, "state must contain all required fields: #{inspect(@struct_keys)}"
+    end
+
     %__MODULE__{
       context: state.context,
       shard_id: state.shard_id,

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +16 to +27
@doc """
Lease a Repo (in an established RepoEntry) to a particular caller
"""
def on_acquire(%__MODULE__{pid: pid, busy?: false}, {caller_pid, monitor_ref, timer_ref}) do
%__MODULE__{
pid: pid,
busy?: true,
caller_pid: caller_pid,
monitor_ref: monitor_ref,
timer_ref: timer_ref
}
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add type validation and preserve existing fields.

While the function implements the leasing logic correctly, it should validate inputs and preserve any existing fields not related to the lease state.

Consider this implementation:

   @doc """
   Lease a Repo (in an established RepoEntry) to a particular caller
   """
+  def on_acquire(%__MODULE__{pid: pid, busy?: false} = entry, {caller_pid, monitor_ref, timer_ref})
+      when is_pid(pid) and is_pid(caller_pid) and is_reference(monitor_ref) and is_reference(timer_ref) do
-  def on_acquire(%__MODULE__{pid: pid, busy?: false}, {caller_pid, monitor_ref, timer_ref}) do
-    %__MODULE__{
-      pid: pid,
-      busy?: true,
-      caller_pid: caller_pid,
-      monitor_ref: monitor_ref,
-      timer_ref: timer_ref
-    }
+    %{entry |
+      busy?: true,
+      caller_pid: caller_pid,
+      monitor_ref: monitor_ref,
+      timer_ref: timer_ref}
   end
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@doc """
Lease a Repo (in an established RepoEntry) to a particular caller
"""
def on_acquire(%__MODULE__{pid: pid, busy?: false}, {caller_pid, monitor_ref, timer_ref}) do
%__MODULE__{
pid: pid,
busy?: true,
caller_pid: caller_pid,
monitor_ref: monitor_ref,
timer_ref: timer_ref
}
end
@doc """
Lease a Repo (in an established RepoEntry) to a particular caller
"""
def on_acquire(%__MODULE__{pid: pid, busy?: false} = entry, {caller_pid, monitor_ref, timer_ref})
when is_pid(pid) and is_pid(caller_pid) and is_reference(monitor_ref) and is_reference(timer_ref) do
%{entry |
busy?: true,
caller_pid: caller_pid,
monitor_ref: monitor_ref,
timer_ref: timer_ref}
end

Comment on lines +29 to +42
@doc """
Release the Repo in a leased RepoEntry
"""
def on_release(%__MODULE__{pid: repo_pid}) do
%__MODULE__{
pid: repo_pid,
busy?: false
}
end

@doc """
Remove the Repo from an initial, established or released RepoEntry.
"""
def on_close(%__MODULE__{pid: _, busy?: false}), do: on_start()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

🛠️ Refactor suggestion

Add state validation and cleanup of monitoring references

The review comment's suggestions are valid and important. The code shows that RepoEntry maintains timer_ref, monitor_ref, and caller_pid fields that are set during on_acquire but not properly cleaned up during on_release. The test files confirm these fields need proper cleanup.

Consider this implementation:

def on_release(%__MODULE__{pid: pid, busy?: true} = entry) when is_pid(pid) do
  %{entry | 
    busy?: false,
    caller_pid: nil,
    monitor_ref: nil,
    timer_ref: nil}
end

def on_close(%__MODULE__{pid: pid, busy?: false} = _entry) when is_pid(pid), do: on_start()

The changes:

  1. Add guard to ensure valid pid and state in both functions
  2. For on_release:
    • Verify entry is actually leased (busy?: true)
    • Clear all lease-related fields (timer_ref, monitor_ref, caller_pid)
    • Use struct update syntax for efficiency
  3. For on_close:
    • Keep existing behavior but add pid validation
🔗 Analysis chain

Add state validation and improve implementation efficiency.

Both functions should validate their input states more strictly and update existing structs rather than creating new ones.

Consider these implementations:

   @doc """
   Release the Repo in a leased RepoEntry
   """
-  def on_release(%__MODULE__{pid: repo_pid}) do
-    %__MODULE__{
-      pid: repo_pid,
-      busy?: false
-    }
+  def on_release(%__MODULE__{pid: pid, busy?: true} = entry) when is_pid(pid) do
+    %{entry |
+      busy?: false,
+      caller_pid: nil,
+      monitor_ref: nil,
+      timer_ref: nil}
   end

   @doc """
   Remove the Repo from an initial, established or released RepoEntry.
   """
-  def on_close(%__MODULE__{pid: _, busy?: false}), do: on_start()
+  def on_close(%__MODULE__{pid: pid, busy?: false} = _entry) when is_pid(pid), do: on_start()

The changes:

  1. Add guards to ensure valid states and types
  2. For on_release:
    • Verify the entry is actually leased (busy?: true)
    • Clear all lease-related fields
  3. For on_close:
    • Verify we have a valid pid to close

Let's verify the connection cleanup behavior:


The test files and implementation show that timer_ref and monitor_ref are indeed important fields that need proper cleanup during release. Let's verify the complete struct definition and the on_acquire implementation to ensure our suggested changes align with the codebase's design.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for cleanup of timer and monitor references
rg -A 5 "timer_ref|monitor_ref" --type elixir

Length of output: 8147


Script:

#!/bin/bash
# Get the complete RepoEntry module implementation
ast-grep --pattern 'defmodule $_ do
  $$$
end' lib/feeb/db/repo/manager/repo_entry.ex

Length of output: 2963

@renatomassaro renatomassaro merged commit e734106 into main Nov 30, 2024
5 checks passed
@renatomassaro renatomassaro deleted the improved-monitoring-of-leased-connections branch November 30, 2024 00:01
renatomassaro added a commit to HackerExperience/HackerExperience that referenced this pull request Dec 1, 2024
This was a bug uncovered by the warnings added by FeebDB on PR
renatomassaro/FeebDB#15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant