Solver participation guard #3257

squadgazzz · 2025-01-29T15:05:13Z

Description

From the original issue:

When a solver repeatedly wins consecutive auctions but fails to settle its solutions on-chain, it can lead to system downtime. To prevent this, the autopilot must have the capability to temporarily exclude such solvers from participating in competitions. This ensures no single solver can disrupt the system's operations.

This PR implements it by introducing a new struct, which checks whether the solver is allowed to participate in the next competition by using two different approaches:

Moved the existing Authenticator's is_solver on-chain call into the new struct.
Introduced a new strategy that finds non-settling solvers using a SQL query. It selects the last 3 auctions (configurable) whose deadline is at or before the current block, which avoids selecting pending settlements, and checks whether all of them were won by the same solver (or solvers, in case of multiple winners) without any of them being settled. This strategy caches the results to avoid redundant DB queries. The query relies on the auction_id column of the settlements table, which is updated separately by the Observer struct, so the cache is updated only once the Observer has produced a result.
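The selection logic of the DB-based strategy can be sketched in memory as follows. This is only an illustration, not the PR's SQL query: `AuctionOutcome`, its fields, and `find_non_settling_solvers` are hypothetical names, and the real implementation runs against the database.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of one finished auction.
#[derive(Debug, Clone)]
pub struct AuctionOutcome {
    pub auction_id: i64,
    /// Block by which the winning solutions had to be settled.
    pub deadline_block: u64,
    /// Solvers that won this auction (several in case of multiple winners).
    pub winners: Vec<String>,
    /// Solvers for which a settlement of this auction was observed on-chain.
    pub settled_by: Vec<String>,
}

/// Returns the solvers that won each of the last `last_auctions_count`
/// auctions whose deadline has passed, but settled none of them.
pub fn find_non_settling_solvers(
    outcomes: &[AuctionOutcome],
    last_auctions_count: usize,
    current_block: u64,
) -> HashSet<String> {
    // Skip auctions whose deadline is still ahead: they may be pending.
    let mut finished: Vec<&AuctionOutcome> = outcomes
        .iter()
        .filter(|o| o.deadline_block <= current_block)
        .collect();
    // Newest auctions first, then keep only the requested number.
    finished.sort_by(|a, b| b.auction_id.cmp(&a.auction_id));
    finished.truncate(last_auctions_count);

    // Not enough history to make a decision.
    if last_auctions_count == 0 || finished.len() < last_auctions_count {
        return HashSet::new();
    }

    // Start from the winners of the most recent auction and drop every
    // solver that did not win, or did settle, one of the selected auctions.
    let mut candidates: HashSet<String> = finished[0].winners.iter().cloned().collect();
    for outcome in &finished {
        candidates.retain(|solver| {
            outcome.winners.contains(solver) && !outcome.settled_by.contains(solver)
        });
    }
    candidates
}
```

Returning an empty set when fewer than `last_auctions_count` finished auctions exist is an assumption of this sketch; it keeps a solver from being banned on too little history.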

These validators are called sequentially to avoid redundant RPC calls to the Authenticator: the DB-based validator's cache is checked first, and the RPC call is sent only if that check passes.

Once one of the strategies reports that the solver is not allowed to participate, the solver gets deny-listed for 5 minutes (configurable).
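A minimal sketch of how sequential validation and a time-limited deny list could fit together is shown below. It is synchronous and uses hypothetical names (`Validator`, `ParticipationGuard`, `ban_ttl`); the guard in this PR is async and its actual types may differ.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical validator interface; real validators would query the DB or
/// send an RPC call.
pub trait Validator {
    fn is_allowed(&self, solver: &str) -> bool;
}

/// Lets plain closures act as validators, which keeps examples short.
impl<F> Validator for F
where
    F: Fn(&str) -> bool,
{
    fn is_allowed(&self, solver: &str) -> bool {
        self(solver)
    }
}

pub struct ParticipationGuard {
    /// Checked in order, so cheap checks should come first.
    validators: Vec<Box<dyn Validator>>,
    /// Solver -> instant at which its ban expires.
    banned_until: HashMap<String, Instant>,
    ban_ttl: Duration,
}

impl ParticipationGuard {
    pub fn new(validators: Vec<Box<dyn Validator>>, ban_ttl: Duration) -> Self {
        Self { validators, banned_until: HashMap::new(), ban_ttl }
    }

    pub fn can_participate(&mut self, solver: &str, now: Instant) -> bool {
        // A solver with an unexpired ban is rejected without running any validator.
        if let Some(expiry) = self.banned_until.get(solver).copied() {
            if now < expiry {
                return false;
            }
            self.banned_until.remove(solver);
        }
        // `all` short-circuits: later (more expensive) validators are
        // skipped as soon as one validator rejects the solver.
        let allowed = self.validators.iter().all(|v| v.is_allowed(solver));
        if !allowed {
            self.banned_until.insert(solver.to_owned(), now + self.ban_ttl);
        }
        allowed
    }
}
```

Passing `now` explicitly instead of calling `Instant::now()` inside the method is a design choice of this sketch that makes the expiry logic easy to test.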

Each validator can be enabled/disabled separately in case of any issue.

Metrics

Added a metric that is populated by the DB-based validator whenever a solver is marked as banned. The idea is to create an alert that fires if there are more than 4 such occurrences for the same solver within the last 30 minutes, which signals that disabling the solver should be considered.
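The alert itself would presumably be defined in the monitoring stack rather than in code. Purely to illustrate the threshold described above, here is a sliding-window sketch; `BanAlert` and `record_ban` are hypothetical names and not part of this PR.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Tracks ban events per solver inside a sliding time window.
pub struct BanAlert {
    window: Duration,
    threshold: usize,
    events: HashMap<String, VecDeque<Instant>>,
}

impl BanAlert {
    pub fn new(window: Duration, threshold: usize) -> Self {
        Self { window, threshold, events: HashMap::new() }
    }

    /// Records a ban and returns `true` if the solver now has more than
    /// `threshold` bans within the window ending at `now`.
    pub fn record_ban(&mut self, solver: &str, now: Instant) -> bool {
        let events = self.events.entry(solver.to_owned()).or_default();
        events.push_back(now);
        // Drop events that fell out of the window.
        while let Some(&oldest) = events.front() {
            if now.duration_since(oldest) > self.window {
                events.pop_front();
            } else {
                break;
            }
        }
        events.len() > self.threshold
    }
}
```

With a 30-minute window and a threshold of 4, the fifth ban of the same solver inside the window is the first one for which `record_ban` returns `true`.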

Open discussions

The current SQL query filters out auctions whose deadline has not been reached yet, so the following case is possible:
The solver gets banned while it still has a pending settlement. If that settlement then lands on-chain, the solver remains banned. While this is a niche case, it would be better to unblock the solver before the cache TTL expires. This has not been implemented in the current PR since it requires some refactoring in the Observer struct. If this is approved, it can be implemented quickly.
Whether it makes sense to introduce a metrics-based strategy, similar to the bad token detector's, where the solver gets banned if more than 95% (or similar) of its settlements fail.

How to test

A new SQL query test. Existing e2e tests.

Related Issues

Fixes #3221

MartinquaXD · 2025-01-30T08:39:36Z

crates/autopilot/src/database/competition.rs

+    /// Finds solvers that won `last_auctions_count` consecutive auctions but
+    /// never settled any of them. The current block is used to prevent


I'd say the problem is not exclusive to solvers that win repeatedly. It's worse for the protocol when a failing solver wins repeatedly but conceptually the check should protect any malfunctioning solver from participating in the auction. If it only wins 10% of the cases but fails to settle a majority of them we should also disable that one.

If it only wins 10% of the cases but fails to settle a majority of them we should also disable that one.

Ok, that solves one of the discussion points from the PR description. I'd introduce it in a separate PR then since the current one is already too big.

So, I think the metrics-based approach needs to be implemented in addition to the current SQL query: the query quickly prevents the protocol from getting stuck (the original issue states only this problem), while the metrics-based approach is more about long-term estimation. Does that make sense?

I think the metrics-based approach needs to be implemented in addition to the current SQL query

I think detecting that X% of won and promised solutions of solver S didn't get onchain over the last M minutes should be doable using the DB, no?

I'd introduce it in a separate PR then since the current one is already too big.

Makes sense

last M minutes

Only if it is the last M auctions, not minutes. I will test the query, since it seems it would need to fetch a much larger range of auctions to build reasonable statistics.
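The failure-rate idea discussed in this thread could look roughly like the sketch below, computed over the winning solutions of the last M auctions. All names (`WonSolution`, `solvers_above_failure_rate`, `min_wins`, `max_failure_rate`) are hypothetical; `min_wins` is an assumed safeguard reflecting the concern that a small sample does not give reasonable statistics.

```rust
use std::collections::HashMap;

/// Hypothetical record of one winning solution and whether it was settled.
#[derive(Debug, Clone)]
pub struct WonSolution {
    pub solver: String,
    pub settled: bool,
}

/// Returns, sorted by name, the solvers whose share of unsettled winning
/// solutions exceeds `max_failure_rate`, ignoring solvers with fewer than
/// `min_wins` wins in the sample.
pub fn solvers_above_failure_rate(
    recent: &[WonSolution],
    min_wins: usize,
    max_failure_rate: f64,
) -> Vec<String> {
    // solver -> (wins, failures)
    let mut stats: HashMap<&str, (usize, usize)> = HashMap::new();
    for solution in recent {
        let entry = stats.entry(solution.solver.as_str()).or_insert((0, 0));
        entry.0 += 1;
        if !solution.settled {
            entry.1 += 1;
        }
    }
    let mut flagged: Vec<String> = stats
        .into_iter()
        .filter(|(_, (wins, failures))| {
            // `wins` is at least 1 for every entry, so the division is safe.
            *wins >= min_wins && (*failures as f64 / *wins as f64) > max_failure_rate
        })
        .map(|(solver, _)| solver.to_owned())
        .collect();
    flagged.sort();
    flagged
}
```

A solver that wins rarely but fails most of the time is caught by this check once it has at least `min_wins` wins in the sample, which the consecutive-wins query alone would not detect.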

crates/autopilot/src/domain/competition/solver_participation_guard.rs

MartinquaXD · 2025-01-30T08:48:39Z

crates/autopilot/src/run_loop.rs

-                    ?err,
-                    "failed to check if solver is deny listed"
-                );
+        let can_participate = self.solver_participation_guard.can_participate(&driver.submission_address).await.map_err(|err| {


This participation guard needs to be opt-in until there is a CIP enforcing it for every solver.

Ok, this can be achieved by disabling the db-based validator in the config and using this option by default.

I meant it should be opt-in on a solver by solver basis.
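One way to read the per-solver opt-in is a set of solvers for which the DB-based check is enforced, with everyone else passing through. The following is a sketch under that assumption; `OptInGuard` and its methods are hypothetical names, not the PR's configuration format.

```rust
use std::collections::HashSet;

/// Applies the non-settling check only to solvers that opted in.
pub struct OptInGuard {
    enabled_for: HashSet<String>,
}

impl OptInGuard {
    pub fn new<I, S>(solvers: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self { enabled_for: solvers.into_iter().map(Into::into).collect() }
    }

    /// `is_non_settling` is the result of the DB-based check for `solver`.
    /// Solvers that did not opt in are never excluded by it.
    pub fn can_participate(&self, solver: &str, is_non_settling: bool) -> bool {
        !(self.enabled_for.contains(solver) && is_non_settling)
    }
}
```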

MartinquaXD · 2025-01-30T08:51:17Z

crates/autopilot/src/run_loop.rs

+        // Do not send the request to the driver if the solver is deny-listed
+        if !can_participate {


We should either notify the driver on every missed auction or when it gets disabled so that the external team can debug what's wrong immediately.

Implementing the /notify endpoint is the external team's responsibility, right? So we would just send a request without expecting all the external solvers to have implemented it.

An attempt to implement it is in a separate PR: #3262

crates/autopilot/src/arguments.rs

squadgazzz added 5 commits January 29, 2025 12:27

Solver participation validator

5fe0dd6

Test

5319945

Avoid rpc calls every time

e65c328

Typo

fc3321b

Docs

0fbd61c

squadgazzz changed the title ~~Solver participation validator~~ Solver participation gate Jan 29, 2025

squadgazzz changed the title ~~Solver participation gate~~ Solver participation guard Jan 29, 2025

Metrics

b1abfa0

squadgazzz marked this pull request as ready for review January 29, 2025 17:00

squadgazzz requested a review from a team as a code owner January 29, 2025 17:00

squadgazzz marked this pull request as draft January 29, 2025 17:01

squadgazzz added 2 commits January 29, 2025 17:42

Configurable validators

292dcff

Fixed clap config

fe9ef5b

squadgazzz marked this pull request as ready for review January 29, 2025 18:00

MartinquaXD reviewed Jan 30, 2025


squadgazzz added 5 commits January 30, 2025 12:36

Refactoring

c5e3502

Config per solver

a9e6a3f

Start using the new config

9a55fe2

Simplify to hashset

f9bdafd

Nit

5fc831e

squadgazzz force-pushed the blacklist-failing-solvers branch 2 times, most recently from f69e174 to 5fc831e Compare January 30, 2025 20:11

squadgazzz mentioned this pull request Jan 30, 2025

Notify banned solvers #3262

Open

squadgazzz added 5 commits January 31, 2025 15:18

Use driver's name in metrics

3154cd0

Nit

47007c1

Send metrics about each found solver

bb9059e

Cache only accepted solvers

6787d34

Refactoring

a2710c6

squadgazzz mentioned this pull request Jan 31, 2025

Ban solvers based on the settlements success rate #3263

Open

3 tasks
