Driver fetches order's app-data #3242

Open: wants to merge 8 commits into base: main


@squadgazzz squadgazzz commented Jan 17, 2025

Description

As part of the flashloan support, the driver needs to be able to retrieve an order's full app data in order to send flashloan hints to solvers (#3216). This PR introduces an app-data retriever that works as follows:

  • It checks whether the app data already exists in the cache and, if so, returns it.
  • Otherwise, it sends a request to the orderbook API to fetch the app data. The request is sent using BoxRequestSharing, which helps to avoid duplicate requests.
  • Since the full app data is an optional field in the Order struct, responses with 404 Not Found are also cached as None. Only a small number of orders in the DB have empty full app data, but they do exist.
  • If fetching the app data fails for any reason, the order is discarded and the error is logged. Only the affected order is dropped, so the whole auction does not have to be discarded.
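The lookup flow described in the list above can be sketched roughly as follows. This is a simplified, synchronous illustration and not the code from this PR: `Retriever`, `FetchError`, `FetchResult` and the closure-based `fetch` are hypothetical names, a plain `HashMap` stands in for the moka cache, and the deduplication of in-flight requests (BoxRequestSharing) is left out.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real driver types.
pub type AppDataHash = [u8; 32];
pub type FullAppData = String;

#[derive(Debug, Clone, PartialEq)]
pub enum FetchError {
    /// Any failure other than 404 (timeout, 5xx, malformed body, ...).
    Other(String),
}

/// `Ok(None)` models a `404 Not Found` response from the orderbook.
pub type FetchResult = Result<Option<FullAppData>, FetchError>;

pub struct Retriever<F> {
    cache: HashMap<AppDataHash, Option<FullAppData>>,
    fetch: F,
}

impl<F: FnMut(&AppDataHash) -> FetchResult> Retriever<F> {
    pub fn new(fetch: F) -> Self {
        Self { cache: HashMap::new(), fetch }
    }

    pub fn get(&mut self, hash: &AppDataHash) -> FetchResult {
        // 1. Cache hit, including a `None` cached after an earlier 404.
        if let Some(cached) = self.cache.get(hash) {
            return Ok(cached.clone());
        }
        // 2. Cache miss: ask the orderbook. Errors are returned, not cached.
        let fetched = (self.fetch)(hash)?;
        // 3. Cache both `Some(data)` and `None` (404).
        self.cache.insert(*hash, fetched.clone());
        Ok(fetched)
    }
}
```

In this sketch the caller decides what to do with an `Err`, which corresponds to discarding the single order and logging the error.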

Implementation details and considerations

  • An LRU cache (moka) is used because, according to the mainnet DB, the number of unique app_data values is only ~2% of the number of orders. The cache is therefore expected to be hit for the majority of orders, so there is no need for a TTL cache.
  • Based on the DB data, the average full app data size is ~800 bytes. The LRU cache has a capacity of 2000 entries, which corresponds to roughly 1.6 MB of memory.
  • Once app data is fetched, it needs to be stored in the domain::Order struct. To avoid creating new order structs, AppData is converted into an enum, which gets updated accordingly.
  • Already discussed with @MartinquaXD: the full app data is cached even though only the flashloan part is expected to be used later. This can be reconsidered either in this or in future PRs.
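A minimal sketch of the enum idea from the list above, assuming a hash-only variant and a variant that also carries the fetched document. The variant names, the `FullAppData` struct and the `upgrade` method are assumptions made for illustration and may differ from the types in this PR.

```rust
/// Hypothetical, simplified stand-in for the parsed full app data.
#[derive(Debug, Clone, PartialEq)]
pub struct FullAppData {
    pub document: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum AppData {
    /// Only the 32-byte hash is known, as received with the auction.
    Hash([u8; 32]),
    /// The full document has been fetched; the hash is kept alongside it.
    Full { hash: [u8; 32], full: FullAppData },
}

impl AppData {
    pub fn hash(&self) -> [u8; 32] {
        match self {
            AppData::Hash(hash) => *hash,
            AppData::Full { hash, .. } => *hash,
        }
    }

    /// Upgrades `Hash` to `Full` in place, so the surrounding order struct
    /// does not have to be rebuilt. Values that are already `Full` are kept.
    pub fn upgrade(&mut self, full: FullAppData) {
        if let AppData::Hash(hash) = *self {
            *self = AppData::Full { hash, full };
        }
    }
}
```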

Rate limiting

The following is only valid for colocated solvers since the API is not rate-limited for the services within the same k8s cluster.

SQL query
WITH unique_app_data_per_auction AS (
  SELECT
      oe.auction_id,
      COUNT(DISTINCT o.app_data) AS unique_app_data_count
  FROM
      order_execution oe INNER JOIN orders o ON oe.order_uid = o.uid
  GROUP BY
      oe.auction_id
),
auction_count_by_unique_app_data AS (
  SELECT
      unique_app_data_count,
      COUNT(*) AS auction_count
  FROM
      unique_app_data_per_auction
  GROUP BY
      unique_app_data_count
)
SELECT
  unique_app_data_count as unique_app_data_per_auction,
  auction_count
FROM
  auction_count_by_unique_app_data
ORDER BY
  auction_count DESC;

The image (not shown here) presents the result of this query: the left column is the number of unique app data entries in a single auction, and the right column is the number of auctions that have that many unique app data entries.

The orderbook API rate limit is 5 requests per second (RPS). Only 0.05% of auctions have more than 5 unique app data entries. That means there is at most a 0.05% chance that a driver hits the rate limit for a single auction, and the actual probability is even lower since some of the hashes will most likely already be cached. That is why the current implementation doesn't contain a rate-limiting mechanism.
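For reference, the share of auctions above the limit can be derived from the query output with a few lines of code. The function below is only a sketch: `share_exceeding` is a hypothetical helper, and it assumes the rows are `(unique_app_data_per_auction, auction_count)` pairs as returned by the SQL query.

```rust
/// Returns the fraction of auctions whose number of unique app-data entries
/// is strictly greater than `rps_limit`.
///
/// `histogram` holds `(unique_app_data_per_auction, auction_count)` rows.
pub fn share_exceeding(histogram: &[(u32, u64)], rps_limit: u32) -> f64 {
    let total: u64 = histogram.iter().map(|&(_, count)| count).sum();
    if total == 0 {
        return 0.0;
    }
    let over: u64 = histogram
        .iter()
        .filter(|&&(unique, _)| unique > rps_limit)
        .map(|&(_, count)| count)
        .sum();
    over as f64 / total as f64
}
```

With made-up rows such as `[(1, 900), (3, 99), (6, 1)]` and a limit of 5, one auction out of 1000 exceeds the limit, so the function returns a share of 0.001. These numbers are purely illustrative and are not the mainnet figures.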

Changes

  • The app-data retriever.
  • A new required driver CLI argument, orderbook-url.
  • A mocked orderbook for driver tests that always returns 404. This should be improved in "driver informs solver about flashloan hints" (#3216), where driver tests are expected to be added.

How to test

All the current tests pass. New driver tests can be added only once the driver starts sending the collected data to solvers (#3216).

Related Issues

Fixes #3215

Comment on lines +131 to +132
inner: Arc<Mutex<Inner>>,
app_data_retriever: Arc<order::app_data::AppDataRetriever>,
Contributor Author

app_data_retriever doesn't require the internal lock, so it is kept outside of it. That allows the app-data and balance fetching tasks to run in parallel.
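The effect of keeping the retriever outside the lock can be illustrated with a small sketch. It uses scoped threads from the standard library instead of the async `join` used in the driver, and `Inner`, `Retriever`, `Processor` and their methods are hypothetical placeholders rather than the real types.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Placeholder for the state that must be accessed under the lock.
#[derive(Default)]
pub struct Inner {
    pub balances_fetched: u32,
}

/// Placeholder for the app-data retriever, which needs no lock.
pub struct Retriever;

impl Retriever {
    pub fn fetch(&self) -> &'static str {
        "app-data"
    }
}

pub struct Processor {
    inner: Arc<Mutex<Inner>>,
    app_data_retriever: Arc<Retriever>,
}

impl Processor {
    pub fn new() -> Self {
        Self {
            inner: Arc::new(Mutex::new(Inner::default())),
            app_data_retriever: Arc::new(Retriever),
        }
    }

    pub fn process(&self) -> (&'static str, u32) {
        thread::scope(|s| {
            // Runs without touching the mutex.
            let app_data = s.spawn(|| self.app_data_retriever.fetch());
            // Only this task holds the lock while it works.
            let balances = s.spawn(|| {
                let mut inner = self.inner.lock().unwrap();
                inner.balances_fetched += 1;
                inner.balances_fetched
            });
            (app_data.join().unwrap(), balances.join().unwrap())
        })
    }
}
```

If the retriever lived inside `Inner`, the first task would have to wait for the mutex as well, and the two tasks would effectively run one after the other.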

Comment on lines +38 to +40
// According to statistics, the average size of the app-data is ~800 bytes. With
// this constant, the approximate size of the cache will be ~1.6 MB.
const CACHE_SIZE: u64 = 2_000;
Contributor Author

@squadgazzz squadgazzz Jan 17, 2025

The value can be increased or made configurable. The total number of unique app-data entries is currently ~97k, while there are 4.3M orders.

}
}

impl Clone for FetchingError {
Contributor Author

This is required to satisfy BoxRequestSharing constraints.
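As background, a shared request hands the same result to every waiter, so the result type (including its error) has to be clonable. A manual `Clone` impl is a common way to get there when a wrapped error type is not `Clone` itself. The sketch below shows the general pattern with `std::io::Error`; the `FetchError` enum and its variants are hypothetical and are not the `FetchingError` from this PR.

```rust
use std::io;

#[derive(Debug)]
pub enum FetchError {
    /// `io::Error` does not implement `Clone`, so `#[derive(Clone)]` fails.
    Io(io::Error),
    /// Unexpected HTTP status code.
    Status(u16),
}

impl Clone for FetchError {
    fn clone(&self) -> Self {
        match self {
            // Rebuild an equivalent error from the kind and the message.
            FetchError::Io(err) => {
                FetchError::Io(io::Error::new(err.kind(), err.to_string()))
            }
            FetchError::Status(code) => FetchError::Status(*code),
        }
    }
}
```

The rebuilt error keeps the kind and the message but loses the original source chain, which is usually acceptable for logging purposes.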

@squadgazzz squadgazzz marked this pull request as ready for review January 17, 2025 18:54
@squadgazzz squadgazzz requested a review from a team as a code owner January 17, 2025 18:54
Contributor

@mstrug mstrug left a comment

Nice and thoughtful implementation. Only a couple of comments.

@@ -50,6 +51,42 @@ pub struct Order {
pub quote: Option<Quote>,
}

/// The app data associated with an order.
Contributor

Now that order/app_data.rs is a thing it seems more fitting to put all these appdata related types there.

Contributor Author

Which types exactly? The whole AppDataRetriever? I ended up creating a separate file for the AppDataRetriever and keeping mod.rs to contain domain structs only because the latter is already quite big and hard to read.

Comment on lines +160 to +161
// Filter out orders that failed to fetch app data.
prioritized_orders.retain_mut(|order| {
Contributor

Filtering orders without the full appdata right away seems a bit dangerous. At the moment we don't need the parsed data yet so keeping orders where we weren't able to fetch the full appdata would not be a regression to the status quo.

Also, I'm still a bit worried about how much time these requests will take, even if there aren't that many unique appdata hashes. To avoid bad surprises, I'd like to see this feature gated.

Contributor Author

> Filtering orders without the full appdata right away seems a bit dangerous.

It filters out orders that only had flashloan data in the order, but for some reason, we didn't receive the full data from the API. I think orders shouldn't be included because they will most likely fail to execute without flashloan hints sent to solvers, right?

> To avoid bad surprises I'd like to see this feature gated.

Makes sense. Will add metrics and a config.
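One way the gated filtering discussed in this thread could look is sketched below. It is only an illustration under stated assumptions: `Order`, `attach_app_data` and the `enabled` flag are hypothetical, hashes are shortened to a single byte, and the fetch results are passed in as a ready-made map instead of being fetched asynchronously.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified order.
#[derive(Debug, Clone, PartialEq)]
pub struct Order {
    pub uid: u32,
    pub app_data_hash: u8,
    pub full_app_data: Option<String>,
}

/// Attaches fetched app data to orders and drops orders whose fetch failed.
/// When `enabled` is false (feature gate off), orders are left untouched.
pub fn attach_app_data(
    orders: &mut Vec<Order>,
    fetched: &HashMap<u8, Result<Option<String>, String>>,
    enabled: bool,
) {
    if !enabled {
        return;
    }
    orders.retain_mut(|order| match fetched.get(&order.app_data_hash) {
        // Fetched successfully; `None` means the orderbook answered 404.
        Some(Ok(full)) => {
            order.full_app_data = full.clone();
            true
        }
        // Fetch failed: log and drop only this order, not the auction.
        Some(Err(err)) => {
            eprintln!("dropping order {}: {err}", order.uid);
            false
        }
        // Nothing was requested for this hash; keep the order as is.
        None => true,
    });
}
```

With the gate switched off, the behaviour matches the status quo, which addresses the regression concern raised above.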

/// updated orders.
pub async fn process(&self, auction: Auction, solver: &eth::H160) -> Auction {
let (mut app_data_by_order, mut prioritized_orders) = join(
self.collect_orders_app_data(&auction),
Contributor

The majority of solvers are connected to a single driver. AuctionProcessor::process() gets called for all of them. Under the hood prioritize_orders() will only ever run once per auction since it turned out that this overhead is too much especially given that all this would be duplicated work.
I think collecting order app data should also be implemented such that it only runs once per auction. Even if it's already using request sharing logic internally I think it would be good to extend the AuctionProcessor code such that one would naturally continue using the mutexed processing introduced in prioritize_orders.

Contributor Author

@squadgazzz squadgazzz Jan 21, 2025

> it would be good to extend the AuctionProcessor code such that one would naturally continue using the mutexed processing introduced in prioritize_orders.

Using the same mutex means we won't be able to execute both balance filtering and app data collecting in parallel, right? I wanted to avoid that. Nvm, probably, your suggestion is still a better option.
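The "run only once per auction" idea from this thread can be sketched as a small mutex-guarded memoization keyed by the auction id. This is a synchronous simplification and not the AuctionProcessor code: `OncePerAuction` and `get_or_compute` are hypothetical names, and the auction id is assumed to be a plain `u64`.

```rust
use std::sync::Mutex;

/// Remembers the result computed for the most recent auction.
pub struct OncePerAuction<T> {
    last: Mutex<Option<(u64, T)>>,
}

impl<T: Clone> OncePerAuction<T> {
    pub fn new() -> Self {
        Self { last: Mutex::new(None) }
    }

    /// Runs `compute` only for the first caller of a given auction id.
    /// The lock is held while computing, so concurrent callers for the same
    /// auction wait and then receive a clone of the cached result.
    pub fn get_or_compute(&self, auction_id: u64, compute: impl FnOnce() -> T) -> T {
        let mut last = self.last.lock().unwrap();
        if let Some((id, value)) = last.as_ref() {
            if *id == auction_id {
                return value.clone();
            }
        }
        let value = compute();
        *last = Some((auction_id, value.clone()));
        value
    }
}
```

The trade-off raised in the reply applies here too: whatever runs inside `compute` is serialized behind the lock, so two steps only overlap if they are started together inside the same `compute` call.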

Contributor

sunce86 commented Jan 21, 2025

> To build a system that is easily extensible and doesn't require adding every piece of metadata that the protocol is supposed to understand to the auction, we'll make it the responsibility of the driver to fetch the appdata, parse, and understand it.

@MartinquaXD what is the most important reason not to make app-data part of the auction (I don't fully understand the argument ☝️)?

AFAIS the current solution also has a number of cons:

  1. Consistency - all order data is fetched from auction, but only app data is fetched directly from orderbook API
  2. Complex solution that requires cache, with all the small assumptions that need to be watched and monitored (like the number of orders in db, number of unique app-datas, orderbook rps limit, all of which can change over time)
  3. Requires all fully colocated drivers (running non-reference driver) to re-implement a similar cache to stay competitive on these orders
