indexer-alt: epochs pipelines #20150
base: amnn/idx-config
    -- Exclusive transaction upperbound of the epoch.
    tx_hi BIGINT NOT NULL,
Alternatively or additionally, at the epoch boundary we also know the `tx_lo`, because there must be at least one transaction in the epoch.
I'm not sure I understand -- how do we figure out the tx_lo for the epoch without reading from the DB?
    tx_hi: checkpoint_summary.network_total_transactions as i64,
    end_timestamp_ms: checkpoint_summary.timestamp_ms as i64,

    safe_mode: true,
I see, so when a checkpoint has end-of-epoch data but no corresponding event in its transactions, we're in safe mode.
    total_stake: None,
    storage_fund_balance: None,
    storage_fund_reinvestment: None,
    storage_charge: None,
    storage_rebate: None,
    stake_subsidy_amount: None,
    total_gas_fees: None,
    total_stake_rewards_distributed: None,
    leftover_storage_fund_inflow: None,
How about a `Default` or something?
There is no such thing as a "Default" ending of an epoch -- we shouldn't use `Default` (only) because we're tired of writing out fields; the type needs to have a reasonable notion of a "default value".
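To make the thread above concrete, here is a minimal sketch of how an end-of-epoch row might be built so that safe mode falls out of the missing event, rather than out of a `Default` impl. The types `EpochInfoEvent` and `EpochEndRow` and the function `epoch_end_row` are hypothetical, trimmed-down stand-ins, not the types in this PR, and the rule "no event means safe mode" is taken from the discussion above rather than from the actual pipeline code.

```rust
/// Hypothetical stand-in for the subset of `SystemEpochInfoEvent` fields used here.
#[derive(Debug, Clone, PartialEq)]
struct EpochInfoEvent {
    total_stake: u64,
    storage_fund_balance: u64,
    total_gas_fees: u64,
}

/// Hypothetical, trimmed-down row for `kv_epoch_ends`.
/// Event-derived columns are `Option`s so they can be stored as `NULL`.
#[derive(Debug, PartialEq)]
struct EpochEndRow {
    epoch: i64,
    tx_hi: i64,
    safe_mode: bool,
    total_stake: Option<i64>,
    storage_fund_balance: Option<i64>,
    total_gas_fees: Option<i64>,
}

/// Build the row for an end-of-epoch checkpoint. `event` is `None` when no
/// epoch info event was found among the checkpoint's transactions, which this
/// sketch assumes means the epoch change happened in safe mode.
fn epoch_end_row(
    epoch: u64,
    network_total_transactions: u64,
    event: Option<&EpochInfoEvent>,
) -> EpochEndRow {
    EpochEndRow {
        epoch: epoch as i64,
        tx_hi: network_total_transactions as i64,
        safe_mode: event.is_none(),
        // Each field is `Some` only if the event exists; nothing is made up.
        total_stake: event.map(|e| e.total_stake as i64),
        storage_fund_balance: event.map(|e| e.storage_fund_balance as i64),
        total_gas_fees: event.map(|e| e.total_gas_fees as i64),
    }
}
```

Writing the `None`s explicitly (or deriving them from the missing event as above) keeps "we have no data" distinct from "the value was zero", which a blanket `Default` would blur.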
## Description

Adds two tables and pipelines: `kv_epoch_starts` and `kv_epoch_ends`, to index epoch information. This pipeline is different from epoch indexing in `sui-indexer` in a number of ways:

- It is an append-only pipeline. The columns that are written at the start and end of the epoch are split into two separate tables that can be written to concurrently (by separate pipelines).
- The first row of `kv_epoch_starts` is written by the bootstrap process, which seeds the `kv_genesis` table as well. This avoids having to condition on whether the checkpoint being processed is the genesis checkpoint in the main pipeline.
- Instead of indexing the number of transactions in the epoch, it tracks the transaction high watermark -- readers will need to read the records to calculate the number of total transactions (this avoids having to read the last epoch's total transactions in the write path).
- We index the `SuiSystemState` object as BCS, rather than the summary structure.
- We explicitly record whether the epoch advancement at the end of the epoch triggered safe mode (the system state object also tracks whether the epoch was started in safe mode).
- Fields related to information that came from `SystemEpochInfoEvent` have all been consolidated in `kv_epoch_ends`, and they are all optional, in case of safe mode.

It's worth elaborating on the last bullet point, because this is quite a subtle, but large change:

- Today, `total_stake` and `storage_fund_balance` are written at the start of an epoch based on the fields of the `SystemEpochInfoEvent` emitted from the previous epoch, and are `NOT NULL`.
- The remaining fields were nullable, but only because they would be written to later, once the epoch was over.

This was awkward to work with in a couple of ways:

- It meant that for the genesis epoch, we needed to make some numbers up (all zeroes) because we did not have an event to read from.
- We had to do something similar if we hit safe mode.
- When indexing the start and end of epochs separately, it meant that we had to duplicate work (finding the system epoch info event).

By making the fields nullable, and consolidating them in `kv_epoch_ends`, we can simplify the pipelines:

- `kv_epoch_starts` and the bootstrapping logic can work purely based on the system state object.
- `kv_epoch_ends` can work purely based on the `SystemEpochInfoEvent`, and can leave fields `NULL` if we are in safe mode.

In the case of `kv_epoch_starts` we could also have cut down fields to just `epoch`, `cp_lo` and `system_state`. I chose not to do this because the system state is actually quite a large object, and it is beneficial to avoid having to deserialize it to answer simpler queries.

## Test plan

Ran the indexer on the first 1M checkpoints, and correlated the resulting info in the respective tables with the data that the current indexer produced:

```
sui_indexer_alt=# SELECT epoch, protocol_version, cp_lo, TO_TIMESTAMP(start_timestamp_ms / 1000), reference_gas_price FROM kv_epoch_starts;
 epoch | protocol_version | cp_lo  |      to_timestamp      | reference_gas_price
-------+------------------+--------+------------------------+---------------------
     0 |                4 |      0 | 2023-04-12 18:00:00+01 |                1000
     1 |                4 |   9770 | 2023-04-13 18:00:02+01 |                1000
     2 |                4 |  85169 | 2023-04-14 18:00:04+01 |                1000
     3 |                4 | 161192 | 2023-04-15 18:00:08+01 |                1000
     4 |                4 | 237074 | 2023-04-16 18:00:11+01 |                1000
     5 |                4 | 314160 | 2023-04-17 18:00:15+01 |                1000
     6 |                4 | 391107 | 2023-04-18 18:00:18+01 |                1000
     7 |                4 | 467716 | 2023-04-19 18:00:21+01 |                1000
     8 |                4 | 544978 | 2023-04-20 18:00:26+01 |                1000
     9 |                5 | 621933 | 2023-04-21 18:00:28+01 |                1000
    10 |                6 | 699410 | 2023-04-22 18:00:31+01 |                1000
    11 |                6 | 777074 | 2023-04-23 18:00:34+01 |                1000
    12 |                6 | 855530 | 2023-04-24 18:00:36+01 |                1000
    13 |                6 | 933559 | 2023-04-25 18:00:39+01 |                1000
(14 rows)

sui_indexer_alt=# SELECT epoch, cp_hi, tx_hi, TO_TIMESTAMP(end_timestamp_ms / 1000), safe_mode, storage_fund_balance, storage_fund_reinvestment, storage_charge, storage_rebate, stake_subsidy_amount, total_gas_fees, total_stake_rewards_distributed, leftover_storage_fund_inflow FROM kv_epoch_ends ORDER BY epoch ASC;
 epoch | cp_hi  | tx_hi  |      to_timestamp      | safe_mode | storage_fund_balance | storage_fund_reinvestment | storage_charge | storage_rebate | stake_subsidy_amount | total_gas_fees | total_stake_rewards_distributed | leftover_storage_fund_inflow
-------+--------+--------+------------------------+-----------+----------------------+---------------------------+----------------+----------------+----------------------+----------------+---------------------------------+------------------------------
     0 |   9770 |   9771 | 2023-04-13 18:00:02+01 | f         |                    0 |                         0 |              0 |              0 |                    0 |              0 |                               0 |                            0
     1 |  85169 |  85174 | 2023-04-14 18:00:04+01 | f         |              2973880 |                         0 |        3952000 |         978120 |                    0 |      102000000 |                       102000000 |                            0
     2 | 161192 | 161199 | 2023-04-15 18:00:08+01 | f         |            717398960 |                         0 |      715403200 |         978120 |                    0 |        1000000 |                         1000000 |                            0
     3 | 237074 | 237084 | 2023-04-16 18:00:11+01 | f         |            733657184 |                         0 |     1430198400 |     1413940176 |                    0 |        2000000 |                         2000000 |                            0
     4 | 314160 | 314171 | 2023-04-17 18:00:15+01 | f         |            733657184 |                         0 |              0 |              0 |                    0 |              0 |                               0 |                            0
     5 | 391107 | 391119 | 2023-04-18 18:00:18+01 | f         |            733657184 |                         0 |              0 |              0 |                    0 |              0 |                               0 |                            0
     6 | 467716 | 467730 | 2023-04-19 18:00:21+01 | f         |            735633184 |                         0 |        1976000 |              0 |                    0 |        1000000 |                         1000000 |                            0
     7 | 544978 | 544994 | 2023-04-20 18:00:26+01 | f         |            729859616 |                         0 |      702475600 |      708249168 |                    0 |        1000000 |                         1000000 |                            0
     8 | 621933 | 621950 | 2023-04-21 18:00:28+01 | f         |            729859616 |                         0 |              0 |              0 |                    0 |              0 |                               0 |                            0
     9 | 699410 | 699428 | 2023-04-22 18:00:31+01 | f         |            729859616 |                         0 |              0 |              0 |                    0 |              0 |                               0 |                            0
    10 | 777074 | 777093 | 2023-04-23 18:00:34+01 | f         |            729859616 |                         0 |              0 |              0 |                    0 |              0 |                               0 |                            0
    11 | 855530 | 855550 | 2023-04-24 18:00:36+01 | f         |            729859616 |                         0 |              0 |              0 |                    0 |              0 |                               0 |                            0
    12 | 933559 | 933586 | 2023-04-25 18:00:39+01 | f         |            735866656 |                         0 |       13832000 |        7824960 |                    0 |        6000000 |                         6000000 |                            0
(13 rows)
```
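Because the tables store only the transaction high watermark, a reader has to derive per-epoch transaction counts from adjacent rows. A rough sketch of that read-side calculation follows; the function name `tx_counts` and the tuple representation are hypothetical, and it assumes the rows are ordered by epoch, start at epoch 0, and that each epoch's exclusive `tx_hi` is the next epoch's inclusive lower bound.

```rust
/// Given `(epoch, tx_hi)` pairs ordered by epoch and starting at epoch 0,
/// return `(epoch, transaction_count)` pairs.
///
/// Assumption: `tx_hi` is an exclusive upper bound, so the previous epoch's
/// `tx_hi` doubles as this epoch's inclusive lower bound (`tx_lo`).
fn tx_counts(ends: &[(i64, i64)]) -> Vec<(i64, i64)> {
    let mut tx_lo = 0; // epoch 0 starts at transaction sequence number 0
    let mut counts = Vec::with_capacity(ends.len());
    for &(epoch, tx_hi) in ends {
        counts.push((epoch, tx_hi - tx_lo));
        tx_lo = tx_hi;
    }
    counts
}
```

Applied to the first two rows of the `kv_epoch_ends` output above, this gives 9771 transactions for epoch 0 and 85174 - 9771 = 75403 for epoch 1.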