indexer-alt: epochs pipelines #20150

Open
wants to merge 1 commit into
base: amnn/idx-config

@amnn (Member) commented Nov 2, 2024

Description

Adds two tables and pipelines: kv_epoch_starts and kv_epoch_ends, to index epoch information. This pipeline is different from epoch indexing in sui-indexer in a number of ways:

  • It is an append-only pipeline. The columns that are written at the start and end of the epoch are split into two separate tables that can be written to concurrently (by separate pipelines).
  • The first row of kv_epoch_starts is written by the bootstrap process, which also seeds the kv_genesis table. This avoids having to condition on whether the checkpoint being processed is the genesis checkpoint in the main pipeline.
  • Instead of indexing the number of transactions in the epoch, it tracks the transaction high watermark -- readers will need to compare the records for adjacent epochs to calculate the number of transactions in an epoch (this avoids having to read the last epoch's total transactions in the write path).
  • We index the SuiSystemState object as BCS, rather than the summary structure.
  • We explicitly record whether the epoch advancement at the end of the epoch triggered safe mode (the system state object also tracks whether the epoch was started in safe mode).
  • Fields related to information that came from SystemEpochInfoEvent have all been consolidated in kv_epoch_ends, and they are all optional, in case of safe mode.
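
As a rough illustration of the read-side calculation, the sketch below derives per-epoch transaction counts from consecutive `tx_hi` values. `EpochEnd` and `tx_counts` are hypothetical names that are not part of this PR, and the sketch assumes the rows are contiguous, sorted by epoch, and start at epoch 0 (so the previous watermark starts at zero).

```rust
/// Hypothetical, trimmed-down view of a `kv_epoch_ends` row: only the
/// columns needed to count transactions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EpochEnd {
    pub epoch: i64,
    /// Exclusive transaction upper bound of the epoch.
    pub tx_hi: i64,
}

/// Returns `(epoch, tx_count)` pairs.
///
/// Assumption: `rows` is sorted by epoch ascending, has no gaps, and
/// starts at epoch 0, so the watermark before the first row is zero.
pub fn tx_counts(rows: &[EpochEnd]) -> Vec<(i64, i64)> {
    let mut prev_hi: i64 = 0;
    rows.iter()
        .map(|row| {
            // Transactions in this epoch are the difference between this
            // epoch's watermark and the previous epoch's watermark.
            let count = row.tx_hi - prev_hi;
            prev_hi = row.tx_hi;
            (row.epoch, count)
        })
        .collect()
}
```

With the first three `tx_hi` values from the test plan (9771, 85174 and 161199), `tx_counts` yields counts of 9771, 75403 and 76025.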

It's worth elaborating on the last bullet point, because this is a subtle but large change:

  • Today, total_stake and storage_fund_balance are written at the start of an epoch based on the fields of the SystemEpochInfoEvent emitted from the previous epoch, and are NOT NULL.
  • The remaining fields are nullable, but only because they are written later, once the epoch is over.

This was awkward to work with in a few ways:

  • It meant that for the genesis epoch, we needed to make some numbers up (all zeroes) because we did not have an event to read from.
  • We had to do something similar if we hit safe mode.
  • When indexing the start and end of epochs separately, it meant that we had to duplicate work (finding the system epoch info event).

By making the fields nullable, and consolidating them in kv_epoch_ends, we can simplify the pipelines:

  • kv_epoch_starts and the bootstrapping logic can work purely based on the system state object.
  • kv_epoch_ends can work purely based on the SystemEpochInfoEvent, and can leave fields NULL if we are in safe mode.
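
A minimal sketch of how an end-of-epoch row could be assembled under this scheme, assuming the event has already been searched for among the checkpoint's transactions. `EpochInfo`, `EpochEndRow` and `epoch_end_row` are hypothetical, trimmed-down stand-ins (only two of the event-derived fields are shown), not the types added in this PR.

```rust
/// Hypothetical subset of the values carried by `SystemEpochInfoEvent`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EpochInfo {
    pub total_stake: i64,
    pub storage_fund_balance: i64,
}

/// Hypothetical, trimmed-down `kv_epoch_ends` row.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EpochEndRow {
    pub epoch: i64,
    pub cp_hi: i64,
    pub tx_hi: i64,
    pub end_timestamp_ms: i64,
    pub safe_mode: bool,
    // Event-derived fields are nullable: there is no event in safe mode.
    pub total_stake: Option<i64>,
    pub storage_fund_balance: Option<i64>,
}

/// Builds the row for the last checkpoint of an epoch. `event` is `None`
/// when no epoch info event was found, which is treated as safe mode.
pub fn epoch_end_row(
    epoch: i64,
    cp_hi: i64,
    tx_hi: i64,
    end_timestamp_ms: i64,
    event: Option<&EpochInfo>,
) -> EpochEndRow {
    EpochEndRow {
        epoch,
        cp_hi,
        tx_hi,
        end_timestamp_ms,
        safe_mode: event.is_none(),
        total_stake: event.map(|e| e.total_stake),
        storage_fund_balance: event.map(|e| e.storage_fund_balance),
    }
}
```

The point of the `Option` fields is that a safe-mode row carries `NULL`s instead of made-up zeroes, so readers can tell "no data" apart from "zero".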

In the case of kv_epoch_starts we could also have cut down fields to just epoch, cp_lo and system_state. I chose not to do this because the system state is actually quite a large object, and it is beneficial to avoid having to deserialize to answer simpler queries.

Test plan

Ran the indexer on the first 1M checkpoints, and compared the resulting data in the respective tables against the data that the current indexer produced:

sui_indexer_alt=# SELECT epoch, protocol_version, cp_lo, TO_TIMESTAMP(start_timestamp_ms / 1000), reference_gas_price FROM kv_epoch_starts;
 epoch | protocol_version | cp_lo  |      to_timestamp      | reference_gas_price
-------+------------------+--------+------------------------+---------------------
     0 |                4 |      0 | 2023-04-12 18:00:00+01 |                1000
     1 |                4 |   9770 | 2023-04-13 18:00:02+01 |                1000
     2 |                4 |  85169 | 2023-04-14 18:00:04+01 |                1000
     3 |                4 | 161192 | 2023-04-15 18:00:08+01 |                1000
     4 |                4 | 237074 | 2023-04-16 18:00:11+01 |                1000
     5 |                4 | 314160 | 2023-04-17 18:00:15+01 |                1000
     6 |                4 | 391107 | 2023-04-18 18:00:18+01 |                1000
     7 |                4 | 467716 | 2023-04-19 18:00:21+01 |                1000
     8 |                4 | 544978 | 2023-04-20 18:00:26+01 |                1000
     9 |                5 | 621933 | 2023-04-21 18:00:28+01 |                1000
    10 |                6 | 699410 | 2023-04-22 18:00:31+01 |                1000
    11 |                6 | 777074 | 2023-04-23 18:00:34+01 |                1000
    12 |                6 | 855530 | 2023-04-24 18:00:36+01 |                1000
    13 |                6 | 933559 | 2023-04-25 18:00:39+01 |                1000
(14 rows)

sui_indexer_alt=# SELECT epoch, cp_hi, tx_hi, TO_TIMESTAMP(end_timestamp_ms / 1000), safe_mode, storage_fund_balance, storage_fund_reinvestment, storage_charge, storage_rebate, stake_subsidy_amount, total_gas_fees, total_stake_rewards_distributed, leftover_storage_fund_inflow FROM kv_epoch_ends ORDER BY epoch ASC;
 epoch | cp_hi  | tx_hi  |      to_timestamp      | safe_mode | storage_fund_balance | storage_fund_reinvestment | storage_charge | storage_rebate | stake_subsidy_amount | total_gas_fees | total_stake_rewards_distributed | leftover_storage_fund_inflow
-------+--------+--------+------------------------+-----------+----------------------+---------------------------+----------------+----------------+----------------------+----------------+---------------------------------+------------------------------
     0 |   9770 |   9771 | 2023-04-13 18:00:02+01 | f         |                    0 |                         0 |              0 |              0 |                    0 |              0 |                               0 |                            0
     1 |  85169 |  85174 | 2023-04-14 18:00:04+01 | f         |              2973880 |                         0 |        3952000 |         978120 |                    0 |      102000000 |                       102000000 |                            0
     2 | 161192 | 161199 | 2023-04-15 18:00:08+01 | f         |            717398960 |                         0 |      715403200 |         978120 |                    0 |        1000000 |                         1000000 |                            0
     3 | 237074 | 237084 | 2023-04-16 18:00:11+01 | f         |            733657184 |                         0 |     1430198400 |     1413940176 |                    0 |        2000000 |                         2000000 |                            0
     4 | 314160 | 314171 | 2023-04-17 18:00:15+01 | f         |            733657184 |                         0 |              0 |              0 |                    0 |              0 |                               0 |                            0
     5 | 391107 | 391119 | 2023-04-18 18:00:18+01 | f         |            733657184 |                         0 |              0 |              0 |                    0 |              0 |                               0 |                            0
     6 | 467716 | 467730 | 2023-04-19 18:00:21+01 | f         |            735633184 |                         0 |        1976000 |              0 |                    0 |        1000000 |                         1000000 |                            0
     7 | 544978 | 544994 | 2023-04-20 18:00:26+01 | f         |            729859616 |                         0 |      702475600 |      708249168 |                    0 |        1000000 |                         1000000 |                            0
     8 | 621933 | 621950 | 2023-04-21 18:00:28+01 | f         |            729859616 |                         0 |              0 |              0 |                    0 |              0 |                               0 |                            0
     9 | 699410 | 699428 | 2023-04-22 18:00:31+01 | f         |            729859616 |                         0 |              0 |              0 |                    0 |              0 |                               0 |                            0
    10 | 777074 | 777093 | 2023-04-23 18:00:34+01 | f         |            729859616 |                         0 |              0 |              0 |                    0 |              0 |                               0 |                            0
    11 | 855530 | 855550 | 2023-04-24 18:00:36+01 | f         |            729859616 |                         0 |              0 |              0 |                    0 |              0 |                               0 |                            0
    12 | 933559 | 933586 | 2023-04-25 18:00:39+01 | f         |            735866656 |                         0 |       13832000 |        7824960 |                    0 |        6000000 |                         6000000 |                            0
(13 rows)

Stack


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • REST API:

@amnn self-assigned this Nov 2, 2024

Comment on lines +26 to +27
-- Exclusive transaction upperbound of the epoch.
tx_hi BIGINT NOT NULL,
Contributor

alternatively or additionally, at epoch boundary we also know the tx_lo because there must be at least 1 tx in the epoch

Member Author

I'm not sure I understand -- how do we figure out the tx_lo for the epoch without reading from the DB?

tx_hi: checkpoint_summary.network_total_transactions as i64,
end_timestamp_ms: checkpoint_summary.timestamp_ms as i64,

safe_mode: true,
Contributor

i see, so when checkpoint has end of epoch data but no corresponding event in txs, then we're in safe mode

Comment on lines +104 to +112
total_stake: None,
storage_fund_balance: None,
storage_fund_reinvestment: None,
storage_charge: None,
storage_rebate: None,
stake_subsidy_amount: None,
total_gas_fees: None,
total_stake_rewards_distributed: None,
leftover_storage_fund_inflow: None,
Contributor

how about a default or something

Member Author

There is no such thing as a "Default" ending of an epoch -- we shouldn't use Default (only) because we're tired of writing out fields; the type needs to have a reasonable notion of a "default value".
