-
Notifications
You must be signed in to change notification settings - Fork 11.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[indexer-alt] add transaction indices pipelines #19953
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
3 Skipped Deployments
|
@@ -0,0 +1,39 @@ | |||
CREATE TABLE IF NOT EXISTS tx_affected_addresses ( | |||
tx_sequence_number BIGINT NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's our latest thought on NOT NULL? I think all the columns here are truly non nullable so I went with NOT NULL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think NOT NULL
is fine for key columns (which are all the columns in the filter tables). In my PRs, I omitted the NOT NULL
constraints on the columns that are part of the primary key, because being part of the primary key implies NOT NULL
, but I'm happy to go back and change that -- I think not having the explicit NOT NULL
was confusing in its own way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think having explicit NOT NULL
even for primary keys would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks @emmazzz, just nits!
@@ -0,0 +1,39 @@ | |||
CREATE TABLE IF NOT EXISTS tx_affected_addresses ( | |||
tx_sequence_number BIGINT NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think NOT NULL
is fine for key columns (which are all the columns in the filter tables). In my PRs, I omitted the NOT NULL
constraints on the columns that are part of the primary key, because being part of the primary key implies NOT NULL
, but I'm happy to go back and change that -- I think not having the explicit NOT NULL
was confusing in its own way.
crates/sui-indexer-alt/migrations/2024-10-21-003426_tx_indices/up.sql
Outdated
Show resolved
Hide resolved
crates/sui-indexer-alt/migrations/2024-10-21-003426_tx_indices/up.sql
Outdated
Show resolved
Hide resolved
#[derive(Debug, Clone)] | ||
pub enum TxKind { | ||
SystemTransaction = 0, | ||
ProgrammableTransaction = 1, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This should be under models::transactions
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I wonder whether you could just store the TxKind
directly, if you give it the correct repr
? (I have no idea, genuinely curious whether diesel would pick that up).
#[derive(Debug, Clone)] | |
pub enum TxKind { | |
SystemTransaction = 0, | |
ProgrammableTransaction = 1, | |
} | |
#[derive(Debug, Clone)] | |
#[repr(i16)] | |
pub enum TxKind { | |
SystemTransaction = 0, | |
ProgrammableTransaction = 1, | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repr
does not seem to do much -- it only tells Rust how to store this enum in memory but casting is still necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that we need to cast it if it's going into an i16
type, but I was wondering whether diesel
would do something clever if in the model type, you used TxKind
directly, and set its repr
to i16
.
@@ -64,6 +72,30 @@ diesel::table! { | |||
} | |||
} | |||
|
|||
diesel::table! { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious is this now in sync with migrations and thus can be generated by diesel cmd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is generated using crates/sui-indexer-alt/generate_schema.sh
and CI will complain if it mismatches.
CREATE TABLE IF NOT EXISTS tx_digests ( | ||
tx_digest BYTEA PRIMARY KEY, | ||
tx_sequence_number BIGINT NOT NULL | ||
); | ||
CREATE INDEX IF NOT EXISTS tx_digests_tx_sequence_number_index ON tx_digests (tx_sequence_number); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realised something today -- because kv_transactions
is now keyed by tx_digest
this table should actually go in the opposite direction: It should be keyed by tx_sequence_number
so that we can map sequence numbers to digests.
It's also a valid question whether we need this table at all. I.e. can we use tx_digest
everywhere as the key for transactions. @emmazzz -- could you estimate what the size difference is between both approaches to decide which option is best?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realised something today -- because kv_transactions is now keyed by tx_digest this table should actually go in the opposite direction: It should be keyed by tx_sequence_number so that we can map sequence numbers to digests.
This part makes sense to me. When the user input is a list of transaction digests, we look them up directly in kv_transactions
. And when user queries transactions via optional filters, we first look up the transaction sequence numbers using transaction indices, then use tx_digests
to map the sequence numbers to the corresponding digests, and then finally look them up in kv_transactions
via the digests. So that's three tables touched.
Writing this out, I see why you are thinking if we can use digest everywhere as this will allow us to remove one layer of indirection. But even if we use transaction digest everywhere, we will still need to store transaction sequence number everywhere as well for cursor and scan limit. That is one extra column (digest) for each row in five tables (affected objects + addresses, tx kinds, balance changes, tx calls fun), not to mention the indices we need to add here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we can punt on this discussion right now and merge this PR in. I've created a linear task for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes good point on why we still want the sequence number. Indeed, happy to punt on whether we need to add tx_digest
everywhere (and get rid of the explicit tx_digests
table).
crates/sui-indexer-alt/migrations/2024-10-21-003426_tx_indices/up.sql
Outdated
Show resolved
Hide resolved
); | ||
CREATE INDEX IF NOT EXISTS tx_digests_tx_sequence_number_index ON tx_digests (tx_sequence_number); | ||
|
||
CREATE TABLE IF NOT EXISTS tx_kinds ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious what is the usage for this table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GraphQL allows querying transactions filtered by transaction kind (system or programmable transaction). This table is used for that. And iirc performance investigation earlier this year determined that an index table would be needed to support such query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(It's also a relatively small table: 170GB unpruned, or, 0.43% of the total footprint).
crates/sui-indexer-alt/migrations/2024-10-21-003426_tx_indices/up.sql
Outdated
Show resolved
Hide resolved
bb5d113
to
6de71b3
Compare
## Description This PR adds schema and pipeline for various transaction indices including - tx_affected_addresses (replacing tx_senders and tx_recipients) - tx_kinds - tx_digests - tx_calls_fun (collapsing into one) ## Test plan cargo run -- \ --database-url "postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt" \ --remote-store-url https://checkpoints.mainnet.sui.io \ --last-checkpoint 10000 --- ## Release notes Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required. For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates. - [ ] Protocol: - [ ] Nodes (Validators and Full nodes): - [ ] Indexer: - [ ] JSON-RPC: - [ ] GraphQL: - [ ] CLI: - [ ] Rust SDK: - [ ] REST API: --------- Co-authored-by: Ashok Menon <[email protected]>
Description
This PR adds schema and pipeline for various transaction indices including
Test plan
cargo run -- \
--database-url "postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt"
--remote-store-url https://checkpoints.mainnet.sui.io
--last-checkpoint 10000
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.