Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[indexer-alt] add transaction indices pipelines #19953

Merged
merged 2 commits into from
Nov 3, 2024
Merged

Conversation

emmazzz
Copy link
Contributor

@emmazzz emmazzz commented Oct 22, 2024

Description

This PR adds schema and pipeline for various transaction indices including

  • tx_affected_addresses (replacing tx_senders and tx_recipients)
  • tx_kinds
  • tx_digests
  • tx_calls_fun (collapsing into one)

Test plan

cargo run -- \
--database-url "postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt"
--remote-store-url https://checkpoints.mainnet.sui.io
--last-checkpoint 10000


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • REST API:

Copy link

vercel bot commented Oct 22, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Nov 2, 2024 11:29pm
3 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Nov 2, 2024 11:29pm
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Nov 2, 2024 11:29pm
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Nov 2, 2024 11:29pm

@@ -0,0 +1,39 @@
CREATE TABLE IF NOT EXISTS tx_affected_addresses (
tx_sequence_number BIGINT NOT NULL,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's our latest thought on NOT NULL? I think all the columns here are truly non nullable so I went with NOT NULL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think NOT NULL is fine for key columns (which are all the columns in the filter tables). In my PRs, I omitted the NOT NULL constraints on the columns that are part of the primary key, because being part of the primary key implies NOT NULL, but I'm happy to go back and change that -- I think not having the explicit NOT NULL was confusing in its own way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think having explicit NOT NULL even for primary keys would be better.

Copy link
Member

@amnn amnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks @emmazzz, just nits!

@@ -0,0 +1,39 @@
CREATE TABLE IF NOT EXISTS tx_affected_addresses (
tx_sequence_number BIGINT NOT NULL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think NOT NULL is fine for key columns (which are all the columns in the filter tables). In my PRs, I omitted the NOT NULL constraints on the columns that are part of the primary key, because being part of the primary key implies NOT NULL, but I'm happy to go back and change that -- I think not having the explicit NOT NULL was confusing in its own way.

crates/sui-indexer-alt/src/handlers/tx_calls_fun.rs Outdated Show resolved Hide resolved
crates/sui-indexer-alt/src/handlers/tx_digests.rs Outdated Show resolved Hide resolved
crates/sui-indexer-alt/src/handlers/tx_calls_fun.rs Outdated Show resolved Hide resolved
Comment on lines 16 to 20
#[derive(Debug, Clone)]
pub enum TxKind {
SystemTransaction = 0,
ProgrammableTransaction = 1,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This should be under models::transactions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I wonder whether you could just store the TxKind directly, if you give it the correct repr? (I have no idea, genuinely curious whether diesel would pick that up).

Suggested change
#[derive(Debug, Clone)]
pub enum TxKind {
SystemTransaction = 0,
ProgrammableTransaction = 1,
}
#[derive(Debug, Clone)]
#[repr(i16)]
pub enum TxKind {
SystemTransaction = 0,
ProgrammableTransaction = 1,
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repr does not seem to do much -- it only tells Rust how to store this enum in memory but casting is still necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that we need to cast it if it's going into an i16 type, but I was wondering whether diesel would do something clever if in the model type, you used TxKind directly, and set its repr to i16.

crates/sui-indexer-alt/src/main.rs Outdated Show resolved Hide resolved
@@ -64,6 +72,30 @@ diesel::table! {
}
}

diesel::table! {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious is this now in sync with migrations and thus can be generated by diesel cmd?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is generated using crates/sui-indexer-alt/generate_schema.sh and CI will complain if it mismatches.

Comment on lines 11 to 15
CREATE TABLE IF NOT EXISTS tx_digests (
tx_digest BYTEA PRIMARY KEY,
tx_sequence_number BIGINT NOT NULL
);
CREATE INDEX IF NOT EXISTS tx_digests_tx_sequence_number_index ON tx_digests (tx_sequence_number);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realised something today -- because kv_transactions is now keyed by tx_digest this table should actually go in the opposite direction: It should be keyed by tx_sequence_number so that we can map sequence numbers to digests.

It's also a valid question whether we need this table at all. I.e. can we use tx_digest everywhere as the key for transactions. @emmazzz -- could you estimate what the size difference is between both approaches to decide which option is best?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realised something today -- because kv_transactions is now keyed by tx_digest this table should actually go in the opposite direction: It should be keyed by tx_sequence_number so that we can map sequence numbers to digests.

This part makes sense to me. When the user input is a list of transaction digests, we look them up directly in kv_transactions. And when user queries transactions via optional filters, we first look up the transaction sequence numbers using transaction indices, then use tx_digests to map the sequence numbers to the corresponding digests, and then finally look them up in kv_transactions via the digests. So that's three tables touched.

Writing this out, I see why you are thinking if we can use digest everywhere as this will allow us to remove one layer of indirection. But even if we use transaction digest everywhere, we will still need to store transaction sequence number everywhere as well for cursor and scan limit. That is one extra column (digest) for each row in five tables (affected objects + addresses, tx kinds, balance changes, tx calls fun), not to mention the indices we need to add here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we can punt on this discussion right now and merge this PR in. I've created a linear task for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes good point on why we still want the sequence number. Indeed, happy to punt on whether we need to add tx_digest everywhere (and get rid of the explicit tx_digests table).

crates/sui-indexer-alt/src/models/transactions.rs Outdated Show resolved Hide resolved
);
CREATE INDEX IF NOT EXISTS tx_digests_tx_sequence_number_index ON tx_digests (tx_sequence_number);

CREATE TABLE IF NOT EXISTS tx_kinds (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious what is the usage for this table?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GraphQL allows querying transactions filtered by transaction kind (system or programmable transaction). This table is used for that. And iirc performance investigation earlier this year determined that an index table would be needed to support such query.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It's also a relatively small table: 170GB unpruned, or, 0.43% of the total footprint).

@emmazzz emmazzz disabled auto-merge October 24, 2024 08:21
@amnn amnn temporarily deployed to sui-typescript-aws-kms-test-env November 2, 2024 23:28 — with GitHub Actions Inactive
@amnn amnn merged commit 61b97e6 into main Nov 3, 2024
52 checks passed
@amnn amnn deleted the emma/tx-indices-alt branch November 3, 2024 00:01
lxfind pushed a commit that referenced this pull request Nov 5, 2024
## Description 

This PR adds schema and pipeline for various transaction indices
including
- tx_affected_addresses (replacing tx_senders and tx_recipients)
- tx_kinds
- tx_digests
- tx_calls_fun (collapsing into one)

## Test plan 

cargo run --                                             \          
--database-url
"postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt" \
--remote-store-url https://checkpoints.mainnet.sui.io \
  --last-checkpoint 10000

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to
your changes, release notes aren't required.

For each box you select, include information after the relevant heading
that describes the impact of your changes that a user might notice and
any actions they must take to implement updates.

- [ ] Protocol: 
- [ ] Nodes (Validators and Full nodes): 
- [ ] Indexer: 
- [ ] JSON-RPC: 
- [ ] GraphQL: 
- [ ] CLI: 
- [ ] Rust SDK:
- [ ] REST API:

---------

Co-authored-by: Ashok Menon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants