archive: Introduce archive_storageDiff for indexers #159

lexnv · 2024-08-01T10:12:37Z

This PR introduces a new unstable method to the archive class.

The method archive_unstable_storageDiff aims to help indexers by providing the storage differences between two blocks.

By default, the method returns the difference between currentHash and its parent.

Furthermore, users have the ability to specify the keys of interest:

prefixes: A list of key prefixes under which the users will receive notifications
exclude key prefixes: A list of key prefixes to exclude from the results
childTrie: Useful for computing the difference for a child trie instead of the main storage trie

For each individual prefix, the users can specify if they need:

value: Returns the value of the key from the currentHash block
hash: Similar to value but returns the hash
none: Users are only interested in whether the key was modified, added or removed.

For symmetry with the chainHead functions:

a storageDiffStop method is used to stop the subscription
a storageDiffContinue method is used for resuming storage differences

Closes: #108

cc @paritytech/subxt-team @tomaka @jsdw @josepot

Signed-off-by: Alexandru Vasile <[email protected]>

src/api/archive_unstable_storageDiff.md

src/api/archive_unstable_storageDiffStop.md

src/api/archive_unstable_storageDiffContinue.md

jsdw · 2024-08-05T15:25:47Z

I wonder about the performance implications of, e.g., not providing prefixes (or providing many or large ones), but I think that the streaming makes sense as a way of at least mitigating the performance hit of this call on the node! As an API though, off the top of my head it looks solid and feels like it would cover the use cases I can think of!

src/api/archive_unstable_storageDiff.md

tomaka · 2024-08-06T11:25:09Z

src/api/archive_unstable_storageDiff.md

+  - `prefixes` (optional): Array of JSON objects describing how the storage difference will be calculated. Each object contains the following fields:
+    - `key`: String containing the hexadecimal-encoded key prefix under which the storage difference is calculated.
+    - `type`: String equal to one of: `value`, `hash`, `none`.


"Calculating under a prefix" doesn't mean anything to me.

Have added a new sentence to describe this a bit better: "Only the storage entries whose key starts with the provided prefix are returned.", thanks!
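
To make the prefix semantics concrete, here is a minimal sketch of that rule, assuming hex-encoded keys with an optional `0x` prefix. The helper names `normalize_hex`, `matches_prefix` and `filter_by_prefixes` are hypothetical and not part of the spec; the "no prefixes means all keys" behaviour follows the parameter description quoted in this thread.

```python
def normalize_hex(key: str) -> str:
    """Strip an optional 0x prefix and lowercase the hex digits."""
    lowered = key.lower()
    return lowered[2:] if lowered.startswith("0x") else lowered


def matches_prefix(key: str, prefix: str) -> bool:
    """Return True when the storage key starts with the given key prefix."""
    return normalize_hex(key).startswith(normalize_hex(prefix))


def filter_by_prefixes(keys, prefixes):
    """Keep only the keys under at least one prefix.

    Assumption: an empty prefix list means "all keys", mirroring the
    documented behaviour when the parameter is not provided.
    """
    if not prefixes:
        return list(keys)
    return [k for k in keys if any(matches_prefix(k, p) for p in prefixes)]
```

With this sketch, `filter_by_prefixes(["0xaabb", "0xccdd"], ["0xAA"])` keeps only `0xaabb`.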

src/api/archive_unstable_storageDiff.md

tomaka · 2024-08-06T11:29:02Z

src/api/archive_unstable_storageDiff.md

+    - `type`: String equal to one of: `value`, `hash`, `none`.
+  - containing the key prefixes for which the storage difference will be calculated. If this parameter is not provided, the storage difference is calculated for all keys.
+  - `excludeKeyPrefixes` (optional): Array of strings containing the key prefixes for which the storage difference will not be calculated. If this parameter is not provided, the storage difference is calculated for all keys.
+  - `childTrie` (optional): A string containing the hexadecimal-encoded key of the child trie of the "default" namespace. If this parameter is not provided, the storage difference is calculated for the main storage.


I guess that you just put this parameter here without giving much thought to it in order to get child tries out of the way, but it is very incoherent with the rest of the design of the function.
The function lets you query the difference between a range of blocks and many different prefixes, but then you would have to send a separate JSON-RPC request for each child trie one by one? That's extremely weird to me.

Yep, I was wondering as well whether that's the right place to put it. Have extended it to something like:

items: [
    {
        (optional) prefixes...
        trieType: mainTrie | childTrie
        (optional) childTrieKey
    }
]

This should provide the storage difference of multiple tries.
I've added the non-optional trieType because it felt odd for users to say:
items: [ { }, { "childKey": ...}] (i.e. have an empty object for main trie queries)

tomaka · 2024-08-06T11:32:16Z

src/api/archive_unstable_storageDiff.md

+  - `excludeKeyPrefixes` (optional): Array of strings containing the key prefixes for which the storage difference will not be calculated. If this parameter is not provided, the storage difference is calculated for all keys.
+  - `childTrie` (optional): A string containing the hexadecimal-encoded key of the child trie of the "default" namespace. If this parameter is not provided, the storage difference is calculated for the main storage.
+
+**Return value**: String containing an opaque value representing the operation.


For the sake of simplicity, I've been trying to avoid subscriptions for all archive-related functions.
The subscription functions are important for chainHead, because you want to be able to cancel operations in order to keep bandwidth usage to a minimum, and because you want to be able to cancel operations related to blocks that end up not being canonical.
However, neither of these two things applies here (nor to any archive-prefixed function), so I don't really see the point of having a subscription.

I assumed that the point of the subscription here is that calculating the diffs can be quite slow, so by streaming them (there can be a bunch of prefixes asked for) the user can stop them (and be asked to continue) if they find what they need or decide they don't need the server to continue working?

I agree that it'd be much nicer to avoid the subscriptions though, so if this isn't an issue then it'd be great to simplify it!

> if they find what they need or decide they don't need the server to continue working?

In my opinion it is a better idea for the client to send queries progressively, for example query from block 1 to 100, then block 101 to 200, then block 201 to 300, etc.
It is not so different to do so compared to sending a "continue", except that the client has more control.
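
A rough sketch of that progressive querying, assuming the client tracks block heights itself. The helper `block_ranges` is hypothetical; the method discussed here takes block hashes, so a real client would still need to map each window's endpoints to hashes, which is not shown.

```python
def block_ranges(first: int, last: int, step: int = 100):
    """Yield inclusive (start, end) block-height windows covering first..last."""
    if step <= 0:
        raise ValueError("step must be positive")
    start = first
    while start <= last:
        # Clamp the final window so it never runs past the last block.
        end = min(start + step - 1, last)
        yield (start, end)
        start = end + 1
```

The client can stop iterating at any point, which plays the role of "stop", and asking for the next window plays the role of "continue".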

I do like this approach! Turning this subscription into a method is easier to follow from the user's perspective. I'll do some performance measurements for storage differences once we implement this in Substrate.

We could probably add a disk cache to make queries a bit faster if that turns out to be a performance issue.

tomaka · 2024-08-06T11:33:52Z

src/api/archive_unstable_storageDiff.md

+
+**Parameters**:
+
+- `currentHash`: String containing a hexadecimal-encoded hash of the header of the block whose storage difference will be retrieved. The storage difference is calculated between the `currentHash` block and the parent of the `currentHash` block.


Also I don't think that the naming is appropriate. What exactly is "current" here, given that the block can be very old?

Yep, that makes sense, have applied James's suggestion of renaming currentHash -> hash and fromHash -> previousHash 🙏

src/api/archive_unstable_storageDiff.md

Signed-off-by: Alexandru Vasile <[email protected]>

jsdw · 2024-08-28T15:11:43Z

src/api/archive_unstable_storageDiff.md

+        {
+            "prefixes": [
+                {
+                    "key": "0x...",
+                    "type": "value" | "hash" | "none",
+                },
+            ],
+
+            "trieType": "mainTrie" | "childTrie",
+            "childTrieKey": "0x...",
+        },


It feels a little off to me that we have two arrays nested inside each other in order to presumably support child trie access. I wonder whether we could simplify items to be:

"items": [
    {
        "key": "0x...",
        "returnType": "value" | "hash" | "none",
        "childTrie": "0x..." | null,
    }
]

If "childTrie" is provided and not null, then we try accessing said child trie, else we assume we're trying to access the main trie (which I guess is the overwhelming default for most, but not 100% sure!).

I also renamed "type" to "returnType" just because it felt a bit clearer (ie we are not defining the type of the key or anything, just the type of the things returned from this), but happy with either!

Yep that makes sense! While at it, I've made key and childTrie optional
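
For illustration only, a request using the simplified `items` shape from this thread might be assembled as below. The helpers `item` and `storage_diff_request`, the JSON-RPC 2.0 envelope, and the positional order of `hash`, `items` and `previousHash` are assumptions of this sketch rather than something the thread confirms; the field names `key`, `returnType` and `childTrie` are taken from the suggestion above.

```python
import json


def item(key=None, return_type="value", child_trie=None):
    """Build one `items` entry; `key` and `childTrie` are optional."""
    if return_type not in ("value", "hash", "none"):
        raise ValueError(f"unsupported returnType: {return_type}")
    entry = {"returnType": return_type}
    if key is not None:
        entry["key"] = key
    if child_trie is not None:
        # When present, the query targets this child trie instead of the main trie.
        entry["childTrie"] = child_trie
    return entry


def storage_diff_request(request_id, block_hash, items, previous_hash=None):
    """Assemble the JSON-RPC request body (parameter order is assumed)."""
    params = [block_hash, items]
    if previous_hash is not None:
        params.append(previous_hash)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "archive_unstable_storageDiff",
        "params": params,
    }


if __name__ == "__main__":
    request = storage_diff_request(
        1,
        "0x...",  # hash of the block of interest (elided)
        [item(key="0x...", return_type="hash"), item(child_trie="0x...")],
    )
    print(json.dumps(request, indent=2))
```

Leaving `previous_hash` out corresponds to diffing against the parent block, as described in the PR description.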

jsdw · 2024-08-28T15:15:56Z

src/api/archive_unstable_storageDiff.md

+The JSON object returned by this function has the following format:
+
+```json
+{


If we did the above, we'd also tweak this to be a single array whose entries contained a "childTrieKey" if one was provided in the input for that key ie

"result": [
    {
        "key": "0x...",
        "value": "0x...",
        "hash": "0x...",
        "type": "added" | "modified" | "deleted",
        "childTrie": "0x..." | null,
    },
]

Ah, I see that "prefixes" is optional. If it's not provided, is the point that this method will provide all changes in the whole trie?

So either this suggestion wouldn't allow that, or one could set a key like "0x" (or not provide one) to indicate that we are searching for diffs under everything in that trie? (This would allow for the type to be specified I guess too, which isn't currently possible if no prefixes are given)

Yep, I think the API looks cleaner with this suggestion, I made the key optional instead of 0x 🙏
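
As a sketch of how an indexer might consume the flat result array suggested in this thread, the entries can be bucketed per trie and per change type. The function name `group_differences` is hypothetical, and the entry fields (`key`, `type`, `childTrie`) are the ones proposed above, not necessarily the final spec.

```python
from collections import defaultdict


def group_differences(results):
    """Group storage keys by (childTrie, change type).

    A `childTrie` of None (or a missing field) stands for the main trie.
    """
    grouped = defaultdict(list)
    for entry in results:
        trie = entry.get("childTrie")
        grouped[(trie, entry["type"])].append(entry["key"])
    return dict(grouped)
```

Called on a list holding one added and one deleted main-trie entry, this returns two buckets keyed by `(None, "added")` and `(None, "deleted")`.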

jsdw · 2024-08-28T15:18:35Z

src/api/archive_unstable_storageDiff.md

+
+The `differences` field is an array of objects describing storage differences. Each element contains the following fields:
+
+- `key`: String containing the hexadecimal-encoded key of the storage entry. If the key prefix was provided in the `items` parameter, the `key` field is guaranteed to start with one of the key prefixes provided.


> If the key prefix was provided in the items parameter, the key field is guaranteed to start with one of the key prefixes provided.

What if the key prefix was not provided in the items parameter?

Or perhaps this is aiming for something more like: "String containing the hexadecimal-encoded key of the storage entry. A prefix of this key will have been provided in the items input"? I might not 100% understand!

Yep the suggestion clarifies things, thanks!

src/api/archive_unstable_storageDiff.md

niklasad1

I'm not familiar enough with the storage and child storage stuff to understand why the API actually needs to expose whether the storage keys/prefixes belong to the main or a child storage.

It's quite a complicated API to use in my opinion. It is most likely required for a reason that I don't understand, but I would be much happier to not expose "trieType" and "childTrieKey" at all and let the node just inspect the filter of the storage keys.

"items": [
    {
        "key": "0x1.",
        "type": "value" | "hash" | "none",
    },
    {
        "key": "0x2",
        "type": "value" | "hash" | "none",
    }
]

In the same manner, make "items" and "trieType" mutually exclusive.

That way, one can query:

[some storage keys],
main storage
child storage.

I would also be happy to make "type" optional and let the node decide what to display for the key, i.e. "hash" or "value".
I haven't looked at the implementation; these are just my comments on the overall API.

jsdw · 2024-09-11T10:58:01Z

@lexnv just bumping this in case it fell off the radar! It'd be good to address the comments and get it through review so we can implement it, but I know you're probably busy so not urgent :)

Signed-off-by: Alexandru Vasile <[email protected]>

src/api/archive_unstable_storageDiff.md

jsdw

Looks good to me! Just one sentence I spotted that needed a wee update after other changes :)

Signed-off-by: Alexandru Vasile <[email protected]>

lexnv · 2024-09-20T09:28:13Z

@tomaka Would love to get your thoughts on this, thanks 🙏

jsdw · 2024-09-20T10:03:46Z

So this looks good, but I'm still hesitant about not using a subscription here.

If a client asks for a storage diff that is particularly large (ie between two quite distant blocks) I imagine it could be expensive to compute (in terms of the size of response needed as well as time taken to build it). Would this open the node to DoS attacks (clients asking for storage diffs known to be huge)? Or, to avoid this, would some storage diffs simply lead to an error because they are too expensive to compute and so the node refuses?

With a subscription based approach we could do something like progressively streaming the diffs we see as we work from first block to last block. The server wouldn't have to accumulate so much in one go that way, and could also take advantage of backpressure to kill connections to clients that were too slow to receive data back or whatever.

@tomaka I think you preferred a non subscription based approach so perhaps you have an idea already on how to avoid these issues (or maybe don't think that there is any issue in the first place) without needing subscriptions?

lexnv · 2024-09-30T09:51:18Z

Merging this to focus a bit on the implementation, we'll re-evaluate the subscription-based approach after we extract some performance metrics from the new RPC methods, thanks everyone for the feedback and review 🙏

Please feel free to open any issues / PRs to further discuss this, thanks again!

lexnv added 3 commits August 1, 2024 13:02

archive: Introduce archive_storageDiff

5922d71

Signed-off-by: Alexandru Vasile <[email protected]>

archive: Stop the store diff subscription

6c8e970

Signed-off-by: Alexandru Vasile <[email protected]>

archive: Add ability to continue storage diff

7e20f5f

Signed-off-by: Alexandru Vasile <[email protected]>

lexnv self-assigned this Aug 1, 2024

jsdw reviewed Aug 5, 2024

tomaka reviewed Aug 6, 2024

lexnv added 9 commits August 6, 2024 16:43

archive: Rename currentHash->hash and fromHash->previousHash

70515bc

Signed-off-by: Alexandru Vasile <[email protected]>

archive: Remove leftover comment

9fe2b32

Signed-off-by: Alexandru Vasile <[email protected]>

archive: Remove excludeKeyPrefixes param

aeb19a0

Signed-off-by: Alexandru Vasile <[email protected]>

archive: Clarify hash parameter wrt storage diff calculation

ee2a8c1

Signed-off-by: Alexandru Vasile <[email protected]>

archive: Make previousHash ancestor of hash

7aeb735

Signed-off-by: Alexandru Vasile <[email protected]>

archive: Make subscription into a method

82d4245

Signed-off-by: Alexandru Vasile <[email protected]>

archive: Support multiple childTrie queries

c7921f6

Signed-off-by: Alexandru Vasile <[email protected]>

archive: Add a bit more details

f059c18

Signed-off-by: Alexandru Vasile <[email protected]>

archive: Extend error section

e40142b

Signed-off-by: Alexandru Vasile <[email protected]>

lexnv requested review from jsdw and tomaka August 20, 2024 15:41

jsdw reviewed Aug 28, 2024

niklasad1 reviewed Aug 29, 2024

lexnv added 5 commits September 11, 2024 15:59

archive: Remove prefixes array

e1d08ad

Signed-off-by: Alexandru Vasile <[email protected]>

archive: Adjust return object fields

821093d

Signed-off-by: Alexandru Vasile <[email protected]>

archive: Remove concept of subscription

1f8854b

Signed-off-by: Alexandru Vasile <[email protected]>

archive: Remove none return Type

d87aa84

Signed-off-by: Alexandru Vasile <[email protected]>

archive: Remove null from childTrieKey

9390a22

Signed-off-by: Alexandru Vasile <[email protected]>

jsdw reviewed Sep 17, 2024

jsdw previously approved these changes Sep 17, 2024

archive: Remove trieType type from example

1ed910f

Signed-off-by: Alexandru Vasile <[email protected]>

lexnv dismissed jsdw’s stale review via 1ed910f September 17, 2024 09:55

jsdw approved these changes Sep 19, 2024

niklasad1 approved these changes Sep 19, 2024

lexnv merged commit 9491d90 into main Sep 30, 2024
3 checks passed

lexnv deleted the lexnv/archive_diff branch September 30, 2024 09:51

lexnv mentioned this pull request Oct 9, 2024

rpc-v2: Implement archive_unstable_storageDiff paritytech/polkadot-sdk#5997

Open
