
Log indexing #8082

Draft · wants to merge 6 commits into master

Conversation


@Scooletz commented Jan 20, 2025

This PR speculatively proposes a new way of capturing logs and querying them. It does so by introducing a LogBuilder that captures logs as they appear and later builds them into an immutable file. To make this efficient, each address/topic is hashed using XXHash64, a fast non-cryptographic hash that produces a 64-bit ulong. Due to the birthday paradox, collisions become likely after roughly 4 billion distinct topics (2^(64/2) = 2^32).
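
For reference, the standard birthday-bound approximation for a 64-bit hash, which is where the ~4 billion figure comes from:

$$
P(\text{collision}) \approx 1 - e^{-\frac{n(n-1)}{2 \cdot 2^{64}}},
\qquad
n = 2^{32} \approx 4.3 \times 10^{9} \;\Rightarrow\; P \approx 39\%
$$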

Entries

Each LogEntry is indexed by its address and all of its topics. To distinguish topics by position, a different hashing seed is used per position; this affects the collision probability only slightly. The block number and transaction number are encoded under a ulong -> uint mapping: the ulong is the hash, and the uint encodes the (block, tx) tuple. This gives 12 bytes of storage for each topic/address of a log entry, as sketched below.
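
A minimal sketch of that keying scheme, assuming the System.IO.Hashing package for XXHash64; the LogIndexEntry type, the per-position seed convention, and the bit split inside the uint are illustrative assumptions, not the PR's actual layout:

```csharp
using System;
using System.IO.Hashing;

// 8-byte key + 4-byte value = the 12 bytes per indexed address/topic mentioned above.
public readonly struct LogIndexEntry
{
    public readonly ulong Key;   // XXHash64 of the address or topic
    public readonly uint Value;  // (block, tx) packed into a single uint

    public LogIndexEntry(ulong key, uint value)
    {
        Key = key;
        Value = value;
    }
}

public static class LogIndexing
{
    // Hypothetical packing: upper bits for the block offset within the file's block range,
    // lower bits for the transaction index. The exact split is an assumption.
    private const int TxBits = 16;

    public static uint Pack(uint blockInRange, ushort txIndex) =>
        (blockInRange << TxBits) | txIndex;

    // The address uses seed 0; the topic at position i uses seed i + 1, so the same topic
    // hashed at different positions yields different keys (distinguishing topics by position).
    public static ulong HashAddress(ReadOnlySpan<byte> address) =>
        XxHash64.HashToUInt64(address, 0);

    public static ulong HashTopic(ReadOnlySpan<byte> topic, int position) =>
        XxHash64.HashToUInt64(topic, position + 1);
}
```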

Writing and deduplication

When a builder is flushed to an IBufferWriter<byte>, its entries are first sorted by their hashes. To make lookups faster, keys and values are kept as separate arrays. After sorting, a key (topic/address) that appears multiple times is encoded differently: all of its corresponding (block, tx) values are sorted and then diff-encoded using varints. They are written to the output buffer and sealed with a special entry that points to the beginning of the sequence. The offset of that entry is what gets mapped to the topic in a given file. This encodes frequently occurring topics very efficiently without sacrificing the unique address/topic case. The file is sealed by writing a single int holding the number of topics it contains; the rest can be derived.
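
A minimal sketch of the diff + varint step for a repeated key, assuming the packed (block, tx) values are already sorted; the marker entry and suffix-length bookkeeping are omitted and all names are illustrative:

```csharp
using System;
using System.Buffers;

public static class DiffVarintEncoder
{
    // Delta-encode a sorted run of packed (block, tx) values as varints.
    public static void Write(IBufferWriter<byte> output, ReadOnlySpan<uint> sortedValues)
    {
        uint previous = 0;
        foreach (uint value in sortedValues)
        {
            WriteVarint(output, value - previous); // small diffs compress to 1-2 bytes each
            previous = value;
        }
    }

    private static void WriteVarint(IBufferWriter<byte> output, uint value)
    {
        Span<byte> scratch = output.GetSpan(5); // a 32-bit varint needs at most 5 bytes
        int written = 0;
        while (value >= 0x80)
        {
            scratch[written++] = (byte)(value | 0x80); // set the continuation bit
            value >>= 7;
        }
        scratch[written++] = (byte)value;
        output.Advance(written);
    }
}
```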

Example

  1. three hashes: A, B, C
  2. hashes A and C are repeated:
    1. their entries are diff-encoded to compress them
    2. the length is written as a suffix, so it can be emitted after the payload
  3. hash B is not repeated, so its value is stored directly
0                      13        17                    59        63          67          71             75        83           91        99     103
┌──────────────────────┬─────────┬──────────────────────┬────────┬────────────┬───────────┬─────────────┬──────────┬────────────┬────────┬──────┐  
│                      │         │                      │        │            │           │             │          │            │        │      │  
│ HASH_A diff encoded  │    0    │ HASH_C diff encoded  │  17    │ 13| Marker │ (1, 13)   │ 59 | Marker │ HASH_A   │ HASH_B     │ HASH_C │  3   │  
│                      │    │    │                      │   │    │            │           │             │          │            │        │      │  
└──────────────────────┴────┼────┴──────────────────────┴───┼────┴────────────┴───────────┴─────────────┴──────────┴────────────┴────────┴──────┘  
▲                           │    ▲                          │                                                                                      
│                           │    │                          │                                                                                      
│                           │    │                          │                                                                                      
│                           │    │                          │                                                                                      
│                           │    │                          │                                                                                      
│                           │    │                          │                                                                                      
└───────────────────────────┘    └──────────────────────────┘                                                                                      
                                                                                                                                                   
                                                                                                                                                   
                                                                                                                                                   
│                                │                               │           │            │             │                                │      │  
│                                │                               │value for  │ an actual  │ jump for    │                                │      │  
│                                │                               │  HASH_A   │  value     │  HASH_C     │ 3 hashes ..................... │count │  
│entries encoded for HASH_A      │ entries encoded for HASH_C    │encoded as │ block = 1  │             │                                │      │  
│                                │                               │ a jump to │ tx = 13    │             │                                │      │  
│                                │                               │    13     │            │             │                                │      │  

Size considerations

  1. Entry sizes:
    1. 12 bytes - what a unique entry uses in the output file
    2. ~2 bytes per entry - repeated entries that occur frequently (in every other transaction, for example) can be encoded this efficiently (see the test)
  2. If files are grouped every ~64k blocks:
    1. at the current moment ~360 files would be needed to capture the logs of mainnet
    2. if a given event occurred 200 times per block, it would occur 13,107,200 times across 64k blocks, which, using the encoding above, would cost ~26MB to store (see the calculation below)
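
A quick back-of-the-envelope check of the ~26MB figure, using the assumptions above (64k blocks per file, ~2 bytes per repeated entry):

```csharp
using System;

const long BlocksPerFile = 64 * 1024;        // 65,536 blocks grouped into one file
const long OccurrencesPerBlock = 200;        // the hypothetical hot event
const long BytesPerRepeatedEntry = 2;        // diff + varint encoded entry

long occurrences = OccurrencesPerBlock * BlocksPerFile;  // 13,107,200
long bytes = occurrences * BytesPerRepeatedEntry;        // 26,214,400 bytes
Console.WriteLine($"{occurrences:N0} occurrences -> ~{bytes / 1_000_000.0:F1} MB");
```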

Merging and querying

A single builder could in principle encode up to a few hundred thousand blocks (though memory constraints likely make that impractical), so merging of files must be implemented. Since files are ordered by keys, and by entries within a key, merging can be thought of as merging two sorted enumerables, as sketched below.
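
A minimal sketch of the merge step, assuming each file yields its (key, value) pairs already sorted by key and then by value; names and types are illustrative, not the PR's API:

```csharp
using System.Collections.Generic;

public static class IndexMerge
{
    // Standard two-way merge of sorted streams; repeated keys simply end up adjacent
    // in the output, ready to be re-encoded as a diff-encoded run.
    public static IEnumerable<(ulong Key, uint Value)> Merge(
        IEnumerable<(ulong Key, uint Value)> left,
        IEnumerable<(ulong Key, uint Value)> right)
    {
        using var l = left.GetEnumerator();
        using var r = right.GetEnumerator();
        bool hasL = l.MoveNext();
        bool hasR = r.MoveNext();

        while (hasL && hasR)
        {
            if (l.Current.CompareTo(r.Current) <= 0)
            {
                yield return l.Current;
                hasL = l.MoveNext();
            }
            else
            {
                yield return r.Current;
                hasR = r.MoveNext();
            }
        }
        while (hasL) { yield return l.Current; hasL = l.MoveNext(); }
        while (hasR) { yield return r.Current; hasR = r.MoveNext(); }
    }
}
```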

Querying is not implemented yet. A plain binary search over billions of keys is not practical, so an additional index is required. We could introduce simple skip lists at the top, or split keys by prefixes; if such an index is introduced, it can still be written to the output buffer in a single pass. With this design, queries that require an AND would issue separate searches, each yielding an enumerable of (block, tx) values, which would then be intersected (see the sketch below). Keeping the files small, covering a limited range of blocks, bounds the length of these enumerables and improves speed.
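
A minimal sketch of how an AND query could intersect two per-key result streams, assuming both enumerables of packed (block, tx) values are sorted ascending; names are illustrative:

```csharp
using System.Collections.Generic;

public static class LogQuery
{
    public static IEnumerable<uint> And(IEnumerable<uint> first, IEnumerable<uint> second)
    {
        using var a = first.GetEnumerator();
        using var b = second.GetEnumerator();
        bool hasA = a.MoveNext();
        bool hasB = b.MoveNext();

        while (hasA && hasB)
        {
            if (a.Current == b.Current)
            {
                yield return a.Current;   // present in both streams
                hasA = a.MoveNext();
                hasB = b.MoveNext();
            }
            else if (a.Current < b.Current)
            {
                hasA = a.MoveNext();      // advance whichever stream is behind
            }
            else
            {
                hasB = b.MoveNext();
            }
        }
    }
}
```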

GC

If the builder produces files in block ranges, for example 64k blocks at a time, files could be named after their starting block number. GC would then simply remove the oldest files.
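
A sketch of that GC, assuming files are named by their starting block number and using a hypothetical ".logidx" extension:

```csharp
using System.IO;

public static class LogIndexGc
{
    private const long BlocksPerFile = 64 * 1024;

    public static void Prune(string directory, long oldestBlockToKeep)
    {
        foreach (string file in Directory.EnumerateFiles(directory, "*.logidx"))
        {
            long startingBlock = long.Parse(Path.GetFileNameWithoutExtension(file));
            // A file covers [startingBlock, startingBlock + BlocksPerFile); if even its last
            // block is older than what we keep, the whole file can be removed.
            if (startingBlock + BlocksPerFile <= oldestBlockToKeep)
                File.Delete(file);
        }
    }
}
```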

Issues

The diff encoding may be heavy to search through. If grouping by 64k blocks turns out not to be enough, a different encoding or a skip list can be added.

Changes

  • List the changes

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

Optional. Remove if not applicable.

Documentation

Requires documentation update

  • Yes
  • No

If yes, link the PR to the docs update or the issue with the details labeled docs. Remove if not applicable.

Requires explanation in Release Notes

  • Yes
  • No

If yes, fill in the details here. Remove if not applicable.

Remarks

Optional. Remove if not applicable.


Scooletz commented Jan 21, 2025

Just to compare:

  • 1 billion entries (a 250/750 split between addresses and topics) takes 12GB now with the other approach, but that can be greatly reduced
  • with this approach the maximum size would be 12GB as well, though it strongly depends on the distribution. The bottom line would be ~1GB (with the same topic repeated over and over in each tx), but that is unrealistic and should be thought of only as the lowest boundary; see the arithmetic below
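
A back-of-the-envelope reading of these bounds, assuming 12 bytes per unique entry and 1-2 bytes per heavily repeated entry:

$$
10^9 \times 12\,\mathrm{B} = 12\,\mathrm{GB} \quad \text{(all entries unique)},
\qquad
10^9 \times 1\text{-}2\,\mathrm{B} \approx 1\text{-}2\,\mathrm{GB} \quad \text{(1-2 byte varint diffs)}
$$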
