
Log indexing #8082

Draft · wants to merge 6 commits into master

Conversation


@Scooletz commented Jan 20, 2025

This PR speculatively proposes a new way of capturing logs and querying them. It does so by introducing a LogBuilder that captures logs as they appear and later builds them into an immutable file. To make this efficient, each address/topic is hashed using XXHash64, a fast non-cryptographic hash that produces a 64-bit ulong. Due to the birthday paradox, collisions become likely after roughly 4 billion distinct topics (2^(64/2) = 2^32).
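
For reference, the standard birthday-bound approximation for a 64-bit hash, which is where the ~4 billion figure comes from:

$$
P(\text{collision}) \approx 1 - e^{-\frac{n(n-1)}{2 \cdot 2^{64}}},
\qquad
n = 2^{32} \approx 4.3 \times 10^{9} \;\Rightarrow\; P \approx 39\%
$$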

Entries

Each LogEntry is indexed by its address and all of its topics. To distinguish topics by position, a different hashing seed is used per position; this affects the collision probability only slightly. The block number and transaction number are encoded under a ulong -> uint mapping: the ulong is the hash, and the uint encodes the (block, tx) tuple. This gives 12 bytes of storage for each topic/address of a log entry, as sketched below.
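
A minimal sketch of that keying scheme, assuming the System.IO.Hashing package for XXHash64; the LogIndexEntry type, the per-position seed convention, and the bit split inside the uint are illustrative assumptions, not the PR's actual layout:

```csharp
using System;
using System.IO.Hashing;

// 8-byte key + 4-byte value = the 12 bytes per indexed address/topic mentioned above.
public readonly struct LogIndexEntry
{
    public readonly ulong Key;   // XXHash64 of the address or topic
    public readonly uint Value;  // (block, tx) packed into a single uint

    public LogIndexEntry(ulong key, uint value)
    {
        Key = key;
        Value = value;
    }
}

public static class LogIndexing
{
    // Hypothetical packing: upper bits for the block offset within the file's block range,
    // lower bits for the transaction index. The exact split is an assumption.
    private const int TxBits = 16;

    public static uint Pack(uint blockInRange, ushort txIndex) =>
        (blockInRange << TxBits) | txIndex;

    // The address uses seed 0; the topic at position i uses seed i + 1, so the same topic
    // hashed at different positions yields different keys (distinguishing topics by position).
    public static ulong HashAddress(ReadOnlySpan<byte> address) =>
        XxHash64.HashToUInt64(address, 0);

    public static ulong HashTopic(ReadOnlySpan<byte> topic, int position) =>
        XxHash64.HashToUInt64(topic, position + 1);
}
```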

Writing and deduplication

When a builder is flushed to an IBufferWriter<byte>, its entries are first sorted by their hashes. To make lookups faster, keys and values are kept as separate arrays. After sorting, a key (topic/address) that appears multiple times is encoded differently: all of its corresponding (block, tx) values are sorted and then diff-encoded using varints. They are written to the output buffer and sealed with a special entry that points to the beginning of the sequence. The offset of that entry is what gets mapped to the topic in a given file. This encodes frequently occurring topics very efficiently without sacrificing the unique address/topic case. The file is sealed by writing a single int holding the number of topics it contains; the rest can be derived.
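
A minimal sketch of the diff + varint step for a repeated key, assuming the packed (block, tx) values are already sorted; the marker entry and suffix-length bookkeeping are omitted and all names are illustrative:

```csharp
using System;
using System.Buffers;

public static class DiffVarintEncoder
{
    // Delta-encode a sorted run of packed (block, tx) values as varints.
    public static void Write(IBufferWriter<byte> output, ReadOnlySpan<uint> sortedValues)
    {
        uint previous = 0;
        foreach (uint value in sortedValues)
        {
            WriteVarint(output, value - previous); // small diffs compress to 1-2 bytes each
            previous = value;
        }
    }

    private static void WriteVarint(IBufferWriter<byte> output, uint value)
    {
        Span<byte> scratch = output.GetSpan(5); // a 32-bit varint needs at most 5 bytes
        int written = 0;
        while (value >= 0x80)
        {
            scratch[written++] = (byte)(value | 0x80); // set the continuation bit
            value >>= 7;
        }
        scratch[written++] = (byte)value;
        output.Advance(written);
    }
}
```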

Example

  1. three hashes: A, B, C
  2. hashes A and C are repeated:
    1. their entries are diff-encoded to compress them
    2. the length is written as a suffix, so it can be emitted after the payload
  3. hash B is not repeated, so its value is stored directly
0                      13        17                    59        63          67          71             75        83           91        99     103
┌──────────────────────┬─────────┬──────────────────────┬────────┬────────────┬───────────┬─────────────┬──────────┬────────────┬────────┬──────┐  
│                      │         │                      │        │            │           │             │          │            │        │      │  
│ HASH_A diff encoded  │    0    │ HASH_C diff encoded  │  17    │ 13| Marker │ (1, 13)   │ 59 | Marker │ HASH_A   │ HASH_B     │ HASH_C │  3   │  
│                      │    │    │                      │   │    │            │           │             │          │            │        │      │  
└──────────────────────┴────┼────┴──────────────────────┴───┼────┴────────────┴───────────┴─────────────┴──────────┴────────────┴────────┴──────┘  
▲                           │    ▲                          │                                                                                      
│                           │    │                          │                                                                                      
│                           │    │                          │                                                                                      
│                           │    │                          │                                                                                      
│                           │    │                          │                                                                                      
│                           │    │                          │                                                                                      
└───────────────────────────┘    └──────────────────────────┘                                                                                      
                                                                                                                                                   
                                                                                                                                                   
                                                                                                                                                   
│                                │                               │           │            │             │                                │      │  
│                                │                               │value for  │ an actual  │ jump for    │                                │      │  
│                                │                               │  HASH_A   │  value     │  HASH_C     │ 3 hashes ..................... │count │  
│entries encoded for HASH_A      │ entries encoded for HASH_C    │encoded as │ block = 1  │             │                                │      │  
│                                │                               │ a jump to │ tx = 13    │             │                                │      │  
│                                │                               │    13     │            │             │                                │      │  

Size considerations

  1. Entry sizes:
    1. 12 bytes - what a unique entry uses in the output file
    2. ~2 bytes per entry - repeated entries that occur frequently (in every other transaction, for example) can be encoded this efficiently (see the test)
  2. If files are grouped every ~64k blocks:
    1. at the current moment ~360 files would be needed to capture the logs of mainnet
    2. if a given event occurred 200 times per block, it would occur 13,107,200 times across 64k blocks, which, using the encoding above, would cost ~26MB to store (see the calculation below)
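
A quick back-of-the-envelope check of the ~26MB figure, using the assumptions above (64k blocks per file, ~2 bytes per repeated entry):

```csharp
using System;

const long BlocksPerFile = 64 * 1024;        // 65,536 blocks grouped into one file
const long OccurrencesPerBlock = 200;        // the hypothetical hot event
const long BytesPerRepeatedEntry = 2;        // diff + varint encoded entry

long occurrences = OccurrencesPerBlock * BlocksPerFile;  // 13,107,200
long bytes = occurrences * BytesPerRepeatedEntry;        // 26,214,400 bytes
Console.WriteLine($"{occurrences:N0} occurrences -> ~{bytes / 1_000_000.0:F1} MB");
```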

Merging and querying

A single builder could in principle encode up to a few hundred thousand blocks (though memory constraints likely make that impractical), so merging of files must be implemented. Since files are ordered by keys, and by entries within a key, merging can be thought of as merging two sorted enumerables, as sketched below.
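
A minimal sketch of the merge step, assuming each file yields its (key, value) pairs already sorted by key and then by value; names and types are illustrative, not the PR's API:

```csharp
using System.Collections.Generic;

public static class IndexMerge
{
    // Standard two-way merge of sorted streams; repeated keys simply end up adjacent
    // in the output, ready to be re-encoded as a diff-encoded run.
    public static IEnumerable<(ulong Key, uint Value)> Merge(
        IEnumerable<(ulong Key, uint Value)> left,
        IEnumerable<(ulong Key, uint Value)> right)
    {
        using var l = left.GetEnumerator();
        using var r = right.GetEnumerator();
        bool hasL = l.MoveNext();
        bool hasR = r.MoveNext();

        while (hasL && hasR)
        {
            if (l.Current.CompareTo(r.Current) <= 0)
            {
                yield return l.Current;
                hasL = l.MoveNext();
            }
            else
            {
                yield return r.Current;
                hasR = r.MoveNext();
            }
        }
        while (hasL) { yield return l.Current; hasL = l.MoveNext(); }
        while (hasR) { yield return r.Current; hasR = r.MoveNext(); }
    }
}
```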

Querying is not implemented yet. A plain binary search over billions of keys is not practical, so an additional index is required. We could introduce simple skip lists at the top, or split keys by prefixes; if such an index is introduced, it can still be written to the output buffer in a single pass. With this design, queries that require an AND would issue separate searches, each yielding an enumerable of (block, tx) values, which would then be intersected (see the sketch below). Keeping the files small, covering a limited range of blocks, bounds the length of these enumerables and improves speed.
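
A minimal sketch of how an AND query could intersect two per-key result streams, assuming both enumerables of packed (block, tx) values are sorted ascending; names are illustrative:

```csharp
using System.Collections.Generic;

public static class LogQuery
{
    public static IEnumerable<uint> And(IEnumerable<uint> first, IEnumerable<uint> second)
    {
        using var a = first.GetEnumerator();
        using var b = second.GetEnumerator();
        bool hasA = a.MoveNext();
        bool hasB = b.MoveNext();

        while (hasA && hasB)
        {
            if (a.Current == b.Current)
            {
                yield return a.Current;   // present in both streams
                hasA = a.MoveNext();
                hasB = b.MoveNext();
            }
            else if (a.Current < b.Current)
            {
                hasA = a.MoveNext();      // advance whichever stream is behind
            }
            else
            {
                hasB = b.MoveNext();
            }
        }
    }
}
```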

GC

If the builder produces files in block ranges, for example 64k blocks at a time, files could be named after their starting block number. GC would then simply remove the oldest files.
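
A sketch of that GC, assuming files are named by their starting block number and using a hypothetical ".logidx" extension:

```csharp
using System.IO;

public static class LogIndexGc
{
    private const long BlocksPerFile = 64 * 1024;

    public static void Prune(string directory, long oldestBlockToKeep)
    {
        foreach (string file in Directory.EnumerateFiles(directory, "*.logidx"))
        {
            long startingBlock = long.Parse(Path.GetFileNameWithoutExtension(file));
            // A file covers [startingBlock, startingBlock + BlocksPerFile); if even its last
            // block is older than what we keep, the whole file can be removed.
            if (startingBlock + BlocksPerFile <= oldestBlockToKeep)
                File.Delete(file);
        }
    }
}
```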

Issues

The diff encoding may be heavy to search through. If grouping by 64k blocks turns out not to be enough, a different encoding or a skip list can be added.

Changes

  • List the changes

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

Optional. Remove if not applicable.

Documentation

Requires documentation update

  • Yes
  • No

If yes, link the PR to the docs update or the issue with the details labeled docs. Remove if not applicable.

Requires explanation in Release Notes

  • Yes
  • No

If yes, fill in the details here. Remove if not applicable.

Remarks

Optional. Remove if not applicable.


Scooletz commented Jan 21, 2025

Just to compare:

  • 1 billion entries (a 250/750 split between addresses and topics) takes 12GB now with the other approach, but that can be greatly reduced
  • with this approach the maximum size would be 12GB as well, though it strongly depends on the distribution. The bottom line would be ~1GB (with the same topic repeated over and over in each tx), but that is unrealistic and should be thought of only as the lowest boundary; see the arithmetic below
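
A back-of-the-envelope reading of these bounds, assuming 12 bytes per unique entry and 1-2 bytes per heavily repeated entry:

$$
10^9 \times 12\,\mathrm{B} = 12\,\mathrm{GB} \quad \text{(all entries unique)},
\qquad
10^9 \times 1\text{-}2\,\mathrm{B} \approx 1\text{-}2\,\mathrm{GB} \quad \text{(1-2 byte varint diffs)}
$$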
