Add vertex store size limit; cleanup vertex store events #854

Open
wants to merge 5 commits into
base: develop

LukasGasior1
Contributor

This PR includes:

  1. The primary feature
  • added a size limit to the vertex store. The limit applies to the size of the DSON-serialized vertex store state. The limit is "soft" in the sense that in some circumstances it might be slightly exceeded (e.g. when inserting a newer QC/TC that has more signatures than the previous one), for practicality reasons. The main goal is to protect against SBOR overflow (e.g. when encoding a commit request containing the vertex store). In practice there should always be plenty of room between this limit and the critical SBOR limit, enough not to worry about such small overshoots. The limit defaults to 150 MB and can be configured with bft.vertex_store.max_serialized_size_bytes (the minimum value is 10 MB).
  2. Additional related changes
  • vertex store serialization has been moved from the state computer to the vertex store itself
  • updating vertex store metrics has been moved to the vertex store itself
  • vertex store events now include the serialized state to be persisted
  3. Drive-by
  • small refactor to vertex store events: they're now records and there's no longer a separate BFTCommittedUpdate event. Since a (consensus) commit can only happen as a result of QC insertion, this has been captured in BFTHighQCUpdate, which allowed us to get rid of the weird logic in dispatchPostQcInsertionEvents in VertexStoreAdapter
  • added a vertex store persist after inserting a TC (which was missing)
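
As a rough illustration of the "soft" limit described above, the sketch below inserts vertices one at a time and stops once the next vertex would push the serialized state over the configured maximum, while certificate updates skip the check. All names (`SoftLimitSketch`, `insertUntilFull`, `replaceCertificate`) are hypothetical and this is not the PR's `VertexStoreImpl` code; real DSON serialization is replaced here by a plain byte count.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a size-limited store; not the PR's VertexStoreImpl. */
final class SoftLimitSketch {
  private final long maxSerializedSizeBytes;
  private final List<byte[]> vertices = new ArrayList<>();
  // Stand-in for the size of the DSON-serialized state (assumption: a simple byte count).
  private long serializedSizeBytes = 0;

  SoftLimitSketch(long maxSerializedSizeBytes) {
    this.maxSerializedSizeBytes = maxSerializedSizeBytes;
  }

  /** Inserts vertices in order and returns how many were accepted before the limit was hit. */
  int insertUntilFull(List<byte[]> candidates) {
    int inserted = 0;
    for (byte[] vertex : candidates) {
      long sizeAfterInsert = serializedSizeBytes + vertex.length;
      if (sizeAfterInsert > maxSerializedSizeBytes) {
        break; // stop here; the remaining candidates are not inserted
      }
      vertices.add(vertex);
      serializedSizeBytes = sizeAfterInsert;
      inserted++;
    }
    return inserted;
  }

  /** Certificate updates are applied without a size check, which is what makes the limit soft. */
  void replaceCertificate(int oldSizeBytes, int newSizeBytes) {
    serializedSizeBytes += newSizeBytes - oldSizeBytes;
  }

  long serializedSizeBytes() {
    return serializedSizeBytes;
  }
}
```

With a limit of 10 bytes and three 4-byte vertices, only the first two are accepted; a later certificate that grows by 3 bytes then takes the state to 11 bytes, slightly over the limit.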

@LukasGasior1 LukasGasior1 force-pushed the feature/vertex-store-overflow-mitigations branch 3 times, most recently from 3ac1d06 to b8f0707 Compare February 27, 2024 17:50
@LukasGasior1 LukasGasior1 force-pushed the feature/vertex-store-overflow-mitigations branch from b8f0707 to 501a3bf Compare February 27, 2024 18:05

github-actions bot commented Feb 27, 2024

Docker tags
docker.io/radixdlt/private-babylon-node:pr-854
docker.io/radixdlt/private-babylon-node:f659ccb3b1
docker.io/radixdlt/private-babylon-node:sha-f659ccb

Contributor

@jakrawcz-rdx left a comment

The vertex store size limit feature (i.e. break vertices_loop;) looks very much ok 👍 (I mean, I was able to understand it, and it seems to work [outside of corner-cases where "really small stuff is added to vertex store without checking size"]).

I can't say I understood 100% of the rest (i.e. the refactor) :(
I left some minor "technical Java" comments.
It does look more structured than before, but I definitely lack the Consensus knowledge to spot any new subtle bugs.

If you have faith in our Consensus regression tests - feel free to merge with my 🟢 .
Otherwise - I recommend waiting for more reviewers.

@@ -233,7 +233,12 @@ public record Sync(
Counter invalidEpochInitialQcSyncStates) {}

public record VertexStore(
Gauge size, Counter forks, Counter rebuilds, Counter indirectParents) {}
Gauge size,
Contributor

vertexCount?

Contributor Author

sounds good 👍
note to self: update this on the dashboard later (if it's used)

vertexStoreConfig.maxSerializedSizeBytes()
>= VertexStoreConfig.MIN_MAX_SERIALIZED_SIZE_BYTES,
"Invalid configuration: bft.vertex_store.max_serialized_size_bytes must be at least "
+ VertexStoreConfig.MIN_MAX_SERIALIZED_SIZE_BYTES);
Contributor

(nitpick) Preconditions has a built-in ("formatting %s", ...args) syntax

@@ -350,43 +431,15 @@ private void removeVertexAndPruneInternal(HashCode vertexId, HashCode skip) {
var children = vertexChildren.remove(vertexId);
if (children != null) {
for (HashCode child : children) {
if (!child.equals(skip)) {
removeVertexAndPruneInternal(child, null);
if (!skip.map(child::equals).orElse(false)) {
Contributor

Suggested change
if (!skip.map(child::equals).orElse(false)) {
if (!Optional.of(child).equals(skip)) {
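
To check that the suggested condition behaves like the original one, here is a small sketch comparing both forms. It uses `String` in place of `HashCode`, and the class and method names are hypothetical.

```java
import java.util.Optional;

/** Compares the two skip checks from the suggestion, using String in place of HashCode. */
final class SkipCheckSketch {
  /** Original form: true unless skip is present and equal to child. */
  static boolean shouldVisitOriginal(String child, Optional<String> skip) {
    return !skip.map(child::equals).orElse(false);
  }

  /** Suggested form: Optional.equals compares both presence and the contained values. */
  static boolean shouldVisitSuggested(String child, Optional<String> skip) {
    return !Optional.of(child).equals(skip);
  }
}
```

Both forms agree for an empty `skip`, a matching `skip`, and a non-matching `skip`. Note that `Optional.of(child)` throws a `NullPointerException` for a null `child`, which the original form also does via `child::equals`.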

.flatMap(
removedVertex ->
Optional.ofNullable(vertexChildren.get(removedVertex.vertex().getParentVertexId())))
.ifPresent(siblings -> siblings.remove(vertexId));
Contributor

some lines could be a lot less awkward if this.vertexChildren was really a Multimap<HashCode, HashCode> (e.g. = HashMultimap.create())

Contributor Author

Splendid idea! Done.

sonarqubecloud bot commented Mar 6, 2024

@dhedey dhedey self-assigned this Mar 22, 2024
Contributor

@dhedey left a comment

Generally really nice changes. I like the event fixing, the metrics tweaks, and the use of sealed interfaces for better error propagation.

The code in VertexStoreImpl isn't the prettiest, but then again, the whole class isn't great to start with... I wonder if we should change the size check to a simple check on current (cached) serialized size before we try inserting each vertex; rather than a check of the state-after-inserting-the-vertex? This would reduce the work done on the happy path, and mean we can be happier in not checking the size on the other code paths which just update the highQcs. Anyway, just an idea.

I don't think any of these notes are blockers, but ideally some of the more important ones could do with a small fix before we merge.
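
The alternative floated above (checking the cached serialized size before each insert, rather than the state after the insert) could look roughly like the sketch below. The class and method names are hypothetical, and the size bookkeeping is a simplification of whatever the vertex store would actually cache.

```java
/** Hypothetical sketch of a pre-insert size check; not the PR's implementation. */
final class CachedSizeCheckSketch {
  private final long maxSerializedSizeBytes;
  // Assumption: this value is refreshed whenever the state is re-serialized.
  private long cachedSerializedSizeBytes = 0;

  CachedSizeCheckSketch(long maxSerializedSizeBytes) {
    this.maxSerializedSizeBytes = maxSerializedSizeBytes;
  }

  /** Rejects the insert if the store is already at or over the limit; otherwise accepts it. */
  boolean tryInsert(int vertexSizeBytes) {
    if (cachedSerializedSizeBytes >= maxSerializedSizeBytes) {
      return false; // cheap check, no trial serialization of the post-insert state
    }
    cachedSerializedSizeBytes += vertexSizeBytes; // may overshoot by up to one vertex
    return true;
  }

  long cachedSerializedSizeBytes() {
    return cachedSerializedSizeBytes;
  }
}
```

The trade-off in this sketch is that the limit can be exceeded by up to one vertex, in exchange for less work on the happy path.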

* An event emitted when vertex store updates its highQC, which possibly results in some vertices
* being committed.
*/
public record BFTHighQCUpdate(
Contributor

We've lost the nice toString here - perhaps we should override it?

Contributor

Otherwise we could end up with a massive log of the serialized vertex state in hex.
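
A minimal sketch of such an override, assuming an event record that carries the serialized state alongside a couple of small fields. The record name and components are hypothetical and do not mirror the real `BFTHighQCUpdate`.

```java
import java.util.List;

/** Hypothetical event record; the components are illustrative, not the PR's BFTHighQCUpdate. */
record HighQcUpdateSketch(String highQc, List<String> committedVertices, byte[] serializedState) {
  /** Summarizes the payload by length so the serialized bytes never reach the logs. */
  @Override
  public String toString() {
    return String.format(
        "%s{highQc=%s committedVertices=%d serializedStateBytes=%d}",
        getClass().getSimpleName(), highQc, committedVertices.size(), serializedState.length);
  }
}
```

The same pattern would apply to the other event records that carry the serialized state.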

}
}
/** An event emitted after a vertex has been inserted into the vertex store. */
public record BFTInsertUpdate(
Contributor

We've lost the nice toString here - perhaps we should override it?

Contributor

Otherwise we could end up with a massive log of the serialized vertex state in hex.

}
}
/** An event emitted when the vertex store has been rebuilt. */
public record BFTRebuildUpdate(
Contributor

We've lost the nice toString here - perhaps we should override it?

Contributor

Otherwise we could end up with a massive log of the serialized vertex state in hex.

}

public boolean insertTimeoutCertificate(TimeoutCertificate timeoutCertificate) {
return vertexStore.insertTimeoutCertificate(timeoutCertificate);
final var result = vertexStore.insertTimeoutCertificate(timeoutCertificate);
Contributor

For code beauty, can we align the insertQc below with this?

e.g. the code below should be insertQuorumCertificate, it should match on result, and we can inline dispatchPostQcInsertionEvents
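
For illustration, matching on a sealed result type might look like the sketch below. The variant names and the `handle` method are hypothetical; only `Ignored` appears in the diff, and the real `InsertQcResult` may have different variants.

```java
/** Hypothetical result type, loosely modelled on the InsertQcResult mentioned in the review. */
sealed interface InsertResultSketch {
  record Inserted(String highQc) implements InsertResultSketch {}

  record Ignored() implements InsertResultSketch {}
}

final class AdapterSketch {
  /** Maps each result variant to the action the caller would take. */
  static String handle(InsertResultSketch result) {
    if (result instanceof InsertResultSketch.Inserted inserted) {
      return "dispatch update for " + inserted.highQc();
    }
    // Only Ignored remains, since the interface is sealed with two variants.
    return "nothing to do";
  }
}
```

Because the interface is sealed, the set of variants is closed, so a new variant forces every matching site to be revisited.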

if (hasAnyChildren) {
// TODO: Check to see if qc's match in case there's a fault
return new VertexStore.InsertQcResult.Ignored();
}

// proposed vertex doesn't have any children
// Proposed vertex doesn't have any children
Contributor

<sidenote>Still really want to rename HighQC => HighCertificates to make this less confusing :D</sidenote>

highQcUpdate
.committedVertices()
.ifPresent(
committedVertices -> {
Contributor

Just as a minor style thing - this is nested quite deep. Perhaps we should consider either pulling out some methods, or returning early in the not-present case to de-nest things.
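
One way to de-nest, sketched with hypothetical names and `String` standing in for the vertex type, is to return early when the optional is empty instead of wrapping the body in `ifPresent`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical handler showing an early return instead of a nested ifPresent lambda. */
final class EarlyReturnSketch {
  final List<String> dispatched = new ArrayList<>();

  /** Returns early when nothing was committed, so the main logic stays at one indent level. */
  void onHighQcUpdate(Optional<List<String>> maybeCommitted) {
    if (maybeCommitted.isEmpty()) {
      return;
    }
    List<String> committedVertices = maybeCommitted.get();
    for (String vertex : committedVertices) {
      dispatched.add("committed:" + vertex);
    }
  }
}
```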

@ProcessOnDispatch Set<EventProcessor<BFTInsertUpdate>> processors,
Environment environment,
Metrics metrics) {
@ProcessOnDispatch Set<EventProcessor<BFTInsertUpdate>> processors, Environment environment) {
Contributor

Nice moving of metrics into the vertex store. I think this makes this simpler to comprehend.

},
BFTCommittedUpdate.class);
BFTHighQCUpdate.class);
Contributor

I think this code is now a duplicate so should be deleted?

Contributor Author

yup, good catch

@@ -162,7 +161,8 @@ StateComputerPrepareResult prepare(
List<RawNotarizedTransaction> proposedTransactions,
RoundDetails roundDetails);

LedgerProofBundle commit(LedgerExtension ledgerExtension, VertexStoreState vertexStore);
LedgerProofBundle commit(
LedgerExtension ledgerExtension, Option<byte[]> serializedVertexStoreState);
Contributor

Out of interest, why is this Option<byte[]> and not Option<WrappedByteArray>?

Contributor Author

no reason at all, I'll change to Option<WrappedByteArray>
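
For context, one plausible benefit of a wrapper over a raw `byte[]` is value-based equality and a compact `toString`. The sketch below is a hypothetical stand-in (`BytesSketch`), not the project's `WrappedByteArray`, whose actual behaviour may differ.

```java
import java.util.Arrays;

/** Hypothetical value wrapper, similar in spirit to WrappedByteArray; not the project's class. */
final class BytesSketch {
  private final byte[] value;

  BytesSketch(byte[] value) {
    this.value = value.clone(); // defensive copy so callers cannot mutate the wrapped bytes
  }

  byte[] value() {
    return value.clone();
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof BytesSketch other && Arrays.equals(value, other.value);
  }

  @Override
  public int hashCode() {
    return Arrays.hashCode(value);
  }

  @Override
  public String toString() {
    return "BytesSketch[" + value.length + " bytes]";
  }
}
```

Two raw arrays with the same contents compare unequal via `equals`, whereas two wrappers around them compare equal, which matters once the value sits inside an `Option` or a record.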

* and the forced divergent vertex execution in `prepare` is reverted.
*/
// spotless:on
public final class DivergentExecutionLivenessBreakTest {
Contributor

Nice test 👍

@codecov-commenter

Codecov Report

Attention: Patch coverage is 83.03571% with 38 lines in your changes missing coverage. Please review.

Project coverage is 42.8%. Comparing base (71327ad) to head (e791b13).

@@             Coverage Diff             @@
##             develop    #854     +/-   ##
===========================================
- Coverage       42.8%   42.8%   -0.1%     
+ Complexity      4299    4295      -4     
===========================================
  Files           1692    1692             
  Lines          51999   52035     +36     
  Branches        1494    1496      +2     
===========================================
+ Hits           22298   22312     +14     
- Misses         29231   29250     +19     
- Partials         470     473      +3     
Flag Coverage Δ
rust 42.8% <83.0%> (-0.1%) ⬇️

Files Coverage Δ
...src/main/java/com/radixdlt/monitoring/Metrics.java 0.0% <ø> (ø)
...adixdlt/consensus/epoch/EpochsConsensusModule.java 84.2% <ø> (-0.3%) ⬇️
...ava/com/radixdlt/consensus/liveness/Pacemaker.java 92.3% <100.0%> (ø)
...ixdlt/consensus/vertexstore/VertexStoreConfig.java 100.0% <100.0%> (ø)
...dixdlt/consensus/vertexstore/VertexStoreState.java 79.7% <100.0%> (-1.1%) ⬇️
...in/java/com/radixdlt/modules/DispatcherModule.java 92.9% <100.0%> (+1.0%) ⬆️
...c/main/java/com/radixdlt/modules/LedgerModule.java 100.0% <100.0%> (ø)
...in/java/com/radixdlt/modules/SystemInfoModule.java 69.2% <ø> (-2.2%) ⬇️
...main/java/com/radixdlt/rev2/REv2StateComputer.java 97.1% <100.0%> (-0.1%) ⬇️
...radixdlt/rev2/modules/MockedVertexStoreModule.java 0.0% <ø> (ø)
... and 11 more

... and 1 file with indirect coverage changes

@LukasGasior1 LukasGasior1 force-pushed the feature/vertex-store-overflow-mitigations branch from e791b13 to 1ad56a4 Compare June 12, 2024 16:16
final var proof = ledgerExtension.proof();
final var header = proof.ledgerHeader();

var commitRequest =
new CommitRequest(
ledgerExtension.transactions(),
proof,
serializedVertexStoreState,
serializedVertexStoreState.map(WrappedByteArray::value),
Contributor

We should just always handle it as a WrappedByteArray I think? It's just safer than passing around a big byte[].

If we need to, let's add an SborCodec for WrappedByteArray:

  public static void registerCodec(CodecMap codecMap) {
    codecMap.register(
        WrappedByteArray.class,
        codecs ->
            new CustomByteArrayCodec<>(
                WrappedByteArray::value,
                WrappedByteArray::new));
  }
