Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: dynamic span, to calculate span limit dynamically #9495

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

0xFFFC0000
Copy link
Collaborator

No description provided.

@@ -62,7 +62,8 @@
#define MLOG_PEER_STATE(x) \
MCINFO(MONERO_DEFAULT_LOG_CATEGORY, context << "[" << epee::string_tools::to_string_hex(context.m_pruning_seed) << "] state: " << x << " in state " << cryptonote::get_protocol_state_string(context.m_state))

#define BLOCK_QUEUE_NSPANS_THRESHOLD 10 // chunks of N blocks
#define BLOCK_QUEUE_NSPANS_THRESHOLD 200 // chunks of N blocks
#define BLOCK_QUEUE_NSPANS_MINIMUM 10 // minimum number of spans //
Copy link
Contributor

@sneurlax sneurlax Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dangling // is the only (non-)issue I see in this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for mentioning it. Fixed.

Copy link
Contributor

@sneurlax sneurlax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, straightforward changes.

@@ -2104,7 +2121,17 @@ skip:
const uint32_t peer_stripe = tools::get_pruning_stripe(context.m_pruning_seed);
const uint32_t local_stripe = tools::get_pruning_stripe(m_core.get_blockchain_pruning_seed());
const size_t block_queue_size_threshold = m_block_download_max_size ? m_block_download_max_size : BLOCK_QUEUE_SIZE_THRESHOLD;
bool queue_proceed = nspans < BLOCK_QUEUE_NSPANS_THRESHOLD || size < block_queue_size_threshold;
m_span_limit = m_span_limit ? m_span_limit : BLOCK_QUEUE_NSPANS_THRESHOLD;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this default to the large THRESHOLD value instead of the current MINIMUM value?

Also what's the thread-safety strategy for m_span_limit? A lock is held when it's set, but none is held during this read here. I think this will work if m_span_limit is an atomic, and cached here for the two reads.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this default to the large THRESHOLD value instead of the current MINIMUM value?

We want to utilize the hardware as much as we can.

Also what's the thread-safety strategy for m_span_limit? A lock is held when it's set, but none is held during this read here. I think this will work if m_span_limit is an atomic, and cached here for the two reads.

Thanks for mentioning it. I missed it. I think changing it to atomic is the best approach.

void t_cryptonote_protocol_handler<t_core>::calculate_dynamic_span(double blocks_per_seconds)
{
MINFO("m_bss : " << m_bss << ", blocks_per_seconds : " << blocks_per_seconds << ", m_span_limit : " << m_span_limit);
m_span_limit = (m_bss && blocks_per_seconds) ? (( blocks_per_seconds * 60 * m_span_time ) / m_bss) : BLOCK_QUEUE_NSPANS_THRESHOLD;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the span limit increase if the throughput to disk is higher? Shouldn't the limit be related to how much memory is available (because ultimately an increase in m_span_limit will increase the memory requirements)? Or is disk throughput used as an approximation for memory?

Again, why does this default to THRESHOLD instead of MINIMUM?

Copy link
Collaborator Author

@0xFFFC0000 0xFFFC0000 Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is disk throughput used as an approximation for memory?

Correct.

Again, why does this default to THRESHOLD instead of MINIMUM?

We want to utilize the hardware as much as we can.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to utilize the hardware as much as we can.

What about the machines running on minimum spec hardware?

Copy link

@nahuhh nahuhh Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Their sync time (seconds per block) would be slow, which would accordingly reduce their queue limit

edit: i think this could be switched for the min..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In new push, I changed it to BLOCK_QUEUE_NSPANS_MINIMUM.

@@ -1658,10 +1673,12 @@ namespace cryptonote
m_block_queue.remove_spans(span_connection_id, start_height);

const uint64_t current_blockchain_height = m_core.get_current_blockchain_height();
const boost::posix_time::time_duration dt = boost::posix_time::microsec_clock::universal_time() - start;
double blocks_per_seconds = (((current_blockchain_height - previous_height) * 1e6) / dt.total_microseconds());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocks_per_second can also be const.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Will fix this in refreshed PR.

@@ -124,6 +124,11 @@ namespace cryptonote
, "Set maximum size of block download queue in bytes (0 for default)"
, 0
};
const command_line::arg_descriptor<size_t> arg_span_limit = {
"span-limit"
, "Set span limit when syncing, can time (m postfix for minutes), default is 2 minutes"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description looks wrong. And the name needs changing too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Span signifies the number of batches we are downloading at the moment. So the span limit is a maximum number to limit number of downloading batches right now.

Copy link
Contributor

@vtnerd vtnerd Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description says ", can time (m postfix for minutes)," for some reason - this just doesn't read correctly to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant to say that the unit is always in minutes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are correct. It was a wrong way to phrase it from my side.

m_span_limit = (m_bss && blocks_per_seconds) ? (( blocks_per_seconds * 60 * m_span_time ) / m_bss) : BLOCK_QUEUE_NSPANS_THRESHOLD;
if (m_span_limit < BLOCK_QUEUE_NSPANS_MINIMUM)
m_span_limit = BLOCK_QUEUE_NSPANS_MINIMUM;
MINFO("calculated dynamic span limit is span_limit : " << m_span_limit);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be some cap on m_span_limit?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its default "what can be synced within 2 minutes".

do you mean, a cap on if a user decides to set it to, say, 60 minutes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how m_span_limit is what can be synced within 2 minutes - the time portion tracks just the DB sync time. And it's divided by this bss value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the idea is fill the DB time as much as possible, which should work, maybe I'm being too harsh here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And to circle back around to the original issue - I thought there should be a cap in case the DB sync time was crazy fast.

Copy link

@nahuhh nahuhh Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bss = block sync size (how many blocks in span)

The way its intended to work:

db sync time returns how many blocks are being synced per second
using this calculation, we take the block sync size (lets say 20blocks per span) do math to calculate how many spans to queue to stay 2mins (default span-limit) ahead in queue.

Too low of a value results in "outpacing" the queue (you sync faster than you download, then have to wait for downloads to catch up).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timings track just the DB sync time, and it could result in rather large memory usage. The big counter-argument is that the user can put 0 here and use the current value.

@@ -2222,7 +2249,7 @@ skip:
NOTIFY_REQUEST_GET_OBJECTS::request req;
bool is_next = false;
size_t count = 0;
const size_t count_limit = m_core.get_block_sync_size(m_core.get_current_blockchain_height());
m_bss = m_core.get_block_sync_size(m_core.get_current_blockchain_height());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also make this is an atomic, I think it works fine if so (and then you'll have to cache the atomic read in the other function too otherwise it could change).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestions. Thanks. Will fix this in updated PR.

@0xFFFC0000
Copy link
Collaborator Author

Updated to most latest master branch commit : 9866a0e9021e2422d8055731a586083eb5e2be67.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants