
Allow benchmark-worker heap size to flex with the size of the machine #78

Open

wants to merge 1 commit into main

Conversation

tmgstevens
Contributor

No description provided.

@tmgstevens requested a review from a team as a code owner on November 22, 2023 11:22
Member

@StephanDollberg left a comment

Not a JVM tuning expert, but those sound sensible.

The only one I'd be concerned about is the GC threads option, which is being set to 32. Would this be a problem on smaller machines?
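
For context, a minimal sketch of the setting in question, assuming standard HotSpot flag names (the count 32 comes from this discussion; the flag names are my assumption, since the diff isn't shown here):

```sh
# Assumed shape of the explicit GC-thread settings under discussion
# (ParallelGCThreads/ConcGCThreads are standard HotSpot flags; 32 is the
# value mentioned above).
JVM_GC_OPTS="-XX:ParallelGCThreads=32 -XX:ConcGCThreads=32"

# Dropping these lets the JVM's ergonomics size the GC thread pools from the
# number of available CPUs, which is friendlier to smaller machines.
```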

@tmgstevens
Contributor Author

I think I took that from upstream. We could probably take that one out and let the JVM choose the number of threads. What do you think?

@StephanDollberg
Member

> I think I took that from upstream. We could probably take that one out and let the JVM choose the number of threads. What do you think?

Yeah, let's maybe take those out.

@travisdowns
Member

@tmgstevens coming back to this one: I don't see this long list of tuning options upstream; over there it is just a quite simple -Xms4G -Xmx4G -XX:+UseG1GC.

Maybe this is tuning for a big box?

I think host-size-specific tuning might be better off in the deployment & launch scripts for specific scenarios, as we have in the Redpanda folder, since things like very large heap sizes (as we had hardcoded in the past) and thread counts are tied to specific benchmark deployments with machines of a certain size. The idea is that you override HEAP_OPTS when you launch the worker.

I do like the idea of letting the heap "flex" with the machine size, because we also currently run into problems with the 8G heap on small machines, where we get an OOME.

What if we just put in the "flex" part (which is swapping out Xms/Xmx for MaxRAMPercentage and friends) for now?
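
A minimal sketch of that "flex" swap, assuming the worker's heap settings live in a single JVM_MEM-style shell variable (the 75.0 is illustrative, not a value proposed in this PR):

```sh
# Fixed sizing (the 8G case mentioned above, which OOMEs on small machines):
#   JVM_MEM="-Xms8G -Xmx8G -XX:+UseG1GC"

# Flex sizing: let the heap scale with the machine's RAM instead.
JVM_MEM="-XX:InitialRAMPercentage=75.0 -XX:MaxRAMPercentage=75.0 -XX:+UseG1GC"
```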

@travisdowns
Member

Ping @tmgstevens.

@tmgstevens
Contributor Author

@travisdowns it comes from here: https://github.com/openmessaging/benchmark/blob/master/driver-kafka/deploy/ssd-deployment/deploy.yaml#L229

The other salient reference is https://github.com/confluentinc/benchmark/blob/10x-performance-blog/driver-kafka/deploy/confluent-deployment/deploy.yaml#L335

But now I see https://github.com/redpanda-data/openmessaging-benchmark/blob/main/driver-redpanda/deploy/deploy.yaml#L389 as well.

So the purpose of this change was specifically for when Ansible isn't being used, i.e. the Helm chart install. Otherwise, the settings in the Ansible are going to take precedence, although tbh we should change that so it's not hardcoded to 50GB.

@travisdowns
Member

Thanks @tmgstevens. I didn't know the Ansible actually just modified the start script in situ with a regex. Among other things, this means we'd better be careful that the JVM_MEM assignment remains on a single line (i.e., if it used a \ + newline continuation, this would break) and that the internal logic remains compatible with overriding that variable.
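
To make the single-line constraint concrete, a sketch of the two shapes in the start script, assuming the Ansible regex matches line by line (only the JVM_MEM name comes from this thread; the values are illustrative):

```sh
# Safe for a line-oriented in-situ regex: one assignment on one line.
JVM_MEM="-XX:MaxRAMPercentage=75.0 -XX:+UseG1GC"

# Would break the substitution: a backslash continuation splits the
# assignment across lines, so the regex no longer sees the whole value.
#   JVM_MEM="-XX:MaxRAMPercentage=75.0 \
#            -XX:+UseG1GC"
```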

Anyway, back to the change:

Here's a concrete suggestion: what if we just make these variables properly overridable by env vars (same names) passed in by the caller? Then the Helm chart (and, arguably, Ansible) can just set those env vars to whatever they want depending on the scenario they are setting up. One default to rule them all doesn't seem like it's going to work anyway, since we presumably have different scenarios we might set up for.
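
A minimal sketch of that pattern in the worker start script, under the assumption that the variable keeps its current name and the worker binary lives at bin/benchmark-worker (both the default value and the binary path are illustrative):

```sh
# In the start script: honour a caller-provided JVM_MEM, otherwise fall back
# to a scenario-neutral default (value here is illustrative).
DEFAULT_JVM_MEM="-XX:MaxRAMPercentage=75.0 -XX:+UseG1GC"
JVM_MEM="${JVM_MEM:-$DEFAULT_JVM_MEM}"

# Caller side (Helm chart, Ansible, or by hand) picks per-scenario values:
#   JVM_MEM="-Xms4G -Xmx4G -XX:+UseG1GC" bin/benchmark-worker
```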
