Structure of compute node memory requirements #82

Open
Zarquan opened this issue Oct 16, 2024 · 0 comments

Zarquan commented Oct 16, 2024

An earlier iteration used min/max numeric values for the memory size, with a separate string for the units.

memory:
  min: 21
  max: 24
  units "GiB"

The client set the minimum value, and the server set the maximum value. That worked, but it kind of mangled the meaning of min and max.
The current design represents memory size as string values for requested and offered, with units in the string.

memory:
  requested: "21GiB"
  offered:   "24GiB"

That's nice for a human to write, but crappy for a machine to read. Having seen this in action, it is probably better to split this back into a numeric value and a separate string for the units.
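
For illustration, this is roughly what a client or service has to do to get the number back out of the combined string (a minimal Python sketch; parse_memory and the regex are just assumptions for the example, not existing code):

import re

def parse_memory(value: str) -> tuple[int, str]:
    """Split a combined size string like '21GiB' into (21, 'GiB')."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", value)
    if match is None:
        raise ValueError(f"cannot parse memory size {value!r}")
    return int(match.group(1)), match.group(2)

size, units = parse_memory("21GiB")   # (21, 'GiB')

With a separate numeric field and a units field, none of that parsing (or its failure modes) is needed.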

The meaning of requested and offered is clearer than the original min and max, but again, having seen this in action, I'm questioning why we need the separate values.

So it might be better to simplify it to a single value.

memory:
  size: 21
  units "GiB"

However, while working on the allocation algorithm we have found a use case that uses the max and min fields
as actual maximum and minimum values.

If the user requests multiple compute resources, the resource allocation algorithm doesn't do a separate evaluation for each compute resource. The service combines the memory and cpu core requirements for all of the compute resources into total values for the whole set. It then runs the allocation algorithm based on those totals, looking for blocks that have at least that amount of resources available. If it finds a block that has more resources available, it will create an offer that distributes the available resources across the compute nodes in proportion to how much each one requested.

For example, suppose the user asks for two compute nodes: a head-node with 4GiB of memory and a compute-node with 16GiB of memory.

resources:
  compute:
    - name: head-node
      memory: 4GiB 
    - name: compute-node
      memory: 16GiB 

The allocation algorithm will add these together and look for a block of resources that has at least 20GiB of memory.

If the search algorithm finds a block with 64GiB of memory, it will distribute this across the nodes based on the ratio of the requested sizes, allocating 4 * (64 / 20) ≈ 12GiB to the head-node and 16 * (64 / 20) ≈ 51GiB to the compute-node (rounding down to whole GiB).

resources:
  compute:
    - name: head-node
      memory: 12GiB
    - name: compute-node
      memory: 51GiB
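
As a rough illustration of that proportional step, assuming whole GiB values and rounding down (the real service code may treat remainders and units differently):

def distribute(requested: dict[str, int], available: int) -> dict[str, int]:
    """Scale each node's request by available/total, rounding down to whole GiB.
    Any remainder left over by the rounding is simply not allocated in this sketch."""
    total = sum(requested.values())
    return {name: (size * available) // total for name, size in requested.items()}

offer = distribute({"head-node": 4, "compute-node": 16}, available=64)
# {'head-node': 12, 'compute-node': 51}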

There is a use case where the head-node is just running a simple UI and the user wants as much of the memory as possible to be allocated to the compute-node. To meet this use case we can add the min and max values discussed earlier, using the max value to limit the amount of memory allocated to the head-node and leaving the compute-node unlimited.

resources:
  compute:
    - name: head-node
      memory:
        min: 4GiB
        max: 4GiB # limit set by client
    - name: compute-node
      memory:
        min: 16GiB

The service would allocate the extra memory up to the limit on the head-node, and then allocate the rest to the compute-node.

resources:
  compute:
    - name: head-node
      memory:
        min: 4GiB
        max: 4GiB # limit set by the client
    - name: compute-node
      memory:
        min: 16GiB
        max: 51GiB # extra allocated by the service 

The min and max values make sense in the request, but do they make sense in the response?

We could be very specific in the text description about what they mean in the context of a request and a response, but that approach is open to misinterpretation.

Also, because this structure doesn't distinguish between the max value on the head-node set by the client and the max value on the compute-node set by the service, we can't use a response from one service as a request to another service without a complicated edit.

If we send this response as-is to another service that has 100GiB of memory available, the max: 51GiB on the compute-node will prevent the new service from allocating more memory to it.

Splitting the offered memory size out into a separate value in the response solves this.

resources:
  compute:
    - name: head-node
      memory:
        min: 4GiB
        max: 4GiB # limit set by the client
        size: 4GiB
    - name: compute-node
      memory:
        min: 16GiB
        size: 51GiB

That makes a clearer separation between what the client is requesting and what the service is offering, and it means we can filter the response from one service to remove the size value and send it as a request to another service.

resources:
  compute:
    - name: head-node
      memory:
        min: 4
        max: 4
        units: GiB
    - name: compute-node
      memory:
        min: 16
        units: GiB

The second service responds with an offer of 100GiB for the compute-node.

resources:
  compute:
    - name: head-node
      memory:
        min: 4
        max: 4
        size: 4
        units: GiB
    - name: compute-node
      memory:
        min: 16
        size: 100
        units: GiB
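
The filtering step itself is simple. A minimal Python sketch, assuming the document has been loaded into plain Python dicts and lists (strip_offers is just a name made up for the example):

import copy

def strip_offers(response: dict) -> dict:
    """Turn a service response back into a request by dropping the values
    set by the service: the flat size field, or the whole offered section
    in the nested structure."""
    request = copy.deepcopy(response)
    for compute in request.get("resources", {}).get("compute", []):
        memory = compute.get("memory", {})
        memory.pop("size", None)     # flat structure
        memory.pop("offered", None)  # nested requested/offered structure
    return request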

However, this flat min/max/size/units structure is very specific to this case and loses the option to add more fields to the request or offer sections.

If we keep the explicit requested and offered sections, then we leave space to add more fields later on.
For example, if some computing algorithms were particularly sensitive to memory speed, then in the future we could add a ddr field to the request and response sections.

resources:
  compute:
    - name: head-node
      memory:
        requested:
          min: 4
          max: 4
          units: GiB
        offered:
          size: 4
          units: GiB
          ddr: DDR4-4400
    - name: compute-node
      memory:
        requested:
          min: 16
          units: GiB
          ddr: DDR4-4000
        offered:
          size: 51
          units: GiB
          ddr: DDR4-4400

In this example it is clear that the client is specifying DDR4-4000 memory for the compute-node and the server is responding with an offer of DDR4-4400 for both the head-node and the compute-node.
This case is way out of scope for the current iterations, but it shows how keeping the request and offer sections separate provides a level of future proofing to the structure.

A more appropriate example would be read/write bandwidth and latency for disc-based storage. The same requested and offered pattern is likely to be very useful for describing the client's requirements and the service's offers for storage resources.


In summary, the target to aim for is:

resources:
  compute:
    - name: head-node
      memory:
        requested:
          min: 4
          max: 4
          units: GiB
        offered:
          size: 4
          units: GiB
    - name: compute-node
      memory:
        requested:
          min: 16
          units: GiB
        offered:
          size: 51
          units: GiB

We aren't quite there yet, but we can evolve towards that gradually. In particular, the requirement for the service to remember and provide the requested fields in its response is only a SHOULD requirement at this stage.

Remembering what the client asked for requires extra code that our current prototype doesn't have. Including the extra code to support this is on the 'nice to have' list.
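
Purely for illustration, the target structure could be modelled along these lines (a Python sketch only; the class names are assumptions, not part of the specification):

from dataclasses import dataclass
from typing import Optional

@dataclass
class MemoryRequest:
    min: int                     # minimum size the client needs
    max: Optional[int] = None    # optional upper limit set by the client
    units: str = "GiB"

@dataclass
class MemoryOffer:
    size: int                    # size the service is offering
    units: str = "GiB"

@dataclass
class ComputeMemory:
    requested: Optional[MemoryRequest] = None  # echoed back by the service (SHOULD, not MUST)
    offered: Optional[MemoryOffer] = None      # present in responses only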
