Need mechanism to avoid blackholing when DDM path between Transit Router and Server Router is lost #368

Open
taspelund opened this issue Sep 16, 2024 · 1 comment
Labels: bgp (Border Gateway Protocol), Bug, ddm (Delay Driven Multipath), Idea (New ideas to consider.), want

taspelund commented Sep 16, 2024

Server Routers act solely as stub routers and cannot be used as a transit node for traffic originating from another DDM router.
In the current Oxide topology (2 sidecars/Transit Routers + N gimlets/Server Routers), a single backplane link failure can result in blackholing of traffic.
For example, if the link between Sled 0 and Switch 0 goes down, traffic destined for Sled 0 that arrives at Switch 0 will be lost.

 ┌──────────┐   ┌──────────┐
 │          │   │          │
 │ Switch 0 │   │ Switch 1 │
 └─┬─────┬──┘   └──────┬──┬┘
   │     │             │  │ 
   │     └──────────┐  │  │ 
   │                │  │  │ 
   x     ┌──────────┼──┘  │ 
   │     │          │     │ 
  ┌┴─────┴─┐      ┌─┴─────┴┐
  │ Sled 0 │      │ Sled 1 │
  └────────┘      └────────┘

This happens as a result of multiple factors coinciding.

  1. The physical topology inside the Oxide rack is a 3-stage Clos that uses the "spine" layer as the exit of the fabric. This means there are no alternative paths from an exit to a leaf node that don't involve crossing additional links (e.g. spine0 -> leaf0 = 1 link, vs spine0 -> leaf1 -> spine1 -> leaf0 = 3 links).
  2. DDM does not allow Server Routers to be used for transit, which preserves the "valley free" property of the network by disallowing the 3-link routing path mentioned in the above bullet point. (We also likely wouldn't want to use Server Routers for transit because of the implications it would have on gimlet CPU load, network bandwidth, etc.)
  3. The exit nodes are effectively doing Northbound aggregation of individual External IPs via BGP advertisements. The External IPs are assigned to instances via omicron, which can/will be dispersed across the Overlay, effectively partitioning the External IP subnets used in the rack. Without exposing the dis-aggregation of these External IPs (advertising /32 or /128 routes for each External IP in use) Northbound via BGP, there is no way for the Northbound network to see or react to this failure.
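
To make the failure concrete, here is a minimal sketch that models the 2-switch/2-sled topology from the diagram as a graph and counts hops under the "no transit through Server Routers" rule. This is only an illustration, not how DDM computes paths; `full_mesh`, `hops`, and the node names are made up for the example.

```python
from collections import deque

def full_mesh(switches, sleds):
    """Every Transit Router (switch) links to every Server Router (sled)."""
    return {frozenset({f"switch{s}", f"sled{g}"})
            for s in range(switches) for g in range(sleds)}

def hops(links, src, dst, allow_sled_transit=False):
    """Shortest hop count from src to dst via BFS, or None if unreachable.

    Sleds are stubs: they may originate or terminate traffic, but they only
    forward it when allow_sled_transit is True.
    """
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        is_stub = node.startswith("sled") and node != src
        if is_stub and not allow_sled_transit:
            continue  # a stub router never forwards transit traffic
        for link in links:
            if node in link:
                (nxt,) = link - {node}
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return None

links = full_mesh(switches=2, sleds=2)
links.discard(frozenset({"switch0", "sled0"}))  # the failed backplane link

print(hops(links, "switch0", "sled0"))                           # None: blackhole
print(hops(links, "switch1", "sled0"))                           # 1
print(hops(links, "switch0", "sled0", allow_sled_transit=True))  # 3
```

The only surviving path from Switch 0 is the 3-link one through Sled 1, which the stub rule forbids, so anything the upstream network hands to Switch 0 for Sled 0 has nowhere to go.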

We need a mechanism to properly handle this failure case.

@taspelund

Some ideas on mechanisms:

  1. Add BGP support for dynamically learning and exporting host routes covering active External IPs (leaking dis-aggregated info northbound)
  2. Enable the use of DDM over front-panel ports. Today this could be used to connect Switch0 and Switch1 in the same rack (changing topology to add a less-preferred path used only in failure conditions), but could be a generalized mechanism eventually reused for multi-rack.

For (1) I imagine we could either enable this without, or in addition to, an announce-set.
If we enable dynamic host advertisements in addition to an announce-set, we could consider dynamically adding the NO_EXPORT (or possibly NO_PEER) community to these host routes to limit the scope of propagation into the wider external network.
However, if there is no announce-set covering the External IPs, we would want the host routes to propagate through the greater external network and likely would not want to stamp the routes with NO_EXPORT or NO_PEER.
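
A rough sketch of the community decision for (1), assuming the rule is simply "stamp the host route when an announce-set prefix already covers it". Nothing here is an existing API; `host_routes` and its parameters are hypothetical. The community values are the well-known ones from RFC 1997 (NO_EXPORT) and RFC 3765 (NOPEER).

```python
import ipaddress

# Well-known BGP community values (RFC 1997 and RFC 3765).
NO_EXPORT = 0xFFFFFF01
NO_PEER = 0xFFFFFF04

def host_routes(external_ips, announce_set, scope_community=NO_EXPORT):
    """Return {host_prefix: [communities]} for the active External IPs.

    A host route covered by an announce-set prefix gets scope_community so it
    stays close to the rack; an uncovered host route is left unstamped so it
    can propagate through the wider external network.
    """
    aggregates = [ipaddress.ip_network(p) for p in announce_set]
    routes = {}
    for ip in external_ips:
        addr = ipaddress.ip_address(ip)
        # /32 for IPv4, /128 for IPv6
        host = ipaddress.ip_network(f"{addr}/{addr.max_prefixlen}")
        covered = any(addr.version == agg.version and addr in agg
                      for agg in aggregates)
        routes[str(host)] = [scope_community] if covered else []
    return routes

routes = host_routes(
    external_ips=["198.51.100.10", "203.0.113.7", "2001:db8::7"],
    announce_set=["198.51.100.0/24"],
)
for prefix, communities in routes.items():
    print(prefix, [hex(c) for c in communities])
```

Only `198.51.100.10/32` is covered by the announce-set here, so it is the only route that gets NO_EXPORT; passing `scope_community=NO_PEER` would swap in the other option.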

taspelund added the want, Bug, ddm, Idea, and bgp labels on Sep 16, 2024