RPW for OPTE v2p Mappings #5568
Conversation
Ooh, will this also resolve #4259? We had talked about using an RPW for the v2p mappings in that ticket as well.
@davepacheco Why yes, yes it will! I love sweeping up multiple bugs / issues into a single fix!
Force-pushed from ea311b8 to ca42e1e
Quick tests are showing that our initial implementation is working:
- Desired state
- Push the v2p state away from the desired state
- v2p state is now incorrect
- rpw corrects the drift
- v2p state is now correct again
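For readers skimming the thread, the drift-correction flow above is the heart of the RPW approach: the database holds the desired set of v2p mappings, and the background task periodically re-applies it, so any out-of-band change gets repaired on the next activation. Below is a minimal, self-contained sketch of that idea; every type and name here is a hypothetical stand-in, not the actual omicron/OPTE API.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a single V2P mapping (virtual IP -> physical sled).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct V2pMapping {
    virtual_ip: String,
    physical_host: String,
}

/// Hypothetical stand-in for a sled's current OPTE state.
#[derive(Default)]
struct SledState {
    mappings: HashSet<V2pMapping>,
}

impl SledState {
    /// Re-apply the full desired set; idempotent, so repeated runs are safe.
    fn set_v2p_mappings(&mut self, desired: &[V2pMapping]) {
        self.mappings = desired.iter().cloned().collect();
    }
}

/// One activation of the reconciliation loop.
fn reconcile(desired: &[V2pMapping], sled: &mut SledState) {
    sled.set_v2p_mappings(desired);
}

fn main() {
    let desired = vec![V2pMapping {
        virtual_ip: "172.30.0.5".into(),
        physical_host: "sled-a".into(),
    }];
    let mut sled = SledState::default();

    // Initial sync: sled matches the desired state.
    reconcile(&desired, &mut sled);

    // Simulate drift: the mapping disappears out from under us.
    sled.mappings.clear();
    assert!(sled.mappings.is_empty());

    // Next activation corrects the drift.
    reconcile(&desired, &mut sled);
    assert_eq!(sled.mappings.len(), 1);
    println!("drift corrected: {:?}", sled.mappings);
}
```

The key property is idempotence: re-applying the full desired set is always safe, which is what lets the task run periodically without coordinating with whatever caused the drift.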
Going to shift gears and finish up the
🚀
use omicron_common::api::external::ListResultVec;

impl DataStore {
    pub async fn v2p_mappings(
does this also need a paginated query?
Well... I admittedly tend to use DataPageParams::max_page() in RPWs whenever using db queries that require pagination parameters, and that will return up to u32::MAX (4_294_967_295) entries, so I'd say it's kind of moot here, unless we want to standardize on a lower limit for db queries in background tasks (which, honestly, it wouldn't shock me if we decide that would be a good idea!)
Under @davepacheco's direction we've started using this pattern in reconfigurator RPWs:
omicron/nexus/db-queries/src/db/datastore/zpool.rs, lines 125 to 134 in 8ffe0e1:
opctx.check_complex_operations_allowed()?;
let mut zpools = Vec::new();
let mut paginator = Paginator::new(SQL_BATCH_SIZE);
while let Some(p) = paginator.next() {
    let batch = self
        .zpool_list_all_external(opctx, &p.current_pagparams())
        .await?;
    paginator = p.found_batch(&batch, &|(z, _)| z.id());
    zpools.extend(batch);
}
check_complex_operations_allowed() fails if the opctx is associated with an external API client, and then we do still list all the items, but in paginated batches to avoid a giant result set coming from crdb.
I have updated the query to use this pattern.
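For context, here is a rough sketch of what the updated query might look like after adopting the zpool pattern above. The `V2PMapping` type and the `v2p_mappings_page` helper are hypothetical names used only for illustration, not necessarily what landed in the PR; the pagination calls mirror the zpool snippet shown earlier.

```rust
pub async fn v2p_mappings(
    &self,
    opctx: &OpContext,
) -> ListResultVec<V2PMapping> {
    // Refuse to run this unbounded listing on behalf of external API callers.
    opctx.check_complex_operations_allowed()?;

    let mut mappings = Vec::new();
    let mut paginator = Paginator::new(SQL_BATCH_SIZE);
    while let Some(p) = paginator.next() {
        // Hypothetical helper that fetches a single page of mappings.
        let batch = self
            .v2p_mappings_page(opctx, &p.current_pagparams())
            .await?;
        paginator = p.found_batch(&batch, &|m| m.id());
        mappings.extend(batch);
    }
    Ok(mappings)
}
```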
nexus/src/app/instance_network.rs (outdated):
// - it means that delete calls are required as well as set calls,
//   meaning that now the ordering of those matters (this may also
//   necessitate a generation number for V2P mappings)
I don't think generation numbers are required with the approach of using a periodic background task to correct drift... what do you think?
I think it mostly depends on whether it is acceptable to send / read the entire configuration during a reconciliation loop. For things like NAT, the total configuration can be quite large, so we employ generation numbers so we can send only the required updates instead of the full set. For v2p mappings, the individual messages are much smaller. My napkin math from before showed that if we had a VM per core, with a VNIC per VM, that would be something in the ballpark of 213 KB for a full set of updates for a single sled, which seems reasonable.
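As a sanity check on that ballpark figure, here is the arithmetic under assumed parameters: 128 cores per sled, one VM per core, one VNIC per VM, and roughly 1.7 KB per serialized mapping. The per-mapping size and core count are assumptions for illustration, not numbers taken from the PR.

```rust
fn main() {
    // Assumptions (not from the PR): 128 cores/sled, 1 VM per core,
    // 1 VNIC per VM, ~1.7 KB per serialized v2p mapping.
    let vnics_per_sled: u64 = 128;
    let bytes_per_mapping: u64 = 1_700;
    let total_bytes = vnics_per_sled * bytes_per_mapping;
    // Prints ~217 KB, the same ballpark as the ~213 KB quoted above.
    println!("full v2p update for one sled: ~{} KB", total_bytes / 1_000);
}
```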
👍
All outstanding comments have been resolved; going ahead and setting this to auto-merge unless anyone has an objection. I'd like to get this soaking on Dogfood sooner rather than later for the upcoming release :)
*sigh* looks like a test is flaky
Fixed the flaky test. I was just using the tooling wrong 😛
TODO
- delete functionality cleans up v2p mappings

Possible Optimizations
Related
Resolves #5214
Resolves #4259
Resolves #3107