Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Subnet lock for SubnetPort creation #974

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

yanjunz97
Copy link
Contributor

Instead of using lock for Subnet, this PR adds a cache in SubnetPort service
to record the number of SubnetPorts on each Subnet.
Read/Write operation for this cache is protected by lock to avoid race condition
and ensure the number of SubnetPort on the Subnet will be controlled below
the Subnet capacity.

Testing done:

  • Create 16 Pods on default SubnetSet with size 16 (capacity 12), got all Pods ready;
    delete all Pods and observe the Subnets under the SubnetSet will be deleted by GC.
  • Create 16 VMs on DHCPServer SubnetSet with size 16 (capacity 12), all Vms can get IP;
    delete all VMs and observe the Subnet under the SubnetSet will be deleted by GC.
  • Create 16 VMs on DHCPServer Subnet with size 16 (capacity 12), only 12 VMs can PowerOn
    and get IP. The remainings will have VirtualMachineNetworkReady as NotReady and
    Message like network interface "eth0" error: subnetPort is not ready: SubnetPortNotReady - error occurred while processing the SubnetPort CR. Error: no valid IP in Subnet /orgs/default/projects/project-quality/vpcs/ns-1_7fcad11c-da38-4137-96ec-f0f80ae8d96b/subnets/subnet-test-1_60ebfa36-bbe2-4e6e-85df-8228be4e190a

@yanjunz97 yanjunz97 force-pushed the refactor-subnet-lock branch 2 times, most recently from ac515ab to dc0b2c3 Compare December 23, 2024 08:07
@yanjunz97 yanjunz97 force-pushed the refactor-subnet-lock branch from dc0b2c3 to f78aabf Compare December 23, 2024 08:20
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 66.43836% with 49 lines in your changes missing coverage. Please review.

Project coverage is 73.22%. Comparing base (86c0d65) to head (f78aabf).

Files with missing lines Patch % Lines
pkg/nsx/services/subnetport/store.go 64.40% 19 Missing and 2 partials ⚠️
...kg/controllers/subnetport/subnetport_controller.go 29.41% 9 Missing and 3 partials ⚠️
pkg/nsx/services/subnetport/subnetport.go 82.92% 7 Missing ⚠️
pkg/controllers/common/utils.go 57.14% 2 Missing and 1 partial ⚠️
pkg/controllers/pod/pod_controller.go 50.00% 2 Missing and 1 partial ⚠️
pkg/controllers/subnetset/subnetset_controller.go 80.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #974      +/-   ##
==========================================
- Coverage   73.37%   73.22%   -0.16%     
==========================================
  Files         118      118              
  Lines       16394    16440      +46     
==========================================
+ Hits        12029    12038       +9     
- Misses       3581     3612      +31     
- Partials      784      790       +6     
Flag Coverage Δ
unit-tests 73.22% <66.43%> (-0.16%) ⬇️
Files with missing lines Coverage Δ
pkg/controllers/subnet/subnet_controller.go 63.74% <100.00%> (+0.10%) ⬆️
pkg/nsx/services/subnet/store.go 71.64% <ø> (-3.90%) ⬇️
pkg/nsx/services/subnet/subnet.go 64.78% <ø> (-1.97%) ⬇️
pkg/controllers/common/utils.go 88.95% <57.14%> (-1.53%) ⬇️
pkg/controllers/pod/pod_controller.go 60.41% <50.00%> (-1.57%) ⬇️
pkg/controllers/subnetset/subnetset_controller.go 65.88% <80.00%> (-0.11%) ⬇️
pkg/nsx/services/subnetport/subnetport.go 81.08% <82.92%> (+0.25%) ⬆️
...kg/controllers/subnetport/subnetport_controller.go 74.15% <29.41%> (-1.54%) ⬇️
pkg/nsx/services/subnetport/store.go 71.83% <64.40%> (-5.28%) ⬇️

@@ -373,3 +404,19 @@ func (service *SubnetPortService) Cleanup(ctx context.Context) error {
}
return nil
}

func (service *SubnetPortService) AddPortToSubnet(path string, size int) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about rename the functions as AllocatePortFromSubnet and ReleasePortInSubnet?

}
}
// size 0 indicates the Subnet is empty
if !count.markForDelete && (count.value < size || size == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what case markForDelete is set as true?

@@ -128,3 +131,75 @@ func (subnetPortStore *SubnetPortStore) GetByIndex(key string, value string) []*
}
return subnetPorts
}

func (subnetPortStore *SubnetPortStore) AddPortToSubnet(path string, size int) bool {
subnetPortStore.portCountMapLock.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock scale is lager if we use it on the complete store, can we use lock on the subnet path only?

type SubnetPortService struct {
servicecommon.Service
SubnetPortStore *SubnetPortStore
}

type CountWithMark struct {
value int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about using atomic as the type of value, and use atomic.AddInt(count, 1) or atomic.AddInt(count, -1) to control the value? Or you can use a customized lock in CountWithMark

}()
}

if err := r.SubnetService.DeleteSubnet(*nsxSubnet); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to have a race that when this routine is deleting the subnet, and pod_controller/subnetport_controller is using the subnet to allocate a new port?

@@ -469,7 +471,17 @@ func (r *SubnetPortReconciler) CheckAndGetSubnetPathForSubnetPort(ctx context.Co
log.Error(err, "failed to get NSX subnet by subnet CR UID", "subnetList", subnetList)
return
}
subnetPath = *subnetList[0].Path
nsxSubnet := subnetList[0]
totalIP := int(*nsxSubnet.Ipv4SubnetSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to use totalIP as a field of CountWithMark which is a one-time set value when we create the object for a new subnet, then we can consume it any time we need to allocate a new port with the diff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants