fix: Clean up interfaceLockMap entries on endpoint deletion #1249

Open
wants to merge 11 commits into
base: main
26 changes: 19 additions & 7 deletions pkg/plugin/packetparser/packetparser_linux.go
@@ -380,26 +380,38 @@ func (p *packetParser) endpointWatcherCallbackFn(obj interface{}) {
}

iface := event.Obj.(netlink.LinkAttrs)

ifaceKey := ifaceToKey(iface)
lockMapVal, _ := p.interfaceLockMap.LoadOrStore(ifaceKey, &sync.Mutex{})
Contributor

Why are you deleting this? The interfaceLockMap lets us store a per-interface lock, so we can create and delete multiple qdiscs in parallel. This is necessary (in place of a single lock) because a large number of pods can come up at the same time, and we should start capturing packets as quickly as possible.
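
To illustrate the pattern described above, here is a minimal sketch of a per-key lock map. It is not the packetparser code: `keyedLocks`, `lockFor`, and `countEvents` are hypothetical names, and a plain counter stands in for the qdisc work. Events for the same key are serialized by that key's mutex, while events for different keys never wait on each other.

```go
package main

import (
	"fmt"
	"sync"
)

// keyedLocks hands out one mutex per key (hypothetical helper type).
type keyedLocks struct {
	m sync.Map // key (string) -> *sync.Mutex
}

// lockFor returns the mutex for key, creating it on first use.
// LoadOrStore guarantees that all callers see the same mutex for a key.
func (k *keyedLocks) lockFor(key string) *sync.Mutex {
	v, _ := k.m.LoadOrStore(key, &sync.Mutex{})
	return v.(*sync.Mutex)
}

// countEvents fires perKey concurrent events for every key and returns
// how many events were handled per key.
func countEvents(keys []string, perKey int) map[string]int {
	var locks keyedLocks
	counts := make(map[string]*int, len(keys))
	for _, key := range keys {
		counts[key] = new(int)
	}

	var wg sync.WaitGroup
	for _, key := range keys {
		for i := 0; i < perKey; i++ {
			wg.Add(1)
			go func(key string) {
				defer wg.Done()
				mu := locks.lockFor(key)
				mu.Lock()
				defer mu.Unlock()
				*counts[key]++ // guarded by the per-key mutex only
			}(key)
		}
	}
	wg.Wait()

	out := make(map[string]int, len(keys))
	for key, n := range counts {
		out[key] = *n
	}
	return out
}

func main() {
	fmt.Println(countEvents([]string{"veth0", "veth1"}, 100))
}
```

With a single global lock, the increment for veth1 would have to wait for veth0; here each interface key only contends with itself.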

Contributor

I'm still not clear on this. Why is a per-interface mutex necessary for packetparser to handle concurrent create/delete qdisc operations? As far as I understand, operations on different interfaces shouldn't cause a data race.

mu := lockMapVal.(*sync.Mutex)
mu.Lock()
defer mu.Unlock()

switch event.Type {
case endpoint.EndpointCreated:
// Create mutex only when needed
lockMapVal, _ := p.interfaceLockMap.LoadOrStore(ifaceKey, &sync.Mutex{})

I'm curious about the need here. This seems a bit complex; could a simpler approach be used?

Contributor Author

Sure, thanks

I've removed the mutex mechanism, since we're now using a sequential approach for adding and removing interfaces.

mu := lockMapVal.(*sync.Mutex)
mu.Lock()
defer mu.Unlock()

p.l.Debug("Endpoint created", zap.String("name", iface.Name))
p.createQdiscAndAttach(iface, Veth)
case endpoint.EndpointDeleted:
// Get the mutex only if it exists
lockMapVal, exists := p.interfaceLockMap.Load(ifaceKey)
Contributor

Bit of a nitpick / question since I'm still new to Go. Is it recommended to stick with the ok idiom? Or is it fine to use other variable names like exists?

Contributor Author
@byte-msft, Jan 21, 2025

Hey Kamil! Yeah, I initially used exists to make the code more readable, but you're right - we should stick with ok to follow Go conventions in the main code.

For the test case though, I deliberately kept tcMapExists and lockMapExists because test code is a bit different - clarity is super important there since we're verifying specific behaviors. The more explicit naming makes it immediately obvious what we're testing for, especially when someone's debugging failed tests.

Let me know if you'd like me to update the main code to use ok instead of exists!

Contributor

Sure yeah, let's stick with ok for the main code then :)

Member

Yeah, ok is still best in this circumstance. It's widely understood what that means.

Contributor Author

Sure, it's resolved
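
For readers following the naming discussion, a small sketch of the comma-ok idiom with sync.Map may help. The `registry` type and its `set`/`get` methods are hypothetical and exist only for this illustration; the point is that `ok` is the conventional name for the boolean returned by a map lookup or a type assertion.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical wrapper used only to demonstrate the idiom.
type registry struct {
	m sync.Map // interface name -> state string
}

// set records the state for an interface name.
func (r *registry) set(name, state string) { r.m.Store(name, state) }

// get returns the state for name. The second result follows the
// comma-ok convention: false means "no such entry".
func (r *registry) get(name string) (string, bool) {
	v, ok := r.m.Load(name)
	if !ok {
		return "", false
	}
	// The same idiom guards the type assertion against a panic.
	state, ok := v.(string)
	return state, ok
}

func main() {
	var r registry
	r.set("veth0", "attached")

	if state, ok := r.get("veth0"); ok {
		fmt.Println("veth0:", state)
	}
	if _, ok := r.get("veth1"); !ok {
		fmt.Println("veth1: not found")
	}
}
```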

if !exists {
return
}
mu := lockMapVal.(*sync.Mutex)
mu.Lock()
defer mu.Unlock()

p.l.Debug("Endpoint deleted", zap.String("name", iface.Name))
// Clean.
// Clean tcMap.
if value, ok := p.tcMap.Load(ifaceKey); ok {
v := value.(*tcValue)
p.clean(v.tc, v.qdisc)
// Delete from map.
p.tcMap.Delete(ifaceKey)
}
Contributor

We need to delete the ifaceKey from interfaceLockMap when the endpoint is deleted.

Contributor Author

I reverted my latest changes, so interfaceLockMap is back, and the ifaceKey is now removed from the map on deletion.
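
The cleanup requested in this thread can be sketched in isolation as follows. This is a simplified stand-in, not the plugin itself: `tracker`, `onCreated`, `onDeleted`, and `has` are hypothetical names, and a string replaces the real tc/qdisc state. It assumes, as in the diff, that the lock entry is created on the create event and removed together with the resource entry on the delete event.

```go
package main

import (
	"fmt"
	"sync"
)

// tracker is a hypothetical stand-in for the plugin state.
type tracker struct {
	resources sync.Map // key -> attached state (string here)
	locks     sync.Map // key -> *sync.Mutex
}

// onCreated creates the per-key mutex if needed and records the resource.
func (t *tracker) onCreated(key string) {
	v, _ := t.locks.LoadOrStore(key, &sync.Mutex{})
	mu := v.(*sync.Mutex)
	mu.Lock()
	defer mu.Unlock()
	t.resources.Store(key, "attached")
}

// onDeleted removes the resource and then the lock entry, so neither map
// grows without bound as endpoints come and go.
func (t *tracker) onDeleted(key string) {
	v, ok := t.locks.Load(key)
	if !ok {
		return // no create event was seen for this key
	}
	mu := v.(*sync.Mutex)
	mu.Lock()
	defer mu.Unlock()
	t.resources.Delete(key)
	t.locks.Delete(key)
}

// has reports whether each map still holds an entry for key.
func (t *tracker) has(key string) (resource, lock bool) {
	_, resource = t.resources.Load(key)
	_, lock = t.locks.Load(key)
	return resource, lock
}

func main() {
	var t tracker
	t.onCreated("veth0")
	t.onDeleted("veth0")
	resource, lock := t.has("veth0")
	fmt.Println("resource entry:", resource, "lock entry:", lock)
}
```

One caveat of this simplified version: a goroutine that loaded the mutex just before the entry was deleted can still lock it afterwards, so the sketch relies on create and delete events for one key not overlapping in that way.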


// Clean interfaceLockMap.
p.interfaceLockMap.Delete(ifaceKey)
default:
// Unknown.
p.l.Debug("Unknown event", zap.String("type", event.Type.String()))
18 changes: 15 additions & 3 deletions pkg/plugin/packetparser/packetparser_linux_test.go
@@ -162,18 +162,24 @@ func TestEndpointWatcherCallbackFn_EndpointDeleted(t *testing.T) {
ctrl := gomock.NewController(t)
defer ctrl.Finish()

// Initialize packetParser with both maps.
p := &packetParser{
cfg: cfgPodLevelEnabled,
l: log.Logger().Named("test"),
interfaceLockMap: &sync.Map{},
tcMap: &sync.Map{},
}
p.tcMap = &sync.Map{}

// Create test interface attributes.
linkAttr := netlink.LinkAttrs{
Name: "test",
HardwareAddr: []byte("test"),
NetNsID: 1,
}
key := ifaceToKey(linkAttr)

// Pre-populate both maps to simulate existing interface
p.interfaceLockMap.Store(key, &sync.Mutex{})
p.tcMap.Store(key, &tcValue{nil, &tc.Object{}})

// Create EndpointDeleted event.
@@ -182,10 +188,16 @@ func TestEndpointWatcherCallbackFn_EndpointDeleted(t *testing.T) {
Obj: linkAttr,
}

// Execute the callback.
p.endpointWatcherCallbackFn(e)

_, ok := p.tcMap.Load(key)
assert.False(t, ok)
// Verify both maps are cleaned up.
_, tcMapExists := p.tcMap.Load(key)
_, lockMapExists := p.interfaceLockMap.Load(key)

// Assert both maps are cleaned up
assert.False(t, tcMapExists, "tcMap entry should be deleted")
assert.False(t, lockMapExists, "interfaceLockMap entry should be deleted")
}

func TestCreateQdiscAndAttach(t *testing.T) {