fix: Clean up interfaceLockMap entries on endpoint deletion #1249

Open: wants to merge 11 commits into base: main
pkg/plugin/packetparser/packetparser_linux.go (36 changes: 22 additions, 14 deletions)
@@ -373,36 +373,44 @@ func (p *packetParser) clean(rtnl nltc, qdisc *tc.Object) {
}

func (p *packetParser) endpointWatcherCallbackFn(obj interface{}) {
	// Contract is that we will receive an endpoint event pointer.
	event := obj.(*endpoint.EndpointEvent)
	if event == nil {
		return
	}

	iface := event.Obj.(netlink.LinkAttrs)

	ifaceKey := ifaceToKey(iface)
	lockMapVal, _ := p.interfaceLockMap.LoadOrStore(ifaceKey, &sync.Mutex{})
Contributor
Why are you deleting this? The interfaceLockMap stores a per-interface lock, so we can create and delete multiple qdiscs in parallel. This is necessary (instead of a single lock) because a large number of pods can come up at the same time, and we should start capturing packets as quickly as possible.

Contributor
I'm still not clear on this. Why is a per-interface mutex necessary for packetparser to handle concurrent create/delete qdisc operations? As far as I understand, operations on different interfaces shouldn't cause a data race.
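The per-interface locking pattern under discussion can be sketched as follows. This is a minimal illustration, not the PR's code. It assumes plain string keys (the PR derives keys with ifaceToKey(iface)). The names keyedLocks and get are hypothetical, and the sleep is a stand-in for slow per-interface work such as qdisc setup.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// keyedLocks hands out one *sync.Mutex per key (hypothetical: an interface
// name), so work on different keys can run in parallel while work on the
// same key is serialized.
type keyedLocks struct {
	m sync.Map // string -> *sync.Mutex
}

// get returns the mutex for key, creating it atomically on first use.
func (k *keyedLocks) get(key string) *sync.Mutex {
	v, _ := k.m.LoadOrStore(key, &sync.Mutex{})
	return v.(*sync.Mutex)
}

func main() {
	var locks keyedLocks
	var wg sync.WaitGroup
	start := time.Now()

	// Three "endpoint created" events for three distinct interfaces.
	for _, name := range []string{"veth1", "veth2", "veth3"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			mu := locks.get(name)
			mu.Lock()
			defer mu.Unlock()
			// Stand-in for slow per-interface work (assumption).
			time.Sleep(100 * time.Millisecond)
		}(name)
	}
	wg.Wait()

	// Distinct keys never contend, so the total is close to one sleep
	// rather than three, as it would be with a single global mutex.
	fmt.Println("elapsed:", time.Since(start).Round(10*time.Millisecond))
}
```

In this sketch, the lock only matters when two events for the same key overlap, for example a create and a delete for the same veth. Events for different keys never wait on each other.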

	mu := lockMapVal.(*sync.Mutex)
	mu.Lock()
	defer mu.Unlock()

	switch event.Type {
	case endpoint.EndpointCreated:
		p.l.Debug("Endpoint created", zap.String("name", iface.Name))
		p.createQdiscAndAttach(iface, Veth)
		// Get or create mutex atomically
		lockMapVal, loaded := p.interfaceLockMap.LoadOrStore(ifaceKey, &sync.Mutex{})
Contributor
Why bring this inside the switch case? We are duplicating code in L399-L404.

Contributor Author
Hey Anubhab,

Yeah, I get the concern about code duplication, but in this case I think it makes sense to keep the mutex logic separate in each case. The create and delete paths need different mutex handling: creation needs to make a new mutex if one doesn't exist, while deletion only works with an existing one. So I added an extra check.

Handling it outside the switch would probably make the code more generic but less clear. In addition, with the lock taken before the switch, any event types added later would also create and lock a map entry.

What do you think? I'm happy to revert the code to its previous state if you think this is unnecessary.

		mu := lockMapVal.(*sync.Mutex)
		mu.Lock()
		defer mu.Unlock()

		// Only proceed with creation if this is a new interface
		if !loaded {
			p.l.Debug("Endpoint created", zap.String("name", iface.Name))
			p.createQdiscAndAttach(iface, Veth)
		}

	case endpoint.EndpointDeleted:
		p.l.Debug("Endpoint deleted", zap.String("name", iface.Name))
		// Clean.
		lockMapVal, ok := p.interfaceLockMap.Load(ifaceKey)
		if !ok {
			return
		}
		mu := lockMapVal.(*sync.Mutex)
		mu.Lock()
		defer mu.Unlock()

		// Clean up operations
		if value, ok := p.tcMap.Load(ifaceKey); ok {
			v := value.(*tcValue)
			p.clean(v.tc, v.qdisc)
			// Delete from map.
			p.tcMap.Delete(ifaceKey)
		}
Contributor
We need to delete ifaceKey from interfaceLockMap when the endpoint is deleted.

Contributor Author
I reverted my latest changes: the interfaceLockMap is back, and ifaceKey is now removed from it on deletion.

	default:
		// Unknown.
		p.l.Debug("Unknown event", zap.String("type", event.Type.String()))
		p.interfaceLockMap.Delete(ifaceKey)
	}
}

pkg/plugin/packetparser/packetparser_linux_test.go (18 changes: 15 additions, 3 deletions)
@@ -162,18 +162,24 @@ func TestEndpointWatcherCallbackFn_EndpointDeleted(t *testing.T) {
	ctrl := gomock.NewController(t)
	defer ctrl.Finish()

	// Initialize packetParser with both maps.
	p := &packetParser{
		cfg:              cfgPodLevelEnabled,
		l:                log.Logger().Named("test"),
		interfaceLockMap: &sync.Map{},
		tcMap:            &sync.Map{},
	}
	p.tcMap = &sync.Map{}

	// Create test interface attributes.
	linkAttr := netlink.LinkAttrs{
		Name:         "test",
		HardwareAddr: []byte("test"),
		NetNsID:      1,
	}
	key := ifaceToKey(linkAttr)

	// Pre-populate both maps to simulate existing interface
	p.interfaceLockMap.Store(key, &sync.Mutex{})
	p.tcMap.Store(key, &tcValue{nil, &tc.Object{}})

	// Create EndpointDeleted event.
@@ -182,10 +188,16 @@ func TestEndpointWatcherCallbackFn_EndpointDeleted(t *testing.T) {
		Obj: linkAttr,
	}

	// Execute the callback.
	p.endpointWatcherCallbackFn(e)

	_, ok := p.tcMap.Load(key)
	assert.False(t, ok)
	// Verify both maps are cleaned up.
	_, tcMapExists := p.tcMap.Load(key)
	_, lockMapExists := p.interfaceLockMap.Load(key)

	// Assert both maps are cleaned up
	assert.False(t, tcMapExists, "tcMap entry should be deleted")
	assert.False(t, lockMapExists, "interfaceLockMap entry should be deleted")
}

func TestCreateQdiscAndAttach(t *testing.T) {