
InboundBuffer inherits intended memory leak from System.Threading.Channels #84

b-meridian opened this issue Jan 3, 2025

Summary

InboundBuffer<TEvent>, in its WaitToReadAsync method, passes a CancellationToken to a ChannelReader from System.Threading.Channels.

Per dotnet/runtime#761 (comment), this causes a memory leak that is intentional: the engineers on that project do not keep the channel's waiting objects in a doubly-linked list, so, for performance reasons, they do not process cancellation until the next write, at which point the cancelled waiters are removed from the list. However, even after a write on the channel, the System.OperationCanceledException objects generated for each cancelled wait remain pinned and are never garbage collected. These build up over time, forming a memory leak.
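To make the mechanism concrete, here is a minimal sketch of ours (not Elastic.Channels code, and names are our own) that exercises the same pattern directly against System.Threading.Channels: each cancelled wait on an idle channel leaves a dead waiter behind, because pruning only happens on the next write.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

class LeakSketch
{
    static async Task Main()
    {
        var channel = Channel.CreateUnbounded<int>();

        for (var i = 0; i < 1000; i++)
        {
            // CTS that cancels itself after 1 ms, simulating a wait timeout.
            using var cts = new CancellationTokenSource(millisecondsDelay: 1);
            try
            {
                // The token is handed to the reader; on cancellation the
                // reader keeps the cancelled waiter in its internal list,
                // since removal only occurs on the next write -- which, on
                // an idle channel, never comes.
                await channel.Reader.WaitToReadAsync(cts.Token);
            }
            catch (OperationCanceledException)
            {
                // The exception is handled here, but the associated waiter
                // state remains rooted inside the channel.
            }
        }
        // After this loop, up to 1000 cancelled waiters sit pinned in the
        // channel's state, mirroring the slow growth we observed.
    }
}
```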

Code to Reproduce

For brevity, the issue can be reproduced with a single line of code followed by an infinite loop, and then waiting:

var noopChannel = new NoopBufferedChannel(new NoopBufferedChannel.NoopChannelOptions());
while (true) { }

We've set up a detailed repository with fully working, real-life usage code showing how we stumbled across the issue while upgrading our system to Elasticsearch 8.x and using the latest Serilog NuGet package (v4.2.0) to send logs to Elastic:

https://github.com/b-meridian/eschannels

That repo also includes additional screenshots showing the memory leak captured by Visual Studio's Diagnostic Tools. Please check out the repo's README for more details.

Our Use Case

This memory leak is slow. In our testing, with default buffer options, it grows at roughly 600 KB per hour for a single instance of a BufferedChannel from Elastic.Channels. However, we run hundreds of microservice instances for weeks at a time as part of a web application service, and for security and auditing reasons each instance is required to write relevant logs to our Elastic cluster. We have seen scenarios where tens of gigabytes of memory are consumed entirely by accumulated instances of System.OperationCanceledException, all originating from InboundBuffer<TEvent>.

Suggested Solution

The System.Threading.Channels engineers have many other use cases to consider and currently have no plan to switch to something like a doubly-linked list in their channel implementation. We ended up modifying the Elastic.Channels assembly so that it does not pass a cancellation token to the ChannelReader, but instead handles the timeout internally. The solution isn't exactly elegant, but it retains functionality while eliminating the memory leak. The performance overhead is also low for this specific usage of ChannelReader, where the timeout is expected to fire only once every several seconds (the default for OutboundBufferMaxLifetime is 5 seconds). We are currently using this in production:

In /src/Elastic.Channels/Buffers/InboundBuffer.cs, inside the WaitToReadAsync method's main try loop, we do the following:

  try
  {
      // Arm the breaker ourselves instead of passing its token to the
      // ChannelReader, so the reader never accumulates cancelled waiters.
      _breaker.CancelAfter(Wait);
      var readTask = reader.WaitToReadAsync().AsTask();
      var delayTask = Task.Delay(Timeout.Infinite, _breaker.Token);
      var completedTask = await Task.WhenAny(readTask, delayTask).ConfigureAwait(false);

      if (completedTask == delayTask)
      {
          // The timeout won the race; surface it as a cancellation,
          // matching the previous behaviour.
          throw new OperationCanceledException(_breaker.Token);
      }

      // A read completed first; disarm the pending timeout.
      _breaker.CancelAfter(Timeout.Infinite);
      return await readTask.ConfigureAwait(false);
  }

This allows Elastic.Channels to continue using System.Threading.Channels.
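For reference, the same race pattern can be expressed as a standalone helper against a plain ChannelReader. This is a hypothetical sketch of ours; the method name and shape are not from Elastic.Channels:

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

static class ChannelReaderExtensions
{
    // Waits for data without ever handing the CancellationToken to the
    // ChannelReader, avoiding the pinned-waiter leak described above.
    public static async Task<bool> WaitToReadNoLeakAsync<T>(
        this ChannelReader<T> reader, CancellationToken token)
    {
        var readTask = reader.WaitToReadAsync().AsTask();
        var delayTask = Task.Delay(Timeout.Infinite, token);
        var completed = await Task.WhenAny(readTask, delayTask).ConfigureAwait(false);

        if (completed == delayTask)
        {
            // Timed out (or was cancelled) before data arrived.
            throw new OperationCanceledException(token);
        }

        return await readTask.ConfigureAwait(false);
    }
}
```

The trade-off is one extra Task.Delay allocation per wait, which is negligible at the multi-second cadence involved here.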

We are hoping to get this addressed in the main Elastic.Channels package so that we do not have to maintain a separate copy of the code, with the supply-chain management impact that entails.
