#956 Flow control fixes #957

Open
wants to merge 5 commits into master
Conversation

bucko909
Contributor

See #956.

This is not a complete fix for the aggregator, but it does fix the cache, relay, and rewriter daemons.

I've tested the cache daemon manually and verified that it only outputs "MetricCache is nearly full" and "MetricCache below watermark" with a sufficiently high cache size. With a small cache size, it will still drop datapoints ("MetricCache is full") with these settings.

The "fix MAX_QUEUE_SIZE to match config file default" commit may be controversial. I'm happy to do whatever with this to get this merged; it just felt wrong when I found it.

@bucko909
Contributor Author

Push just resets to preferred email.

@deniszh
Member

deniszh commented Aug 24, 2024

@bucko909 : could you please check the failed tests, pretty please?
Thanks!

@deniszh
Member

deniszh commented Aug 24, 2024

And a rebase or conflict fix is needed too

It's not really clear what to do here. If someone has altered their
config file to remove this line, this patch will alter the behaviour of
their instance. However, the example config default has been 10000 since
the first commit, and that's probably the more sensible default value. It
might nonetheless be safer to just document the discrepancy.
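
To make this concrete: the in-code default only takes effect when the setting is absent from the config file, so bumping it only changes behaviour for installs that deleted the line. A rough sketch of that lookup, with hypothetical names (this is not carbon's actual conf code):

# Hypothetical illustration; names are made up for clarity.
IN_CODE_DEFAULTS = {'MAX_QUEUE_SIZE': 10000}  # the value this commit aligns with the example config

def effective_setting(parsed_config, name):
    # A value present in the user's carbon.conf always wins; the in-code
    # default is only consulted when the line has been removed, which is
    # exactly the case whose behaviour this commit changes.
    if name in parsed_config:
        return parsed_config[name]
    return IN_CODE_DEFAULTS[name]
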
@bucko909
Contributor Author

Rebased to fix conflicts. The only meaningful change is the conflict resolution, which was in the middle of these changes and appears as context in the diff:

$ (FROM=4e105a0; NOW=d7926bd; diff -u5  <(git diff -U10 $(git merge-base $FROM origin/master)..$FROM) <(git diff -U10 $(git merge-base $NOW origin/master)..$NOW))
--- /dev/fd/63  2024-08-29 15:20:54.847098593 +0100
+++ /dev/fd/62  2024-08-29 15:20:54.851098553 +0100
@@ -131,14 +131,14 @@
  # If enabled this setting is used to timeout metric client connection if no
  # metrics have been sent in specified time in seconds
  #METRIC_CLIENT_IDLE_TIMEOUT = None
  
 diff --git a/lib/carbon/cache.py b/lib/carbon/cache.py
-index cf8b7d1..a7a0c7e 100644
+index 72ccf6c..0304143 100644
 --- a/lib/carbon/cache.py
 +++ b/lib/carbon/cache.py
-@@ -200,20 +200,27 @@ class _MetricCache(defaultdict):
+@@ -201,20 +201,27 @@ class _MetricCache(defaultdict):
              in self.items()]
  
    @property
    def watermarks(self):
      return [(metric, min(datapoints.keys()), max(datapoints.keys()))
@@ -162,11 +162,11 @@
    def _check_available_space(self):
      if state.cacheTooFull and self.size < settings.CACHE_SIZE_LOW_WATERMARK:
        log.msg("MetricCache below watermark: self.size=%d" % self.size)
        events.cacheSpaceAvailable()
  
-@@ -244,22 +251,26 @@ class _MetricCache(defaultdict):
+@@ -245,22 +252,26 @@ class _MetricCache(defaultdict):
  
      return sorted(datapoint_index.items(), key=by_timestamp)
  
    def store(self, metric, datapoint):
      timestamp, value = datapoint
@@ -180,20 +180,20 @@
          else:
 +          if self.is_nearly_full:
 +            # This will disable reading when flow control is enabled
 +            log.msg("MetricCache is nearly full: self.size=%d" % self.size)
 +            events.cacheFull()
+           if not self[metric]:
+             self.new_metrics.append(metric)
            self.size += 1
            self[metric][timestamp] = value
            if self.strategy:
              self.strategy.store(metric)
        else:
          # Updating a duplicate does not increase the cache size
          self[metric][timestamp] = value
  
- 
- _Cache = None
 diff --git a/lib/carbon/client.py b/lib/carbon/client.py
 index d9ba5a9..6bed7b7 100644
 --- a/lib/carbon/client.py
 +++ b/lib/carbon/client.py
 @@ -28,20 +28,24 @@ try:

I've used a smaller overage for this setting, as the cache is a much
slower process than the relay, so hopefully its size is much higher.
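
As a rough illustration of the overage idea (a sketch only; the exact factor and setting names in the patch may differ): the limit tolerates a bounded overshoot so the "stop sending" signal has time to take effect before anything is dropped.

# Illustrative only. Flow control kicks in at the nominal limit; drops
# only start beyond the overage.
def should_pause(current_size, hard_max):
    return current_size >= hard_max

def should_drop(current_size, hard_max, overage_fraction):
    # e.g. 0.25 for the relay instance queues ("25% over their hard max");
    # the cache uses a smaller fraction since it drains much more slowly.
    return current_size >= hard_max * (1 + overage_fraction)
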
@bucko909
Contributor Author

I think this should fix the tests -- apologies; I just ran pytest but it looks like it didn't find the other two classes in that file.

The fixes are squashed into "[#956] allow instances to go 25% over their hard max".

tox with no parameters also runs a benchmark. The benchmark does not appear to terminate in any reasonable time when performing "Benchmarking unique metric MetricCache drain..." on 1M datapoints. I tested it on master and the same is true, so I've not broken this.

@codecov-commenter

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 37.93103% with 18 lines in your changes missing coverage. Please review.

Project coverage is 50.46%. Comparing base (fdc56f6) to head (dbf2925).
Report is 17 commits behind head on master.

Files with missing lines Patch % Lines
lib/carbon/service.py 0.00% 9 Missing ⚠️
lib/carbon/client.py 16.66% 4 Missing and 1 partial ⚠️
lib/carbon/conf.py 0.00% 3 Missing ⚠️
lib/carbon/events.py 50.00% 0 Missing and 1 partial ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #957      +/-   ##
==========================================
- Coverage   50.63%   50.46%   -0.18%     
==========================================
  Files          36       36              
  Lines        3446     3474      +28     
  Branches      535      534       -1     
==========================================
+ Hits         1745     1753       +8     
- Misses       1574     1592      +18     
- Partials      127      129       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.
