The autoscaler logic is greedy, which can lead to scaling "bouncing": if there is a "calmer" 5-minute period, it will scale you in. Spanner scaling events are not smooth experiences, and we'd like to avoid them when possible. Here is an important production database (this example is from today):
We don't want that scale-in behavior, but we do want the immediate scale-out behavior. We'd prefer an "Asymmetric Policy", for example:
Scale in policy: "only scale in when it is clearly a good idea"
"if you see that for over 1h/2h/4h that you want to scale in the entire time without exception, then scale in."
Scale out policy: "scale out whenever things get hot"
"if you see over the last 5m that you need more, go ahead and scale out."
This is not easily expressible right now.
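Roughly the rule we're after, as a sketch (everything here is hypothetical, not the autoscaler's existing API or config):

```ts
// Hypothetical asymmetric policy: scale out on a hot short window,
// scale in only when the entire long window agreed, without exception.
type Sample = { timestampMs: number; value: number };

interface AsymmetricPolicy {
  scaleOutWindowMs: number; // e.g. 5 minutes
  scaleInWindowMs: number;  // e.g. 1h/2h/4h
  scaleOutAbove: number;    // e.g. 70 (% high-priority CPU)
  scaleInBelow: number;     // e.g. 60
}

function decide(samples: Sample[], nowMs: number, p: AsymmetricPolicy): 'OUT' | 'IN' | 'NONE' {
  const recent = samples.filter(s => nowMs - s.timestampMs <= p.scaleOutWindowMs);
  const longWindow = samples.filter(s => nowMs - s.timestampMs <= p.scaleInWindowMs);

  // Scale out whenever things get hot: any recent sample above threshold is enough.
  if (recent.some(s => s.value > p.scaleOutAbove)) return 'OUT';

  // Scale in only when it is clearly a good idea: every sample in the long
  // window must be below the scale-in threshold.
  if (longWindow.length > 0 && longWindow.every(s => s.value < p.scaleInBelow)) return 'IN';

  return 'NONE';
}
```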
scaleInCoolingMinutes would let the bounce happen.
Add another high-priority CPU metric with a look-back "period" of 4h (or 4h divided by 5). The problem here is that if there is a spike and the autoscaler scales out, then even after scaling out it will still see that spike in the 4h look-back and want to scale out again.
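To make that concrete (a toy illustration only, assuming a max-over-window evaluation rather than whatever aggregation the poller actually uses): a short spike stays inside a 4h look-back for hours after capacity has already been added.

```ts
// After the first scale-out, utilization drops, but the spike is still
// inside the 4h window, so a window-based check keeps demanding scale-out.
const windowAfterScaleOut = [
  95, 95, 95,             // the 15-minute spike that triggered the first scale-out
  ...Array(45).fill(40),  // ~3.75h of calm samples after adding capacity
];

const windowMax = Math.max(...windowAfterScaleOut); // 95
console.log(windowMax > 70); // true => the 4h metric still wants to scale out
```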
The only solution I can see is to use Custom Scaling Methods. You'd have to define some scale_out metrics that have the short look-back (5m) and scale_in metrics that have the longer look-back (4h), then process those metrics differently in calculateSize().
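Roughly what that could look like (a sketch only; the metric shape and the calculateSize() signature below are assumptions, not the project's actual interfaces):

```ts
// Sketch of a custom scaling method that processes scale-out and scale-in
// metrics differently based on their look-back role.
interface EvaluatedMetric {
  name: string;
  role: 'scale_out' | 'scale_in'; // scale_out metrics use the 5m look-back,
                                  // scale_in metrics use the 4h look-back
  value: number;                  // value aggregated over that look-back
  threshold: number;
}

function calculateSize(currentSize: number, stepSize: number, metrics: EvaluatedMetric[]): number {
  const outMetrics = metrics.filter(m => m.role === 'scale_out');
  const inMetrics = metrics.filter(m => m.role === 'scale_in');

  // Scale out: any short look-back metric over its threshold is enough.
  if (outMetrics.some(m => m.value > m.threshold)) {
    return currentSize + stepSize;
  }

  // Scale in: only if every long look-back metric is below its threshold.
  if (inMetrics.length > 0 && inMetrics.every(m => m.value < m.threshold)) {
    return currentSize - stepSize;
  }

  return currentSize; // otherwise, keep the current size
}
```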
Asymmetric Metrics Error Handling - In the case where metrics are bad (see Ignore Bad Values from Google Metrics #355), we'd want different behavior on scale out vs. scale in. If you get an incomplete metric set (say, a CPU metric is zero), the entire possibility of scaling in should be discarded (fail static). If a single metric does return a signal that scaling out is warranted, then scale out should happen; an incomplete signal can be enough to scale out. Again, this could be treated specially in a Custom Scaling Method.
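The bad-metric handling could be folded into the same place (again only a sketch, reusing the EvaluatedMetric shape above; the isValid() check is a stand-in for whatever bad-value detection #355 settles on):

```ts
// Asymmetric handling of bad/incomplete metrics: any bad metric vetoes
// scale-in (fail static), but a single valid hot metric is enough to scale out.
function isValid(m: EvaluatedMetric): boolean {
  return Number.isFinite(m.value) && m.value > 0; // treat zero/NaN as "bad"
}

function calculateSizeWithBadMetrics(currentSize: number, stepSize: number, metrics: EvaluatedMetric[]): number {
  const valid = metrics.filter(isValid);

  // Scale out: one valid metric signalling "hot" is enough, even if the
  // rest of the metric set is missing or broken.
  if (valid.some(m => m.role === 'scale_out' && m.value > m.threshold)) {
    return currentSize + stepSize;
  }

  // Scale in: if anything in the set is bad, discard the possibility
  // of scaling in entirely (fail static).
  if (metrics.some(m => !isValid(m))) {
    return currentSize;
  }

  const inMetrics = metrics.filter(m => m.role === 'scale_in');
  if (inMetrics.length > 0 && inMetrics.every(m => m.value < m.threshold)) {
    return currentSize - stepSize;
  }
  return currentSize;
}
```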
Summary
I think I can make things work using a Custom Scaling Method, and maybe I will do that, but I think users of this project generally want the same things I do, so addressing this in the core project would be a good idea. Thanks.
==
Side note: How "Want Scale To" log metric works:
Examples:
storage=0.5885764076510477%, BELOW the range [70%-80%] => however, cannot scale to 100 because it is lower than MIN 6000 PROCESSING_UNITS
high_priority_cpu=7.566610239183218%, BELOW the range [60%-70%] => however, cannot scale to 700 because it is lower than MIN 6000 PROCESSING_UNITS
"want_scale_to %%{data:ignore}cannot scale to %%{number:want_scale_to}%%{data:ignore}"