collector/filesystem: Handle Statfs_t overflows #2965
Handle cases where, owing to multiplying two `uint64` integers and typecasting it to `float64`, the overall precision is lost when the values concerned exceed the `floatMantissa64` (1 << 53) before or after the operation (which is well within the acceptable `uint64` range). Fixes: prometheus#1672 Signed-off-by: Pranshu Srivastava <[email protected]>
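To illustrate the precision limit the description refers to, here is a minimal sketch. It assumes `floatMantissa64` means `1 << 53`, as the description states; the helper `roundTrips` is a hypothetical name and is not part of this PR.

```go
package main

import "fmt"

// floatMantissa64 mirrors the constant named in the PR description: float64
// represents every integer up to 1<<53 exactly, but not every integer above it.
const floatMantissa64 uint64 = 1 << 53

// roundTrips is a hypothetical helper. It reports whether v survives a
// uint64 -> float64 -> uint64 conversion unchanged.
func roundTrips(v uint64) bool {
	return uint64(float64(v)) == v
}

func main() {
	fmt.Println(roundTrips(floatMantissa64))     // exactly representable
	fmt.Println(roundTrips(floatMantissa64 + 1)) // rounded to a neighbouring value
}
```

Note that this only shows rounding in the `float64` conversion. It is separate from a `uint64` product wrapping around before the conversion happens.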
Signed-off-by: Pranshu Srivastava <[email protected]>
LGTM
If you are going to make use of |
Feel free to correct me if I'm missing something, but I believe we are casting them right before returning? Also, except Additionally, |
@rexagod Yes, I'm aware that Since
`size, _ := new(big.Float).SetUint64(buf.Blocks).Mul(new(big.Float).SetUint64(buf.Blocks), new(big.Float).SetInt64(int64(buf.Bsize))).Float64()`
This can be simplified to:
`size, _ := new(big.Int).Mul(new(big.Int).SetUint64(buf.Blocks), new(big.Int).SetInt64(buf.Bsize)).Float64()`
Keep things simple during the calculation, and only cast to
I see your point. Even though we explicitly |
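The integer-first approach suggested above could be sketched roughly as follows. `sizeBytes` is a hypothetical helper, not code from this PR, and it assumes the block count is a `uint64` and the block size an `int64`, as in the snippets quoted above. It converts through `big.Float.SetInt` rather than calling `Float64` on the `big.Int` directly, which is a substitution made here for the sake of the example.

```go
package main

import (
	"fmt"
	"math/big"
)

// sizeBytes is a hypothetical helper, not code from this PR. It multiplies
// the block count by the block size as arbitrary-precision integers, so the
// product cannot wrap, and converts to float64 once at the end.
func sizeBytes(blocks uint64, bsize int64) float64 {
	product := new(big.Int).Mul(
		new(big.Int).SetUint64(blocks),
		big.NewInt(bsize),
	)
	// big.Float.SetInt keeps the full integer; Float64 then rounds it to the
	// nearest float64 value.
	f, _ := new(big.Float).SetInt(product).Float64()
	return f
}

func main() {
	// Made-up inputs: 2^40 blocks of 2^30 bytes. The exact product is 2^70,
	// which does not fit in a uint64.
	fmt.Println(sizeBytes(1<<40, 1<<30))
}
```

The only rounding in this sketch happens in the final conversion, so the result is the closest `float64` to the true product rather than to a wrapped one.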
Signed-off-by: Pranshu Srivastava <[email protected]>
Ah, I see that This brings me to my second point though - why does this PR only address 64-bit archs, i.e. TBH, I'm not entirely convinced that the original issue #1672 was caused by an overflow. The method The original issue stated:
We only get a Grafana screenshot in the issue. It would have been a lot more helpful to have the raw metrics, so we could perhaps establish some relationship between the values, i.e. verify whether some value was in the ballpark of a wraparound. Even more helpful would have been the output of The fact remains that even a version as old as the one mentioned in the issue (v1.0.0-rc.0) used the Since I cannot see how node_exporter could have produced a value approaching 8e+22 as shown in the Grafana screenshot, my gut feeling is that the promql must have included some additional multiplier, or perhaps the panel series unit was set incorrectly, resulting in Grafana performing multiplication by an additional factor. Using |
if err != nil {
labels.deviceError = err.Error()
level.Debug(c.logger).Log("msg", "Error on statfs() system call", "rootfs", rootfsFilePath(labels.mountPoint), "err", err)
// Handle for under/over-flow for cases where: "node_filesystem_{free,avail}_bytes reporting values are larger than node_filesystem_size_bytes".
The wording of this comment suggests that statfs could return a result with free or available blocks greater than total blocks, which is a logical impossibility. Presumably what you're trying to convey is that filesystems with suitably large total blocks and a suitably large block size can overflow a uint64 when multiplied together, resulting in wraparound, and the appearance that free / available bytes are greater than total bytes (assuming that they have not also caused overflow / wraparound in their calculation).
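The wraparound scenario described in this comment can be reproduced with a small sketch. The inputs are made up purely to force an overflow and do not come from the issue; `mulWraps` is a hypothetical helper built on `math/bits.Mul64`.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulWraps is a hypothetical helper. It returns the low 64 bits of a*b, which
// is what plain uint64 multiplication yields, and reports whether the true
// product needed more than 64 bits.
func mulWraps(a, b uint64) (product uint64, overflowed bool) {
	hi, lo := bits.Mul64(a, b)
	return lo, hi != 0
}

func main() {
	// Made-up values: the total overflows, the free count does not.
	total, totalOverflowed := mulWraps(1<<40, 1<<24) // true product is 2^64
	free, freeOverflowed := mulWraps(1<<39, 1<<24)   // true product is 2^63

	fmt.Println(totalOverflowed, freeOverflowed)
	// After wraparound, free bytes appear larger than total bytes.
	fmt.Println(free > total)
}
```

In this sketch only the total wraps, which is the situation the comment describes: the free value looks larger than the size even though `statfs` itself reported consistent block counts.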
I gave this patch another thought. We could merge something to address the possible, but not probable, overflow or underflow wraparound that might have caused this. That was the only conclusion I could draw from the data provided in the original issue, however unlikely, since the raw metrics in the panel do show values that are not at all expected. Still, I'm convinced we should defer merging until we are provided the requested information, or some reproduction that would help assess the exact cause of this issue with more confidence. cc @treydock