-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
process_collector: fill in most statistics on macOS #1600
Conversation
Unfortunately, the virtual memory, resident memory, and network stats will require access to undocumented C functions. I was warned off of cgo in IRC because it would then have to be enabled in a bunch of different projects that use this module, but I already was against it because that would break the ability to cross-compile. There is no interface to `dlopen` built into golang. The `github.com/ebitengine/purego` module looks promising (I can cross-compile and call these methods), but I'm currently getting unexpected results. I'll follow up with that separately if I can get it working, but hopefully this stuff is pretty uncontroversial. Tested on macOS 10.14.6 (amd64), macOS 14.6.1 (amd64), and macOS 15.0 (arm64) by spawning `/usr/bin/ulimit -a -S` and `/usr/sbin/lsof -c $my_process` from the test exporter process, and `ps -o lstart,vsize,rss,utime,stime,command` from the shell, and comparing results with the exported metrics. I can't find documentation for `RLIMIT_AS` on macOS (specifically if it's in bytes or pages). It's currently being reported back as `RLIM_INFINITY`, which seems reasonable, because I've come across reports that the value is ignored anyway[1]. The bash 3.2 code for the built-in `ulimit` divides the value reported by `getrusage(2)` by 1024 when printing, as it does for `RLIMIT_DATA`, which is documented as being bytes in `getrusage(2)`. The help for `ulimit` indicates it prints both in kbytes, so it's reasonable to assume this is already in bytes. [1] https://issues.chromium.org/issues/40581251#comment3 Signed-off-by: Matt Harbison <[email protected]>
8bb6fc6
to
5836830
Compare
For context about the memory stats future work, it looks like No idea if the "try to determine if this task has the split libraries mapped in..." extra processing around line 130 is relevant to Go processes. Most blog and SO references I've seen for getting virtual memory counts from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, minor nit.
Co-authored-by: Ben Kochie <[email protected]> Signed-off-by: Matt Harbison <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, will leave the final review up to the maintainers.
d0708ad
to
5d98e83
Compare
I guess I forgot to tag a maintainer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing starting point, thanks! Tiny suggestions, but not blocking, up to you. LGTM
Thanks! |
prometheus#1600) Unfortunately, these values aren't available from getrusage(2), or any other builtin Go API. Using cgo is one alternative. It's possible to conditionalize everything such that cgo can remain disabled on non-Darwin platforms, or even when cross-compiling Darwin executables on a non-Darwin platform (and stub in code that causes the metrics to not be exported). `CGO_ENABLED=1` is set by default on macOS, but unfortunately is off for the non-host architecture, even when gcc supports cross-compiling. (e.g. building with GOARCH=amd on an M2 mac skipped the cgo code.) I think that's too subtle of a distinction to rely on cgo. There's no builtin equivalent of `syscall.NewLazyDLL()` and `.NewProc()` on macOS that Go provides for Windows, so we're stuck with a 3rd party dependency. But it seems stable, maintained, ang getting a fair amount of usage. I'm avoiding their struct deserialization because these native structs are packed differently than the equivalent Go structs, which was causing bad values to be returned. The code is heavy with inline comments, and I tried keeping the type names the same as the C code to make it easier to search for them. I'm not sure that we need to do the `mach_vm_region()` call to adjust the `task_info()` values, because I've never seen that conditional evaluate to True on either amd64, arm64, or when amd64 is run under Rosetta. But this is what `ps(1)` does, and I think it's reasonable to try to match that unless somebody knows it's dead code. Signed-off-by: Matt Harbison <[email protected]>
prometheus#1600) Unfortunately, these values aren't available from getrusage(2), or any other builtin Go API. Go itself doesn't provide a mechanism (like on Windows) to call into system libraries. Using a 3rd party package[1] to dynamically call system libraries was proposed and rejected, to avoid adding to the number of dependencies. That leaves using cgo, which is used here when available. When not available (either because of cross compiling or explicitly disabling it), a stub function is linked instead, and the metrics are not exported. That way, cross compiling of other platforms is unaffected (and can also still be done with Darwin too, but at the cost of not exporting these metrics). Note that building an amd64 image on an arm64 mac or vice-versa is cross compiling, and will use the stub method by default. This can be avoided by setting `CGO_ENABLED=1` in the environment to force the use of cgo for both architectures. I'm unsure of the usefulness of the potential adjustment made to the virtual memory value after calling `mach_vm_region()`. I've not seen that code get run with a native amd64 or arm64 image, or with an amd64 image running under Rosetta. But that's what the `ps(1)` command does, and I think we should report what the system tools do. When I was testing this on a beta of macOS 15 with Go 1.21.13 (the current minimum support for this module), the amd64 image ran fine under Rosetta, but the arm64 image immediately printed a message that it was killed, even prior to the cgo call. This seems to be a recurring issue on macOS[2][3], and passing `-ldflags -s` to `go build` avoided the issue. Go 1.23.1 worked out of the box, without fiddling with linker flags, so I don't think this is an issue- Go 1.21 is simply too old to support macOS 15, but I thought it was worth noting. I supposed we could gate the cgo code with an additional build flag, if anyone is concerned about this. [1] https://github.com/ebitengine/purego [2] golang/go#19841 (comment) [3] golang/go#11887 (comment)
prometheus#1600) Unfortunately, these values aren't available from getrusage(2), or any other builtin Go API. Go itself doesn't provide a mechanism (like on Windows) to call into system libraries. Using a 3rd party package[1] to dynamically call system libraries was proposed and rejected, to avoid adding to the number of dependencies. That leaves using cgo, which is used here when available. When not available (either because of cross compiling or explicitly disabling it), a stub function is linked instead, and the metrics are not exported. That way, cross compiling of other platforms is unaffected (and can also still be done with Darwin too, but at the cost of not exporting these metrics). Note that building an amd64 image on an arm64 mac or vice-versa is cross compiling, and will use the stub method by default. This can be avoided by setting `CGO_ENABLED=1` in the environment to force the use of cgo for both architectures. I'm unsure of the usefulness of the potential adjustment made to the virtual memory value after calling `mach_vm_region()`. I've not seen that code get run with a native amd64 or arm64 image, or with an amd64 image running under Rosetta. But that's what the `ps(1)` command does, and I think we should report what the system tools do. When I was testing this on a beta of macOS 15 with Go 1.21.13 (the current minimum support for this module), the amd64 image ran fine under Rosetta, but the arm64 image immediately printed a message that it was killed, even prior to the cgo call. This seems to be a recurring issue on macOS[2][3], and passing `-ldflags -s` to `go build` avoided the issue. Go 1.23.1 worked out of the box, without fiddling with linker flags, so I don't think this is an issue- Go 1.21 is simply too old to support macOS 15, but I thought it was worth noting. I supposed we could gate the cgo code with an additional build flag, if anyone is concerned about this. [1] https://github.com/ebitengine/purego [2] golang/go#19841 (comment) [3] golang/go#11887 (comment) Signed-off-by: Matt Harbison <[email protected]>
prometheus#1600) Unfortunately, these values aren't available from getrusage(2), or any other builtin Go API. Go itself doesn't provide a mechanism (like on Windows) to call into system libraries. Using a 3rd party package[1] to dynamically call system libraries was proposed and rejected, to avoid adding to the number of dependencies. That leaves using cgo, which is used here when available. When not available (either because of cross compiling or explicitly disabling it), a stub function is linked instead, and the metrics are not exported. That way, cross compiling of other platforms is unaffected (and can also still be done with Darwin too, but at the cost of not exporting these metrics). Note that building an amd64 image on an arm64 mac or vice-versa is cross compiling, and will use the stub method by default. This can be avoided by setting `CGO_ENABLED=1` in the environment to force the use of cgo for both architectures. I'm unsure of the usefulness of the potential adjustment made to the virtual memory value after calling `mach_vm_region()`. I've not seen that code get run with a native amd64 or arm64 image, or with an amd64 image running under Rosetta. But that's what the `ps(1)` command does, and I think we should report what the system tools do. When I was testing this on a beta of macOS 15 with Go 1.21.13 (the current minimum support for this module), the amd64 image ran fine under Rosetta, but the arm64 image immediately printed a message that it was killed, even prior to the cgo call. This seems to be a recurring issue on macOS[2][3], and passing `-ldflags -s` to `go build` avoided the issue. Go 1.23.1 worked out of the box, without fiddling with linker flags, so I don't think this is an issue- Go 1.21 is simply too old to support macOS 15, but I thought it was worth noting. I supposed we could gate the cgo code with an additional build flag, if anyone is concerned about this. [1] https://github.com/ebitengine/purego [2] golang/go#19841 (comment) [3] golang/go#11887 (comment) Signed-off-by: Matt Harbison <[email protected]>
prometheus#1600) Unfortunately, these values aren't available from getrusage(2), or any other builtin Go API. Go itself doesn't provide a mechanism (like on Windows) to call into system libraries. Using a 3rd party package[1] to dynamically call system libraries was proposed and rejected, to avoid adding to the number of dependencies. That leaves using cgo, which is used here when available. When not available (either because of cross compiling or explicitly disabling it), a stub function is linked instead, and the metrics are not exported. That way, cross compiling of other platforms is unaffected (and can also still be done with Darwin too, but at the cost of not exporting these metrics). Note that building an amd64 image on an arm64 mac or vice-versa is cross compiling, and will use the stub method by default. This can be avoided by setting `CGO_ENABLED=1` in the environment to force the use of cgo for both architectures. I'm unsure of the usefulness of the potential adjustment made to the virtual memory value after calling `mach_vm_region()`. I've not seen that code get run with a native amd64 or arm64 image, or with an amd64 image running under Rosetta. But that's what the `ps(1)` command does, and I think we should report what the system tools do. When I was testing this on a beta of macOS 15 with Go 1.21.13 (the current minimum support for this module), the amd64 image ran fine under Rosetta, but the arm64 image immediately printed a message that it was killed, even prior to the cgo call. This seems to be a recurring issue on macOS[2][3], and passing `-ldflags -s` to `go build` avoided the issue. Go 1.23.1 worked out of the box, without fiddling with linker flags, so I don't think this is an issue- Go 1.21 is simply too old to support macOS 15, but I thought it was worth noting. I supposed we could gate the cgo code with an additional build flag, if anyone is concerned about this. [1] https://github.com/ebitengine/purego [2] golang/go#19841 (comment) [3] golang/go#11887 (comment) Signed-off-by: Matt Harbison <[email protected]>
* process_collector: fill in most statistics on macOS Unfortunately, the virtual memory, resident memory, and network stats will require access to undocumented C functions. I was warned off of cgo in IRC because it would then have to be enabled in a bunch of different projects that use this module, but I already was against it because that would break the ability to cross-compile. There is no interface to `dlopen` built into golang. The `github.com/ebitengine/purego` module looks promising (I can cross-compile and call these methods), but I'm currently getting unexpected results. I'll follow up with that separately if I can get it working, but hopefully this stuff is pretty uncontroversial. Tested on macOS 10.14.6 (amd64), macOS 14.6.1 (amd64), and macOS 15.0 (arm64) by spawning `/usr/bin/ulimit -a -S` and `/usr/sbin/lsof -c $my_process` from the test exporter process, and `ps -o lstart,vsize,rss,utime,stime,command` from the shell, and comparing results with the exported metrics. I can't find documentation for `RLIMIT_AS` on macOS (specifically if it's in bytes or pages). It's currently being reported back as `RLIM_INFINITY`, which seems reasonable, because I've come across reports that the value is ignored anyway[1]. The bash 3.2 code for the built-in `ulimit` divides the value reported by `getrusage(2)` by 1024 when printing, as it does for `RLIMIT_DATA`, which is documented as being bytes in `getrusage(2)`. The help for `ulimit` indicates it prints both in kbytes, so it's reasonable to assume this is already in bytes. [1] https://issues.chromium.org/issues/40581251#comment3 Signed-off-by: Matt Harbison <[email protected]> * Update prometheus/process_collector_darwin.go Co-authored-by: Ben Kochie <[email protected]> Signed-off-by: Matt Harbison <[email protected]> --------- Signed-off-by: Matt Harbison <[email protected]> Signed-off-by: Matt Harbison <[email protected]> Co-authored-by: Ben Kochie <[email protected]> Co-authored-by: Bartlomiej Plotka <[email protected]> Signed-off-by: Eugene <[email protected]>
Unfortunately, the virtual memory, resident memory, and network stats will require access to undocumented C functions. I was warned off of cgo in IRC because it would then have to be enabled in a bunch of different projects that use this module, but I already was against it because that would break the ability to cross-compile. There is no interface to
dlopen
built into golang. Thegithub.com/ebitengine/purego
module looks promising (I can cross-compile and call these methods), but I'm currently getting unexpected results. I'll follow up with that separately if I can get it working, but hopefully this stuff is pretty uncontroversial.Tested on macOS 10.14.6 (amd64), macOS 14.6.1 (amd64), and macOS 15.0 (arm64) by spawning
/usr/bin/ulimit -a -S
and/usr/sbin/lsof -c $my_process
from the test exporter process, andps -o lstart,vsize,rss,utime,stime,command
from the shell, and comparing results with the exported metrics.I can't find documentation for
RLIMIT_AS
on macOS (specifically if it's in bytes or pages). It's currently being reported back asRLIM_INFINITY
, which seems reasonable, because I've come across reports that the value is ignored anyway[1]. The bash 3.2 code for the built-inulimit
divides the value reported bygetrusage(2)
by 1024 when printing, as it does forRLIMIT_DATA
, which is documented as being bytes ingetrusage(2)
. The help forulimit
indicates it prints both in kbytes, so it's reasonable to assume this is already in bytes.[1] https://issues.chromium.org/issues/40581251#comment3