Create resource monitoring utility, upload a log of resource usage over the course of a CI run #265

Open
staticfloat opened this issue Dec 6, 2022 · 1 comment

@staticfloat
Member

We should create a simple utility that tracks resource usage (CPU, memory, disk, etc.) and uploads a graph of that usage over the course of the CI run. Even cooler would be if we could distill it down to a self-contained HTML page with interactive plots à la Plotly.js or similar, but I'll take static PNGs at first if that's easiest.

The rationale is that we should have definitive proof of our resource usage over time, making it trivially easy for any contributor to check for themselves whether a PR failed due to memory exhaustion, disk space exhaustion, etc.

This will be most useful for dedicated VMs (such as the Windows KVM runners), but it would also be useful (albeit perhaps less clear) for shared environments. If this plays well with cgroups, we can at least get semi-clear CPU usage charts, if not memory usage charts (as we don't segregate memory usage... yet).
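In the cgroup case, one way to get a per-job CPU figure is to sample the cgroup's CPU accounting twice. You then divide the elapsed CPU time by the elapsed wall time.

The sketch below is not taken from rabban. It makes two assumptions:

- It assumes cgroup v2, where each cgroup's `cpu.stat` file (e.g. `/sys/fs/cgroup/<group>/cpu.stat`) contains a `usage_usec` line.
- `parse_cpu_stat` and `cores_used` are hypothetical helper names.

So that it runs anywhere, `main` parses hard-coded sample text instead of reading the real file.

```rust
use std::collections::HashMap;

/// Parse the text of a cgroup v2 `cpu.stat` file ("key value" per line).
/// Assumption: cgroup v2 layout; lines that do not parse are skipped.
fn parse_cpu_stat(contents: &str) -> HashMap<String, u64> {
    contents
        .lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            let key = parts.next()?;
            let value = parts.next()?.parse::<u64>().ok()?;
            Some((key.to_string(), value))
        })
        .collect()
}

/// Average number of CPU cores the cgroup kept busy between two samples
/// taken `wall_usec` microseconds apart.
fn cores_used(
    before: &HashMap<String, u64>,
    after: &HashMap<String, u64>,
    wall_usec: u64,
) -> Option<f64> {
    let start = *before.get("usage_usec")?;
    let end = *after.get("usage_usec")?;
    if wall_usec == 0 || end < start {
        return None; // no elapsed time, or the counter went backwards
    }
    Some((end - start) as f64 / wall_usec as f64)
}

fn main() {
    // Hard-coded sample contents; a real monitor would read the cgroup's
    // cpu.stat file twice, one sampling interval apart.
    let t0 = parse_cpu_stat("usage_usec 1000000\nuser_usec 800000\nsystem_usec 200000\n");
    let t1 = parse_cpu_stat("usage_usec 3000000\nuser_usec 2500000\nsystem_usec 500000\n");
    // 2 s of CPU time over 1 s of wall time: 2 cores busy on average.
    match cores_used(&t0, &t1, 1_000_000) {
        Some(cores) => println!("average cores in use: {cores:.2}"),
        None => println!("could not compute CPU usage"),
    }
}
```

Because `usage_usec` counts CPU time for the whole cgroup, this measures the job's CPU use rather than the host's. That is what makes the CPU chart meaningful on shared machines.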

@staticfloat
Member Author

staticfloat commented Dec 13, 2022

Update: I have the start of a package at https://github.com/staticfloat/rabban

It's a self-contained Rust executable that reads system statistics (CPU usage, memory usage, disk usage, etc.) and writes a .csv file of those values over time until the process is killed. It's not in a fully usable state yet (there are still some bugs), but it's getting there. The TODO list from here on out is:
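The "sample, append a CSV row, repeat until killed" loop described above can be sketched as follows. This is not rabban's code, and it makes several simplifying assumptions:

- It assumes Linux.
- It reads only memory figures from `/proc/meminfo`, treating `MemTotal - MemAvailable` as "used".
- It writes a CSV with hypothetical column names to stdout.

A real monitor would loop until killed. The demo takes three samples so that it terminates.

```rust
use std::fs;
use std::io::{self, Write};
use std::thread;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Extract (MemTotal, MemAvailable), both in KiB, from the text of /proc/meminfo.
fn parse_meminfo(contents: &str) -> Option<(u64, u64)> {
    let mut total: Option<u64> = None;
    let mut available: Option<u64> = None;
    for line in contents.lines() {
        let mut parts = line.split_whitespace();
        match (parts.next(), parts.next()) {
            (Some("MemTotal:"), Some(v)) => total = v.parse().ok(),
            (Some("MemAvailable:"), Some(v)) => available = v.parse().ok(),
            _ => {}
        }
    }
    Some((total?, available?))
}

/// One CSV row: Unix timestamp (s), used memory (KiB), total memory (KiB).
fn csv_row(timestamp: u64, total_kib: u64, available_kib: u64) -> String {
    format!("{},{},{}", timestamp, total_kib.saturating_sub(available_kib), total_kib)
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = stdout.lock();
    // Hypothetical column names, not rabban's actual CSV schema.
    writeln!(out, "timestamp,mem_used_kib,mem_total_kib")?;
    // A real monitor would loop until it is killed; three samples keep the demo finite.
    for _ in 0..3 {
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs())
            .unwrap_or(0);
        // On non-Linux systems /proc/meminfo is absent and no row is written.
        if let Ok(text) = fs::read_to_string("/proc/meminfo") {
            if let Some((total, available)) = parse_meminfo(&text) {
                writeln!(out, "{}", csv_row(now, total, available))?;
            }
        }
        // Flush each row so nothing sits in a buffer when the process is killed.
        out.flush()?;
        thread::sleep(Duration::from_millis(500));
    }
    Ok(())
}
```

Flushing after every row matters for a tool whose normal way of exiting is being killed. Every completed row has already left the process, so the uploaded .csv stays parseable.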

  • Fix the Windows disk path resolution bug (paths need to be properly resolved according to the \\?\C:\... scheme)
  • Maybe add some simple self-tests?
    • Spin up a Julia process, consume some resources and verify that they show up properly?
  • Verify memory usage information is correct
  • Build a Buildkite plugin that downloads and runs rabban in the pre-command hook, then kills it and uploads the .csv in the post-command hook.
  • Build a Julia script that uses Plotly to generate a self-contained, interactive webpage with the resource usage beautifully graphed.
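For the Windows path item, here is a minimal sketch of converting an absolute path into the extended-length `\\?\` form. The function name `to_verbatim` is hypothetical, and the rules cover only the common cases:

- Drive-absolute paths get a `\\?\` prefix.
- UNC paths become `\\?\UNC\server\share\...`.

Windows does not normalize `\\?\` paths, so the sketch converts forward slashes to backslashes. It assumes `.` and `..` components have already been resolved. It works on plain strings so it runs on any OS. On Windows itself, `std::fs::canonicalize` already returns paths in this form, for paths that exist.

```rust
/// Convert an absolute Windows path string to the verbatim `\\?\` form.
/// Assumption: `.` and `..` components were already resolved by the caller.
fn to_verbatim(path: &str) -> Option<String> {
    // Already verbatim: return unchanged (no slash rewriting allowed here).
    if path.starts_with(r"\\?\") {
        return Some(path.to_string());
    }
    // Verbatim paths are not normalized by Windows, so use backslashes only.
    let p = path.replace('/', "\\");
    // Device paths such as \\.\pipe\... are out of scope for this sketch.
    if p.starts_with(r"\\.\") {
        return None;
    }
    if let Some(rest) = p.strip_prefix(r"\\") {
        // UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return Some(format!(r"\\?\UNC\{rest}"));
    }
    let b = p.as_bytes();
    if b.len() >= 3 && b[0].is_ascii_alphabetic() && b[1] == b':' && b[2] == b'\\' {
        // Drive-absolute path: C:\... becomes \\?\C:\...
        return Some(format!(r"\\?\{p}"));
    }
    // Relative ("build") and drive-relative ("C:foo") paths must be made absolute first.
    None
}

fn main() {
    let examples = [r"C:\Users\ci\build", "C:/Users/ci", r"\\server\share\dir", "build"];
    for p in examples {
        match to_verbatim(p) {
            Some(v) => println!("{p} -> {v}"),
            None => println!("{p} -> (needs to be made absolute first)"),
        }
    }
}
```

Returning `None` for relative paths, instead of guessing a base directory, keeps the function pure. The caller decides how to make a path absolute, for example from the CI job's working directory.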
