Jenkinsfile redesign
Right now, benchmark CI for speed.yjit.org has a complex structure that isn't serving it well. Jenkinsfiles are the top-level CI jobs. They SSH to the workers (necessary for baremetal benchmarking) and run bash scripts to set up things like chruby, and those bash scripts in turn run the Ruby scripts that do the real work (e.g. benchmarking, commit-and-upload and reporting).
Assuming we keep using Jenkinsfiles...
- Top level has to be Jenkinsfiles, which are a Groovy-based DSL
- Next level has to be bash, either in separate scripts or inlined into the Jenkinsfile (separate is better) - bash is needed so we can run chruby and generally configure the Ruby env
- The following level can be Ruby (or whatever we want, but we want Ruby)
Date of design work: 16th Feb, 2023
At the moment it's all very hard to test because, for instance, even a trivial change to several Jenkinsfiles means running 3-6 hours of benchmarks and often reinstalling all Rubies.
The Jenkinsfiles also have a lot of cut-and-paste, so that the reporting steps can happen from the "full daily everything" script, but also from a "just do the reporting" script. That's inefficient and causes a lot of needless errors.
Right now the Jenkinsfiles are structured in a series of stages, where each stage SSHes into multiple workers and runs steps in parallel. That's not terrible, but we could do better. For instance, the ARM worker runs for far longer than the x86 worker, and right now Jenkins "locks" both of them for the full duration. Also, having a "commit the data" stage trips over its own feet a lot because it's trying to git pull, git commit and git push on both workers at the same time. That causes a lot of screwups where they both make a commit against the current head, and then one of them can't push its commit.
It's possible to "double down" on Jenkins and solve this with more Jenkinsfile standard library usage. It would be possible to autogenerate more of the SSH blocks for instance, and the stages, and to put together the larger Jenkinsfiles from the same Jenkinsfile shared-lib calls as the smaller-job Jenkinsfiles. That would definitely be an improvement over the current situation.
I'm worried about the testability of Jenkinsfiles, though. They're hard to run outside of Jenkins, and hard to autogenerate on-demand inside Jenkins. It might make more sense to treat the bash scripts as a generated artefact and have Jenkins create them -- e.g. SSH into the worker with a list of stages to run and some settings, write out the bash script locally, and then run that bash script. Then testing becomes "write the bash script and do assertions on it, but don't run it", which can save a lot of benchmarking time. It would also be easier to structure the Jenkinsfiles as two independent parallel tasks, one on each worker, so that the x86 lock isn't held nearly as long.
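To make the "generate the bash script, then assert on it" idea concrete, here's a minimal Ruby sketch. Everything named in it is an assumption for illustration: the stage names, the `ci/*.rb` script paths, the settings keys and the Ruby version are all placeholders, and the chruby path is just chruby's usual default install location. The generator only builds a string; nothing gets run.

```ruby
require "shellwords"

# Hypothetical mapping from stage name to the argv that stage would run.
# The script paths are placeholders, not the real repository layout.
STAGE_COMMANDS = {
  "benchmark" => ->(s) { ["ruby", "ci/benchmark.rb", "--iterations", s.fetch(:iterations).to_s] },
  "commit"    => ->(_s) { ["ruby", "ci/commit_and_push.rb"] },
  "report"    => ->(s) { ["ruby", "ci/report.rb", "--kind", s.fetch(:report_kind)] },
}.freeze

# Build the text of a bash script for the requested stages.
# Nothing is executed here; the caller decides whether to run the result.
def generate_script(stages, settings)
  unknown = stages - STAGE_COMMANDS.keys
  raise ArgumentError, "unknown stages: #{unknown.join(', ')}" unless unknown.empty?

  lines = ["#!/bin/bash"]
  # Set up the Ruby environment first (assumed chruby location and version name).
  lines << "source #{Shellwords.escape(settings.fetch(:chruby_path))}"
  lines << "chruby #{Shellwords.escape(settings.fetch(:ruby))}"
  stages.each do |stage|
    lines << "# stage: #{stage}"
    lines << Shellwords.join(STAGE_COMMANDS.fetch(stage).call(settings))
  end
  lines.join("\n") + "\n"
end

settings = {
  chruby_path: "/usr/local/share/chruby/chruby.sh", # assumed default location
  ruby: "ruby-3.2.0",                               # placeholder version
  iterations: 1,
  report_kind: "daily",
}

# A "just do the reporting" job is the same generator with fewer stages.
puts generate_script(%w[commit report], settings)
```

A test can then check that a reporting-only script never mentions the benchmark step, without any worker or benchmark run involved.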
That will take most of the logic out of the Jenkinsfiles, which is good if they aren't easy to test (and so far they aren't.) It will also avoid getting deeper into Jenkinsfile shared libraries, which looks fiddly, and replace it with more bash and Ruby.
We really want to run through the logic without, say, doing an actual rebuild of Ruby or a 3-hour benchmarking run. We'd also like some additional flexibility about exactly what gets run. What about non-full benchmarking runs, such as single-iter? What about full vs daily reporting?
So: we'd like to double-check the logic while avoiding the really slow parts.
(Is generating bash necessary? Is part of the answer just more bash-lint on the bash files I have?)
If we generate bash, we can easily add 'dummy' steps -- e.g. don't benchmark, but print out the command line for what we would run. Though generating bash isn't the only way to do that. It's hard to do dry-run stuff effectively since of course dry-run benchmark runs probably won't create benchmark files, and you certainly don't want dummy benchmark files getting committed.
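As a sketch of the non-bash route to 'dummy' steps, a small Ruby wrapper can print the command line instead of running it. `StepRunner` and the `ci/benchmark.rb` path are hypothetical names made up for this example. It doesn't solve the missing-benchmark-files problem, it only shows the print-instead-of-run part.

```ruby
require "shellwords"

# Hypothetical command runner. In dry-run mode it prints the command line
# it would have run; otherwise it really executes the command.
class StepRunner
  attr_reader :log

  def initialize(dry_run:, out: $stdout)
    @dry_run = dry_run
    @out = out
    @log = [] # every command line seen, for later assertions
  end

  def run(*argv)
    line = Shellwords.join(argv)
    @log << line
    if @dry_run
      @out.puts "[dry-run] #{line}"
      true
    else
      # exception: true makes a failing command raise instead of returning false.
      system(*argv, exception: true)
    end
  end
end

runner = StepRunner.new(dry_run: true)
runner.run("ruby", "ci/benchmark.rb", "--iterations", "1") # placeholder script path
```

Because the runner keeps a log, a test can assert on exactly which commands a job would have issued.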
Without testability, we make some of our existing problems worse. For instance: the full benchmark run would be much better as a series of steps, going through benchmarking, then commit-and-push, then reporting, to be done on both workers. Great! Except that now there's no easy way to test the smaller or later steps without a giant benchmarking run.
We probably need some kind of 'placebo' version of 'rebuild all Ruby' and 'run benchmarks' to do the testing. That would reduce the runtime to something workable and allow merging a lot of steps nicely.
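Here's one rough way the 'placebo' idea could look in Ruby. All the step names and the file name are hypothetical, and the "real" steps are stand-ins that just raise, to show the placebo run never reaches them. The placebo benchmark writes a tiny fake result file into a temp directory, on the assumption that keeping placebo output away from the real data directory is enough to stop it getting committed.

```ruby
require "json"
require "tmpdir"

# Stand-ins for the slow steps: they raise so it's obvious if one is reached.
def real_rebuild(_ctx)
  raise "would reinstall all Rubies"
end

def real_benchmark(_ctx)
  raise "would run hours of benchmarks"
end

# Placebo versions: record that they ran, and leave behind just enough
# fake data for the later steps to have something to read.
def placebo_rebuild(ctx)
  ctx[:log] << "placebo rebuild"
end

def placebo_benchmark(ctx)
  path = File.join(ctx[:data_dir], "placebo_results.json")
  File.write(path, JSON.generate("placebo" => true, "times" => [1.0]))
  ctx[:log] << "placebo benchmark"
end

# The step we actually want to exercise quickly.
def report(ctx)
  files = Dir.glob(File.join(ctx[:data_dir], "*.json"))
  ctx[:log] << "report on #{files.size} file(s)"
end

# Run the named steps in order, swapping in placebos for the slow ones.
def run_pipeline(step_names, ctx, placebo:)
  steps = {
    rebuild: placebo ? method(:placebo_rebuild) : method(:real_rebuild),
    benchmark: placebo ? method(:placebo_benchmark) : method(:real_benchmark),
    report: method(:report),
  }
  step_names.each { |name| steps.fetch(name).call(ctx) }
  ctx[:log]
end

# The temp dir is removed when the block exits, so nothing placebo survives.
Dir.mktmpdir do |dir|
  ctx = { data_dir: dir, log: [] }
  puts run_pipeline(%i[rebuild benchmark report], ctx, placebo: true)
end
```

The same `run_pipeline` call with only `report` in the list would be the "just do the reporting" job, so the merged steps and the small jobs share one code path.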