-
Hi Simon, to achieve build reproducibility, you must have the same versions of all build dependencies: in particular the compiler, linker, and compressors, python if you write .pyc files, but also the various helper programs that process documentation files, etc. So generally the only feasible approach is to record the full package set installed in the buildroot, and then use the exact same versions of those packages when rebuilding. You need the same build macros too, but those are generally set through packages that drop macro files. So for most macros the issue is solved by fixing the package set, which you already need to do anyway. Other macros may be set in a different way. For example, in Fedora we can now set macros in a side tag. Those need to be recorded somehow, and the mechanism that records the list of packages should generally store this information too.
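A minimal sketch of the "record the package set, then compare" idea, in Python. It assumes a manifest is simply a text file with one package NEVRA per line, for example saved from `rpm -qa` inside the buildroot at build time. The manifest format and the function names are made up for illustration; this is not an rpm, mock or koji feature.

```python
from __future__ import annotations


def parse_manifest(text: str) -> set[str]:
    """Return the set of non-empty, stripped lines of a package manifest."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def diff_manifests(original: str, rebuild: str) -> dict[str, list[str]]:
    """Report packages present in only one of the two buildroots."""
    a, b = parse_manifest(original), parse_manifest(rebuild)
    return {
        "only_in_original": sorted(a - b),
        "only_in_rebuild": sorted(b - a),
    }


if __name__ == "__main__":
    # Hypothetical package versions, for illustration only.
    original = "gcc-8.5.0-18.el8.x86_64\ncmake-3.20.2-4.el8.x86_64\n"
    rebuild = "gcc-8.5.0-20.el8.x86_64\ncmake-3.20.2-4.el8.x86_64\n"
    print(diff_manifests(original, rebuild))
```

An empty result on both sides would mean the two buildroots contain the same package versions. Anything listed is a candidate explanation for a differing build.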
-
I think you misunderstand. If you build for multiple OS versions, e.g. RHEL 7..9, then the dependencies are different, so as rpm works at the moment you can't explicitly configure all the dependencies at the same time. This is done by adding additional macro processing to either be told the OS to build for, e.g. Oracle community MySQL rpms tend to use
Yet to be reproducible you need to know which macros were provided by the user, or at least which are "not part of the base rpm/build setup" and therefore "user configurable". Without that you may not be able to determine exactly the same "input parameters" to rebuild a package in the same way as the original packager. Similarly, if I want to patch this package with different or extra functionality, I cannot be sure that my build will represent a consistent change against the original sources, and thus the whole premise of repeatable builds falls apart. My question therefore was whether there are any plans to ensure that a built src rpm could include any rpmbuild-time command line macros as metadata, so that I can in theory use that to reproduce the build process faithfully.
The reason this came up: I'm having problems reproducing the builds because it looks like the build environment used by the original packagers may not be "pristine". To fix one specific build issue I had to add some symlinks, e.g. https://github.com/sjmudd/mysql-rpm-builder/blob/main/config/prepare__centos.8__8.0.33.sh#L44-L50, to set up the OS in a way in which the build process would complete without errors. That's clearly a very obscure example, but it's real. So I'm looking at ways to make it easier to configure the rpm packaging so that issues such as this can be avoided and, given a "suitable spec file", I really can build repeatably from scratch.
-
Maybe I misunderstood what you mean by "reproducible". I meant exact reproducibility in the sense of https://reproducible-builds.org/. If you just want to "roughly recreate" a build, then what you say is applicable. I would suggest retitling the issue to something like "Record macros defined during creation in srpm" (or something that fits better what you need) in order to reduce confusion. |
-
In the end, as a first step I'd like to be able to reproduce the builds as closely as possible to what the original builder did. So I'd like that build to be reproducible in the way referenced in the URL.
"rpm builds" are a bit of a pain as often we may use multiple repos. The build environment does not explicitly say where the
I have changed the issue title to "make rpm builds more reproducible" as I think that's what I want, and I think that solving the 2 points above would help a lot. I'm also aware that if you run
I think this would pretty much help solve my problem.
My second desire, which is NOT relevant to this issue, is to then be able to modify the build config or sources, or add patches, so that the final packages provide extra features compared to the original packaging, yet, if done correctly, remain compatible with the original packages. You could think of this along the lines of building extra kernel modules in a separate rpm, provided as part of the build process, which can be installed and used on the upstream running base kernel. It's much easier to do this if you can reproduce the upstream build process first. This is what I'm doing with additional plugins for the MySQL server.
-
Also related to your comments about how rpm works with dependencies: I'm aware of rpm dependencies. I've been building rpm packages since RedHat 3.0.3 (that's in 1996). Also, I'm not the owner of the rpms I want to rebuild, so it's not just a matter of rebuilding my own rpms; it's also a matter of figuring out how to rebuild others' rpms. I could certainly suggest that the upstream packagers make changes to their packages, but that's a longer-term discussion and it requires them to do this explicitly. The point of the issue I have created is to have rpm do this for us automatically, so I don't have to figure out the details myself.
-
"correct thing to do" is a matter of opinion. An easier question is whether this is what always happens, and it's easy to answer: no, in some workflows people receive an srpm from somewhere and build that. See for example the (now obsolete) workflow for CentOS: RH would publish rebranded srpm and various projects would rebuild them, treating the srpm as the initial input. In general, the macros that are defined during srpm build and the macros that are defined during binary build can be completely different. See https://pagure.io/koji/issue/3878 for a slightly different approach to this problem. |
-
ok, so it depends on your point of view, that's clear. For downstream "repackagers" there's clearly internal macro "mangling" and perhaps build environment differences specifically due to that, and while that's clearly a very important use case it's not the same as mine. It may be that some of this behaviour needs to be optional, but again if the current rpm build process does not allow you to complete a rebuild correctly or it requires a large amount of investigation to work out how to achieve the end goal of reproducibly building packages then to some extent I think it's fragile. |
-
To your comment:
my answer is precisely that. When I try to rebuild the single package the process fails at the end of a 3-hour build run with an extremely obscure error message. Yet somehow it seems to work with the upstream packager as the rpms are built and shared publicly. The upstream packager's build environment is not public and I've seen that to make the build work in some cases some "munging" of the build environment is needed. That's very messy. It does not resolve the build for the 3 OS combinations I am trying to build the package on: CentOS 7..9 and I'm still trying to resolve that. However, I think you understand my point of view. The intent of me creating this issue was to bring it up. It seems you're aware of the problem space and have shared that others also experience similar issues. Is there anything that can be done now? What should happen next? I do not think I can do anything right now and clearly any changes would be a long term effort. Can any further progress be made? |
-
Transferred to discussions as this isn't a specific item to implement/fix, but rather open-ended, well, discussion. Nothing wrong with that, but tickets need to be actionable. As in, such a discussion could lead to creation of a ticket or a few. |
-
Just a quick note here: as @keszybz already noted, there's reproducibility and there's reproducibility, and amusingly enough the two are often in direct conflict. To make it possible to talk about these, let's call this variant traceability instead. One recent enhancement (in rpm >= 4.18) is that we now store the parsed spec in the src.rpm, so you can actually see how it was built. Specs increasingly consist of complex higher-level macros that expand to who knows what, and without the end result to compare with, you won't have the slightest clue whether your build will result in anything remotely resembling the "original". While that isn't exactly the same as your request of recording macro values at build time, it kinda achieves that in a different manner. As for recording the packages in the build environment, that has similar problems but worse: we have no idea which of the installed packages are related to the build at all. Recording the installed set can be extremely useful info when troubleshooting etc, but it also inevitably records a whole lot of irrelevant "static", like the kernel version of the day, which is not supposed to be relevant for the average package in any way. And the package NEVRs would shatter reproducibility even if the actual product was bit-for-bit identical.
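As a rough illustration of how the stored parsed spec could be used for comparison, here is a small Python sketch that diffs two expanded spec texts. How the text is obtained is an assumption: the sketch supposes the parsed spec is exposed as a header tag named `Spec` that can be read with a query format, which should be checked against `rpm --querytags` on the rpm version in use. The function names and the file name in the example are hypothetical.

```python
from __future__ import annotations

import difflib


def spec_query_argv(srpm_path: str) -> list[str]:
    """Command line that is assumed to print the parsed spec of a src.rpm.

    Assumption: the tag is called SPEC; verify with `rpm --querytags`.
    """
    return ["rpm", "-qp", "--qf", "%{SPEC}", srpm_path]


def diff_expanded_specs(original: str, rebuilt: str) -> list[str]:
    """Return a unified diff between two expanded spec texts."""
    return list(
        difflib.unified_diff(
            original.splitlines(),
            rebuilt.splitlines(),
            fromfile="original",
            tofile="rebuilt",
            lineterm="",
        )
    )


if __name__ == "__main__":
    print(spec_query_argv("example.src.rpm"))
    # Made-up spec fragments, for illustration only.
    a = "Name: mysql\nBuildRequires: cmake\n"
    b = "Name: mysql\nBuildRequires: cmake3\n"
    print("\n".join(diff_expanded_specs(a, b)))
```

An empty diff would suggest that both builds saw the same macro expansions, whatever defines were passed on the command line.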
-
You should try to find out what tool is used to build the build root for these packages and use that if you can. In both openSUSE and Debian a buildinfo file (though with different syntax) describing the environment of the build is produced. This specifies environment variables, packages installed into the build root, etc. That would also be a good place for the repos and command line arguments, though Fedora, openSUSE and Debian do not use any package-specific arguments and the repos are implicit. I think openSUSE lists the repo of origin for each package installed into the build root. I think the buildinfo of openSUSE has hashes, but Debian's does not. openSUSE's OBS can also produce a few other SBOM formats. The buildinfo is intentionally not part of the package binary, so that the buildinfo can differ while still producing the same bit-by-bit binary. This needs to be done at the level where the build root is created: mock or koji for Fedora, obs-build or OBS for openSUSE. Note that building a build root may also involve steps that are not part of a package. The buildinfo/SBOM is then used for a cryptographic signature asserting that the build was reproducible with that environment. Examples: |
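To make the idea concrete, here is a minimal Python sketch of a buildinfo-like record kept next to, not inside, the built packages. It is not the actual openSUSE or Debian buildinfo format: the JSON layout, field names and function names are all invented for illustration.

```python
from __future__ import annotations

import hashlib
import json


def make_buildinfo(
    packages: list[str],
    environment: dict[str, str],
    macros: dict[str, str],
    artifacts: dict[str, bytes],
) -> dict:
    """Collect the build inputs and a SHA-256 digest of each build output."""
    return {
        "packages": sorted(packages),      # NEVRAs installed in the build root
        "environment": dict(environment),  # environment variables of the build
        "macros": dict(macros),            # user-supplied rpm macros
        "artifacts": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in artifacts.items()
        },
    }


def render_buildinfo(info: dict) -> str:
    """Serialize with sorted keys so equal inputs give identical text."""
    return json.dumps(info, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    info = make_buildinfo(
        ["gcc-8.5.0-18.el8.x86_64"],
        {"LANG": "C"},
        {"something": "1"},
        {"example.rpm": b"placeholder bytes"},
    )
    print(render_buildinfo(info))
```

Because the record lives outside the package, two builds with different package sets can still be compared by their artifact digests alone.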
-
I was looking at rebuilding some MySQL community rpms which are normally built by Oracle, but doing this turns out to be surprisingly hard. The spec file, https://github.com/mysql/mysql-server/blob/trunk/packaging/rpm-oel/mysql.spec.in, uses a number of macros to define various parts of the build process, BuildRequires: entries and so on, depending on the OS being used. This works for RHEL 7..9 but of course should also work for CentOS 7..9 and other similar distros.
However, to reproduce a build made by someone else you need to know the exact macro definitions in effect when the rpms were built. Unless I'm mistaken, if you build a package with
rpmbuild --define 'something 1' --define 'something_else 1' name.spec
then the actual command line arguments used to build the package are not explicitly recorded in the binary rpms nor, perhaps more importantly, in the src.rpm, which I think only contains the sources and the spec file used.
If that's the case, the lack of this information means that from a src rpm I may be unable to rebuild the binary rpms in the same way as the original packager. Is this assumption correct? If so, would it make sense for the .src.rpm to also include the command line defines (and anything else that might make sense) to simplify this task?
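Until something like this exists in rpm itself, one workaround is to record the defines in a sidecar file and replay them. The Python sketch below assumes exactly that: the JSON sidecar format and the function names are hypothetical, and only the `rpmbuild -ba` and `--define 'name value'` command line forms come from rpmbuild.

```python
from __future__ import annotations

import json
import shlex


def dump_macros(macros: dict[str, str]) -> str:
    """Serialize the user-supplied macros, keeping their original order."""
    return json.dumps(macros, indent=2)


def load_macros(text: str) -> dict[str, str]:
    """Read back macros written by dump_macros()."""
    return dict(json.loads(text))


def rpmbuild_argv(spec: str, macros: dict[str, str]) -> list[str]:
    """Build an rpmbuild command line with one --define per recorded macro."""
    argv = ["rpmbuild", "-ba"]
    for name, value in macros.items():
        argv += ["--define", f"{name} {value}"]
    argv.append(spec)
    return argv


if __name__ == "__main__":
    recorded = dump_macros({"something": "1", "something_else": "1"})
    argv = rpmbuild_argv("name.spec", load_macros(recorded))
    # Print a copy-pasteable shell command instead of running it.
    print(shlex.join(argv))
```

The sidecar would have to be published alongside the src.rpm by whoever did the original build, which is the part that cannot be enforced from the rebuilder's side.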
I also notice that building any software depends on the installed software on the host/container where the build process runs, yet this is also not "registered". For rpm systems it might be convenient to also record the installed rpm package list as that would also be useful for reproducing the build environment appropriately.
Outside of rpm itself is repo configuration which is OS dependent and it seems that RHEL/CentOS/OEL and the other RH-clones all do things slightly differently which makes rebuilds more complex. I guess that's outside of the scope of this issue.
Why would improving this be useful if the source is provided? Simply because I may want to patch the originally built rpms in a specific way yet be sure that the rest of the build and packaging process is as close to the original packaging as before.
Alternatively I may want to build a sub-module of the upstream packages which is compatible with the originally built packages and can be used without having to rebuild the whole upstream code again.
So far I've not seen a way to make this process simpler and think the suggestions above, to include more information on the build command line arguments (and maybe macro values) and the installed package list, would help the rebuild process.
Can something be done in this direction?
For context I created: https://github.com/sjmudd/mysql-rpm-builder/ which was an attempt to simplify / document the reproducible rebuild process and it has turned out to be harder than originally anticipated. It is still work in progress but maybe gives some context to where the question comes from.