-
We have
(but, I bet I'm missing the finer nuances of the reproducable builds here again)
-
In other words, this all would make a whole lot more sense to me if there was a switch that decides where the buildtime gets set from (clock/macro/source_date_epoch), and then the clamping options relate to buildtime and not source_date_epoch. Also @ffesti mentioned elsewhere that maybe, instead of having all these crazy switches, we should just default to clamping the mtimes for example.
-
Forward-looking defaults aside... Do you agree with the idea that there should be a single macro to set the buildtime source (clock/macro/source_date_epoch), and then have a separate flag for clamping mtimes to buildtime, or am I again missing some finer detail here? I've nothing against reproducible builds, it's this tangle of haphazard switches that annoys me.
Tagging in folks who I remember showing interest in reproducible builds in the past, feel free to include more if/when I missed someone: @JanZerebecki @Conan-Kudo @bmwiedemann @boklm
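To make the idea concrete, here is a rough sketch of how the two knobs could be kept apart: one setting that picks the buildtime source, and an independent flag for clamping. This is only an illustration in Python, not rpm code; the function names, the `_buildtime` key and the `source` values are assumptions taken from the wording in this thread.

```python
import os
import time


def resolve_buildtime(source, macros=None, environ=None, now=None):
    """Pick the build time from exactly one configured source (hypothetical helper)."""
    macros = macros or {}
    environ = os.environ if environ is None else environ
    if source == "clock":
        # Current wall-clock time; `now` only exists to make the sketch testable.
        return int(time.time() if now is None else now)
    if source == "macro":
        # Assumed %_buildtime macro, e.g. set via --define or ~/.rpmmacros.
        return int(macros["_buildtime"])
    if source == "source_date_epoch":
        return int(environ["SOURCE_DATE_EPOCH"])
    raise ValueError(f"unknown buildtime source: {source!r}")


def file_mtime(mtime, buildtime, clamp):
    """Separate flag: when clamping is on, no file mtime may be newer than buildtime."""
    return min(mtime, buildtime) if clamp else mtime
```

With this split, the clamping flag never needs to know where the buildtime came from.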
-
With our build system, passing constants into a build is much easier than passing in a variable
-
Sorry, but I have no idea what this means. What are these "constants" and how are they being passed to a build? (A %_buildtime macro could be set either via rpmbuild command line --define or a macro file such as ~/.rpmmacros.)
-
We have open-build-service (OBS) that uses obs-worker to call obs-build to call rpmbuild.
and the constants here are
-
I think @pmatilai meant having %source_buildtime with constants of either
My preferred way would be to have as few macros or settings as possible, but that is not as backwards compatible. Only reproducible builds. Build host field in rpm files is always set to
My understanding is that @mlschroe prefers to have $SOURCE_DATE_EPOCH_MTIME set the build time, and if it is set, rpm would ignore but pass through $SOURCE_DATE_EPOCH. That has the downside that it is conceptually more complicated. It also doesn't nest cleanly, while we could more easily ignore mtimes in archives in a nested way, say for an rpm containing a zip, where a build picked up $SOURCE_DATE_EPOCH_MTIME for the zip mtimes. If a build picks up $SOURCE_DATE_EPOCH_MTIME and puts it in the content, say a build time field in an ELF executable, we have the same problem again.
To avoid that problem we can outright prevent nesting by having rpm copy $SOURCE_DATE_EPOCH_MTIME to the build time and unset it. I'm not sure if this is a good idea. The downside is that any nested archives cannot be fixed in the same way if the non-incremented mtimes in those become a problem. @mlschroe What do you think?
We should not clamp mtimes; outright setting them leaves fewer possibilities for errors (like an incorrect clock). Always setting the mtimes to the build time as detailed above, with no setting for it, might work in practice without breaking anything.
@pmatilai However, are you OK with making such a change in a normal rpm release? Same question for the other changes. If only OpenSUSE is using those settings, that would at least make it easy to remove the ones we stop using. Is anyone else using any?
Meanwhile, testing this in OpenSUSE is making progress; I hope I fixed the last rpm that had no changelog.
-
I won't claim to have digested all that, but just a high-level confirmation: what I really would like to see is a coherent (re)design for reproducible builds, where you basically just flick it on and are done with it, whereas the existing flags reflect the organic growth process over the years and still leave much of the assembly to the user. And to me it's not a big deal if it's incompatible with older releases, because this is something you set at a buildsystem level rather than a per-spec thing. Such a change could of course only go into a new major release and not be backported.
Edit: and yes, I was talking about a macro to configure the clock source, and the name of the clock source is of course constant. You'd only need to specify an explicit time if that's what you specifically configured. Or something like that. The NEW/MTIME thing I've yet to internalize at all.
-
One of the reasons for the knobs is that not all of these settings are fully useful for "reproducibility" and some of these harm traceability and debugging. For example, forcing the build host to Setting Everything we do around "reproducible builds" needs to be viewed through the lens of handling this balance.
-
Clamping the mtime to buildtime has its own negative consequence too, because it makes it harder to reason about reproducibility and it invalidates reproducibility in practice: every build will be different due to a variable clamp rather than an immutable clamp.
-
As long as a constant buildtime value is provided, it will also result in constant (reproducible) mtime.
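A small illustration of both points, as a sketch only: the file names and timestamps are made up, and `clamp_mtimes` is a hypothetical helper, not an rpm function. Clamping to a buildtime supplied from outside gives identical mtimes across builds, while clamping to the clock of each build does not.

```python
def clamp_mtimes(mtimes, buildtime):
    """Cap each mtime at buildtime; files older than buildtime keep their own mtime."""
    return {path: min(mtime, buildtime) for path, mtime in mtimes.items()}


# Two builds of the same source: README is old, generated.c is written during the build.
build1 = {"README": 1_000, "generated.c": 5_000}
build2 = {"README": 1_000, "generated.c": 6_000}

# Constant buildtime provided to both builds: the clamped results match.
fixed1 = clamp_mtimes(build1, 3_000)
fixed2 = clamp_mtimes(build2, 3_000)

# Buildtime taken from the clock at the end of each build: the results differ.
clock1 = clamp_mtimes(build1, 5_100)
clock2 = clamp_mtimes(build2, 6_100)
```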
-
I agree that the current set of options is a bit ad-hoc. But I think it's completely unrealistic to achieve a single switch, because different build systems and ways of managing the distro create different tradeoffs, and different people want to strike the balance between useful metadata and ease-of-reproducibility. Even with the 4 options listed above, Fedora and OpenSUSE disagree on two. In Fedora, we had a discussion about setting As @Conan-Kudo mentioned above, we have to strip metadata anyway. At least
Please, no. As @Conan-Kudo wrote above, this has clear downsides, because it forces constant churn between rebuilds. But it also destroys useful metadata, for example timestamps in documentation that specifies when the file was last modified. I'm sure there's a million other examples. And I find the worry about incorrect clocks rather unconvincing. In the current era of signatures being checked all the time, a build system or a developer making releases with a completely wrong time would create huge problems. If you find a case like this, fix it at the source, instead of forcing a heavy-weight workaround.
I don't think this has been thought through. The second problem is that we have a timestamp that is used to clamp mtimes. If we introduce a second timestamp and use that for clamping, we have to redesign/fix/update everything to use the new timestamp. Once we have gone through that churn, we are in almost the same situation, except that the variable name is different (and we also have two variables with slightly different meanings). All that said, I think it's fine to add a way to, e.g., add new options for how to clamp/set mtimes and whatnot. But please make those opt-in, so that different distros that have slightly different approaches can make their choices.
-
Wait, what? If those differ then the packages do differ, so it's not actually bit-for-bit identical. Which is what I've assumed reproducability to mean. This just goes to point out how completely different expectations people have. No wonder having a meaningful discussion about reproducable packages always seems so hard 😄
...but okay, if we start down the filtering road (I don't disagree, I just clearly don't know what everybody's assumptions are), then we arrive at this old discussion that never really went anywhere: #2023 Having a written definition of what "reproducability" means would help driving towards that goal. People clearly have very, very different ideas about it. It's good to have this discussion, but as discussion is what this is, I'm moving this there. Once something concrete emerges, we can open ticket(s).
-
Over the last years I just used
Normalizing the hostname is not really necessary, because the replication build can set We could come up with a replacement normalizer for
-
I wrote a long piece about this here.
Whether we skip some fields when doing a comparison, or take an rpm and strip those fields, and then do the comparison, is just an implementation detail. In practice, users get rpms that are signed. Thus, the format that the users are interested in checking is by definition the signed rpm. (The other end is interesting too. We generally talk about reproducibility in the sense of starting from srpms. This view originates in the Debian world where the source deb is the only common denominator. Packagers do not have to use git, they do not even have to use a vcs, and people do non-version-controlled binNMUs. Thus, when talking about the whole distro, starting from source debs is the only option. When working with rpms, at the technical level, getting the part from srpm until the binary rpm reproducible is challenging, so it makes sense for us to work on this part in the beginning. But what we actually want in the end is reproducibility of the full pipeline, i.e. starting from dist-git. I assume that adding the additional step where we generate the srpm from dist-git will be easy. And in dist-git, we want to have the upstream pristine tarballs, including a signature. In the end, ideally the user would be able to verify that the signed upstream tarball + a specific commit with our spec file leads to the rpms that they download from the mirror, reproducibly.)
I saw "reproducability" mentioned a few times. I assume it's not a typo, but I have no idea how it's supposed to be different from "reproducibility". Please see the link above for my definition of "reproducibility".
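To illustrate why skipping fields during comparison and stripping them first are the same thing, here is a minimal sketch. Headers are modelled as plain dicts, and the ignored field names are only an assumption for the example, not a proposal for which fields to exclude.

```python
# Assumed set of fields to leave out of the comparison (illustrative only).
IGNORED = frozenset({"BUILDHOST", "BUILDTIME"})
_MISSING = object()


def equal_ignoring(a, b, ignored=IGNORED):
    """Compare two header dicts, skipping the ignored fields."""
    keys = (a.keys() | b.keys()) - ignored
    return all(a.get(k, _MISSING) == b.get(k, _MISSING) for k in keys)


def strip(header, ignored=IGNORED):
    """Return a copy of the header without the ignored fields."""
    return {k: v for k, v in header.items() if k not in ignored}


ours = {"NAME": "foo", "VERSION": "1.0", "BUILDHOST": "builder-a", "BUILDTIME": 1}
theirs = {"NAME": "foo", "VERSION": "1.0", "BUILDHOST": "builder-b", "BUILDTIME": 2}

# Both approaches give the same verdict for any pair of headers.
same_by_skipping = equal_ignoring(ours, theirs)
same_by_stripping = strip(ours) == strip(theirs)
```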
-
Oh BTW, just a quick side-remark on this:
OPTFLAGS shouldn't even be defined on noarch builds, much less included in the header. The former is hard to fix for various hysterical reasons, but the latter should be easy.
-
keszybz wrote:
I don't think it is a good idea to exclude metadata. One benefit that you can only get with bit-identical reproducibility is that you can list the one and only correct hash value of the build result. (That also works with signed rpms + delsign.)
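As a tiny sketch of that benefit (the byte strings are stand-ins for package files, not real rpm data): with bit-identical output, a single published digest is enough to verify any rebuild, and any metadata difference changes it.

```python
import hashlib


def digest(payload: bytes) -> str:
    """SHA-256 hex digest of the complete build result."""
    return hashlib.sha256(payload).hexdigest()


published = digest(b"pkg-1.0 buildtime=1700000000")
rebuild_identical = digest(b"pkg-1.0 buildtime=1700000000")
rebuild_different = digest(b"pkg-1.0 buildtime=1700000001")
```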
-
I think this all has drifted away from the initial proposal. The goal was to be able to improve reproducibility of a given rpm by:
Disregarding the implementation details, do you all think this is worthwhile to have?
-
Here are some thoughts about improving reproducible builds with rpm. The goal (for me) is to be able to reproduce an rpm given the source rpm.
We currently have the following switches:
%source_date_epoch_from_changelog
%use_source_date_epoch_as_buildtime
%clamp_mtime_to_source_date_epoch
This is centered around the sources and not really what I have in mind. I would prefer to keep setting the build time to the current time for normal builds and clamp the mtime to the build time. To reproduce a build, I would need a way to set the build time (and the build host) to the values from the rpm I want to reproduce.
So, would a %clamp_mtime_to_buildtime macro make sense? Should the getBuildTime() function check if a %_buildtime macro is set?
(This is somewhat related to pull request #2880. Basically $NEW_SOURCE_DATE_EPOCH is the build time.)
Thoughts?