Component-based builds #6356
Comments
@theobat, I can't add much to a discussion about architecture. My own concerns are simple ones: (a) don't break anything for Stack users; (b) don't make Stack slower for 'everyday' use; and (c) keep the code base 'tidy'. Currently, Stack builds a project using the version of Cabal (the library) that ships with the specified version of GHC - specifically
Right, that makes sense. I'll make sure we keep the existing behavior for builds with older Cabal versions; there should only be a small number of them. If I recall correctly, anything before Cabal 2.2 will be incompatible with this new way of building packages, but again, I'll treat backward compatibility as mandatory.
GHC 8.4.1 (released 8 March 2018) comes with
EDIT: The 2022 State of Haskell Survey (during November 2022) yielded: "Which versions of GHC do you use?" (Optional. Multi select.)
Also: "Where do you use Haskell?" (Optional. Multi select.)
So this turns out to be more complex than I thought, because the entire cache system is geared toward packages. For the sake of limited changes and swiftness, I'm only working on refactoring the inner component builds of an entire package for now, without moving all the bits towards the component architecture. That is, I'm only moving the
It'd be incredible if this feature fixed #2800
@wraithm it would, and I have had a functional branch with this feature for the past month or so. The issue is that this architectural change brings a significant perf regression in "normal/traditional" builds: for each component within a package where you have an internal dependency (e.g. exe depends on lib, or lib depends on sub-lib), component-based builds mean we call the cabal process N times, where N is the number of distinct sequential components (we can't parallelize them). Calling the cabal process is far from negligible; on my machine it incurred a
I'm not sure what to do with that. My initial plan was to only trigger the component-based builds for backpack builds (which I've been very close to finalizing, and we have no choice as far as backpack is concerned), but I've had too much work in the past few weeks to discuss this issue any further. Maybe @mpilgrem you can dive in on this.
@theobat, thanks for all your work on this and the update. If I understand correctly, it appears that the following are not mutually compatible and 'something has to give':
O1. Stack making use of Cabal (the library) through a compiled
Would I be correct to assume that Cabal (the tool) avoids the problem by not making use of Cabal (the library) through a compiled
The spotlight may be on O1. Why does Stack do that? A few things occur to me:
That's mostly right, @mpilgrem. I don't really know why stack defers to a subprocess called during stack's execution. Maybe it was easier back then? It also means you can use any cabal library you want... I'm also not entirely sure what cabal the executable does, since I haven't looked at it in depth, but my impression was that the "Simple" build just uses the cabal library in the same Haskell process, which is indeed what you're describing: it's not doing O1. I don't know if that's a possibility for stack though... But it would be a significant speedup, and it would significantly reduce the perf difference between package builds and component builds.
Also note that there are significant prospects for speed boosts in certain scenarios with component-based builds, even compared to package-based builds, but that would require: building only the components we want (as opposed to all the components of a package, but component by component, modulo test and benchmark specifics), and building unrelated components in parallel (as opposed to building only packages in parallel). All these things are yet another stack (sic) of work, and they are not a solution to the problem at hand, that is: building component by component increases the number of subprocesses we need to create to call the Setup.hs file/binary, and these subprocesses are expensive.
This 9 Feb 2015 article by Michael Snoyman is referred to in the 6 July 2015 article I mention above. To put them in their historical context, Stack 0.0.1 was released on 9 June 2015. I am wondering if his experience is the origin of the 'reproducibility' explanation I had read for 'O1'.
A thought experiment: imagine a package that has
It sounds to me like the answer to that question is: yes, there is a problem with some strict definition of reproducibility, e.g. Cabal can interpret fields in the package differently across versions, etc. However, maybe there's a deeper question of, "is this a real problem?" I imagine that if we just bundled a single version of the cabal library, 99.99% of things would just work most of the time. I'm sure there are some pathological examples you could come up with. It might be interesting to fully understand what caused those major and minor version changes in
I could imagine just calling
You could also conceivably imagine bundling multiple different versions of the Cabal library and calling those different library functions based on the compiler version or what have you. However, I imagine that's way too much complexity for
Here's maybe another question: What does
IMHO, compilation speed is way way more important than stack handling all possible reproducibility cases. You can still handle reproducibility issues by just using different versions of
Just curious, @theobat, where does the constraint that you need to do N invocations of cabal per component come from? Is this a fundamental limitation in the
Yes, that would be nice, but I think carefully removing the historical way of deferring to a subprocess is far from obvious, there's a LOT of code just to handle the ceremony of doing these calls correctly.
Yes, at least there should be a way to move the cursor of the current reproducibility/perf tradeoff.
@wraithm I didn't make myself very clear: we only need to call the setup, build, configure etc. once per component. That's simply a requirement of cabal's own Setup.hs interface. In particular, this paragraph makes it very clear:
And the constant cost of calling these scripts is roughly the same whether it concerns a single component or a whole package. So we simply pay a
I'll look into the history of Stack building (by default) with the version of Cabal (the library) that comes with the specified version of GHC as a boot library. I think that, in order to do so, Stack necessarily has to compile a separate 'Setup' executable for each GHC/Cabal combo. I am also aware that was how the original specification of Cabal - https://www.haskell.org/cabal/proposal/pkg-spec.pdf (page 3) - intended Cabal to be used. If there was any plan to move away from that, you would have to be convinced that it did not break anything for users of GHC 8.4 onwards or adversely affect the reproducibility of builds.
@wraithm @mpilgrem FYI, I found the cabal logic that decides whether or not to defer to a subprocess: https://github.com/haskell/cabal/blob/master/cabal-install/src/Distribution/Client/SetupWrapper.hs#L401-L426. It does indeed seem that they use an internal method for all the Simple builds (except for some special logging aspect), deferring to the Cabal library's defaultMainArgs function within the same process.
@theobat, many thanks for continuing to give this topic your attention. I'll have a look at what you've found.
Package component level builds
What's the point, what is it?
Stack uses cabal "simple" (which is very close to Setup.hs commands, except it's a binary) to actually build packages.
That means that, for each package selected in the "Plan", it gathers all the info required by cabal simple and then calls it.
Currently stack uses cabal simple through package builds, that is, for each package it calls:
...etc
Component-level builds do basically the same as before,
but every cabal simple call targets a single component of a package instead,
for instance:
For a case where we have an exe1 depending on a sublib1.
Note that in this case the intra-package dependency has to be handled by stack
whereas it's currently handled by cabal simple.
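As a rough illustration of the intra-package ordering that stack would have to take over from cabal simple, here is a minimal sketch for the exe1/sublib1 case. The `Component` type, `buildOrder`, `componentSteps` and the step labels are all hypothetical names made up for this example; they are not taken from the Stack or Cabal code bases, and the labels are not the exact arguments passed to the Setup binary.

```haskell
import Data.List (partition)

-- | Hypothetical component identifier; not a type from the Stack code base.
data Component
  = SubLib String
  | Exe String
  deriving (Eq, Show)

-- | Order components so that each one comes after its intra-package
-- dependencies. Returns Nothing if there is a cycle, or if a dependency
-- is not itself in the list.
buildOrder :: [(Component, [Component])] -> Maybe [Component]
buildOrder = go []
  where
    go done [] = Just done
    go done pending =
      -- A component is ready once all of its dependencies are already done.
      case partition (all (`elem` done) . snd) pending of
        ([], _)       -> Nothing
        (ready, rest) -> go (done ++ map fst ready) rest

-- | Placeholder labels for the per-component steps (assumed, illustrative only).
componentSteps :: Component -> [String]
componentSteps c = [step ++ " " ++ label c | step <- ["configure", "build"]]
  where
    label (SubLib n) = "sublib:" ++ n
    label (Exe n)    = "exe:" ++ n

-- | The case from the text: exe1 depends on sublib1.
examplePlan :: Maybe [String]
examplePlan =
  concatMap componentSteps
    <$> buildOrder [(Exe "exe1", [SubLib "sublib1"]), (SubLib "sublib1", [])]
```

In this sketch, `examplePlan` lists the sublib1 steps before the exe1 steps, and the number of steps grows with the number of components. That growth is the per-component subprocess cost discussed in the comments.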
Doing this in stack land would probably resolve many issues with over-building, but above all it's a hard requirement for making backpack work (backpack cannot work with current-style builds). I believe that's enough incentive to adopt this new style. Besides, it would also bring stack closer to the cabal-install CLI.
Some architecture refactoring
In current stack, we have many occurrences of "Set NamedComponent" or "Map StackUnqualCompName XX".
Given the requirements for component-based builds, we are going to use a lot more of those, in
even more distinct flavors than now, which I don't think will scale well.
We also have many occurrences of the Library or Executable constructors (see the Installed data type), which again is
redundant to some extent.
What I propose is that we replace all of these with a few datatypes, a phantom type and a type family,
which would encompass all use cases through the same constructors.
First, the core data structures :
And then the use case type family :
Now this may appear a bit complicated at first, but there are many benefits to this approach:
whereas it's kind of hard to scrap the code for all the Set NamedComponent/Map StackUnqualCompName places.
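To make the idea of "a few datatypes, a phantom type and a type family" more concrete, here is a minimal sketch of one shape such a design could take. It assumes a type-level use-case index and a per-use-case payload. Every name in it (`UseCase`, `Payload`, `Components` and its fields) is hypothetical and chosen for this example only; it is not the proposal's actual code or anything from the Stack code base. It uses `Data.Map.Strict` from the containers package.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)
import qualified Data.Map.Strict as Map

-- | Hypothetical use cases, promoted to the type level with DataKinds.
data UseCase = Dependencies | SourceFiles

-- | What is stored per component depends on the use case (assumed payloads).
type family Payload (u :: UseCase) :: Type where
  Payload 'Dependencies = ()          -- set-like: only presence matters
  Payload 'SourceFiles  = [FilePath]  -- the files belonging to a component

-- | One container shape, and one set of field names, for every use case.
data Components (u :: UseCase) = Components
  { mainLib :: Maybe (Payload u)
  , subLibs :: Map.Map String (Payload u)
  , exes    :: Map.Map String (Payload u)
  }

-- | Works for any use case, because the shape is shared.
exeNames :: Components u -> [String]
exeNames = Map.keys . exes

-- | Number of components present, whatever they carry.
componentCount :: Components u -> Int
componentCount c =
  maybe 0 (const 1) (mainLib c) + Map.size (subLibs c) + Map.size (exes c)

-- | A package's dependencies on a main library and one sub-library.
exampleDeps :: Components 'Dependencies
exampleDeps = Components
  { mainLib = Just ()
  , subLibs = Map.fromList [("sublib1", ())]
  , exes    = Map.empty
  }

-- | Source files per component, e.g. for ghci.
exampleSources :: Components 'SourceFiles
exampleSources = Components
  { mainLib = Just ["src/Lib.hs"]
  , subLibs = Map.empty
  , exes    = Map.fromList [("exe1", ["app/Main.hs"])]
  }
```

The point of the sketch is that helpers such as `exeNames` are written once against `Components u` and are reused for every use case, instead of one helper per Set/Map flavor.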
Now let's look at a few examples to see what that would look like in practice:
Now what about package dependencies? In cabal, they are a set of main-library or sub-library dependencies:
The source files are also mapped for ghci through a Map of Named Component :
The InstalledMap datatype which is providing installed things in the ghcPkg database would give :
Now you get it, the design would be more normalized and unified, for a small abstraction cost.
It's not strictly necessary to get the component-based builds, but I'd say it would make it significantly easier.
The idea is to bring in this datatype and then to refactor slowly and step by step where it makes sense.
The actual task list for the component based builds
RFC @mpilgrem
Other issues relating to component-based builds
(EDIT by @mpilgrem) The issue/feature request of component-based builds has a long history at this repository. The following are related issues: