-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft RFC for Postgres extension directory structure and search path #7
Conversation
fb122e5
to
1af71d7
Compare
### Extension Directories | ||
|
||
First, when an extension is installed, all of its files will live in a single | ||
directory named for the extension. The contents include: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a new problem as of today, but I believe this RFC would be the place to solve it: how do we ensure extension names --and hence directory names-- are unique? TBH I don't have a good answer today, but I believe this should be discussed and addressed given the potential window of opportunity of changes in the extension system at the Postgres core.
Quick initial thoughts, probably not good ideas anyway: since there's no "central authority", possibly the only reasonable option is to use a reverse DNS name (e.g. com.vendor.ourextension
) or, to make it easier, something like $extname-$fqdn
. The create extension
command could default to try to find by name only, and if there are no ambiguities, proceed without requiring the user to fully qualify it. The same would apply to the steps followed by Postgres to load the objects from the extensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of a DNS-based directory format; I went on at length about it back in February. I decided it could be addressed later, or at least outside this proposal, since it has implications for more SQL syntax to be able to load one foo
extension vs. another. Probably C namespacing, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it could be part of another proposal, since it's slightly tangential. However, I'd consider then proposing both RFCs together, as any change in core should in my opinion address "all the relevant issues as of today", or else we will end up with a half-baked solution that won't be touched again in, possibly, years. Let's use the momentum we may have to propose changes to consider all that we may need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favor, but I think someone with much more experience with C extensions, compiler options, and whatnot needs to work on it. Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think many/most PG hackers may fit in here. Not sure about availability/interest. WDYT asking in the mailing list for volunteers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOL, I've generally had more luck working directly with an interested person or likely candidate than soliciting volunteers. Or just doing it myself, which I'd eventually be willing to do. But it will be some time before I have that bandwidth.
* The Control file that describes extension | ||
* Subdirectories for SQL, shared modules, docs, binaries | ||
|
||
Subdirectories roughly correspond to the `pg_config --*dir` options: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd avoid bringing any files into extension's directories that are not strictly required for Postgres to load and run that extension. That implies that documentation, man pages, etc, are strictly kept out of this directory. If desired, they can be placed in other system-level directories (e.g. /usr/share/doc
). This also helps bringing into the directories where Postgres loads stuff the bare minimum amount of files required, which I consider a good thing. Similarly, I'd also keep independent executables out of this directory. Not only they are not to be loaded by Postgres; but they will end up on a path quite unusable for the users --which are the ones expected to run such binaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's useful to have a complete picture of what all is considered part of an extension, and that includes documentation, license, bill of materials, etc. In other words, it's made for humans and machines.
No reason why one couldn't build layers an OCI image that include such files, and use a layer (tag) without them if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's useful to have a complete picture of what all is considered part of an extension, and that includes documentation, license, bill of materials, etc. In other words, it's made for humans and machines.
I agree... but that doesn't mean they all need to be in the same directory. For instance, Debian packages put extension files where they are expected by Postgres (/usr/lib/postgresql/$major/lib
) and documentation in the usual docs folder (/usr/share/doc
). I'd do the same.
No reason why one couldn't build layers an OCI image that include such files, and use a layer (tag) without them if you want.
That's certainly true. However, there's a very practical reason not to abuse layering: certain container implementations have a limit in the number of layers that can be used (e.g. Docker at 127).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a hard time imagining needing more than 3-4 layers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a hard time imagining needing more than 3-4 layers.
Just imagine you split extensions into two layers (one for the files that PG needs to load to work with the extension; and another one with the docs and other files). Actually you may take this further and also use different layers for each of the extension's dependencies (and that may make sense, as the dependency may not change in a while across extension's versions). But for the sake of the argument let's say 2-3 per extension.
Then if I load, say, 15 extensions, I'm already consuming 30-45 layers.
Add to this the Postgres layer(s) itself --which in many cases is built as several layers, not just one--. And then the "base OS layer", which is normally multiple layers too.
In total you could be at 50-80 layers easily. And while this is what you may think "the maximum under reasonable use cases" the problem is that people will always find a reason to go beyond this "reasonable maximum" and would hit the limit.
All in all, I'd favor "extreme layering", but this unfortunate limitation of some systems like Docker makes it unpractical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then if I load, say, 15 extensions, I'm already consuming 30-45 layers.
Oh, you didn't mean per-container? Damn limitations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's per image. Actually the image doesn't have any limit, it is a limit of some container runtime (like Docker) than when downloading an image with more than 127 layers will refuse to run that container image.
Which may happen if you pack in the image the base OS, Postgres and a few dozen extensions (with every extension having two or more layers).
Yes, you cannot imagine how much I hate this limitation. It's nuts how some limits where imposed by the original Docker designs. That even then made no sense (well, a bit, while aufs
was used to construct the runtime image, but that time is long gone). There should be always limits (that's defensive programming) but some are arbitrarily and unnecessary low.
To allow extensions to be added to a Docker container and to persist beyond | ||
its lifetime, one or more [volumes] could be used. A couple of options: | ||
|
||
* Mount the `--extdir-site` and/or `--extdir-vendor` directories as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach works for single node. For multiple nodes you need to repeat the operation for all nodes. While doable (e.g. this is what StackGres does) it's quite a bit of an effort, and may be mentioned here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you use a Kubernetes manifest that defines the shared volume and all your pods use it, yes? I must be misunderstanding, though; what's the extra effort?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, volumes are not shared. If so, that would require the use of a distributed filesystem (e.g. NFS) which almost nobody does (for known reasons).
Most of the setups I know (and definitely ours in StackGres) have one volume per pod. Those volumes are different, and PGDATA is obvously replicated via streaming replication. So anything else you put on top (e.g. extension's content unpackage from elsewhere) needs to be done per volume (repeated).
Which also means that if, for example, a pod dies and it's re-created, the same operation needs to happen again.
And this also means that if a new extension is added or one is removed, the changes required need to be applied in all volumes again!
In StackGres we wrote a "cluster controller", a sidecar that implements all this logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. That casts @gbartolini's desire to use the (currently-alpha) Kubernetes image volume feature in a new light. In theory, using that pattern, you just need to update your manifest to mount the image volumes for each extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
I think it still requires the extensions to be placed on a unique directory per extension (as per this RFC) because I presume you cannot mount images in overlapping paths.
I'm also afraid this mechanism works at startup, so changing the extensions would require a pod restart.
But it's still much simpler than the "cluster controller" (though this mechanism works now and the image volume one possibly only with newer Postgres versions and newer K8s versions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm gonna spend some time on Peter's patch and think about implications for the future. Perhaps we limit it to the control file, SQL files, and shared modules. It still complicates things a bit for extensions that also ship with scripts/CLIs/whatever, but maybe that's a problem for another time.
└── pair--1.1.sql | ||
``` | ||
|
||
A binary application like [pg_top] would live in the `pg_top` directory, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consistently with that I've said before, and despite understanding that PGXN does currently support not only "extensions" but also "packages", I disagree it's a good thing to distribute something like pg_top
"as an extension". It is not, and IMO should not be put into the extensions directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? No reason not to encompass and build packaging designs for a whole array of Postgres ecosystem tooling, including not just extensions and dynamic modules, but also CLIs, web applications, backup utilities, etc. Why exclude them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the PGXN Meta v2 RFC proposes that its contents
denotes different types of extensions: CREATE EXTENSION
extensions, shared modules, and apps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no objection in packaging other tools "in a similar --or the same-- way" (e.g. as OCI images or whatever it takes). My objection is from considering them "like extensions" and living in the same directory (extension_path
). I'd just place them anywhere else (e.g. on /usr/local/bin
).
For the reasons I stated before, I'd not be happy placing a file (including here binaries and potentially their shared libraries) in a place where Postgres would be looking for files to load into Postgres memory.
Besides, the UX for end users may not be good if now they need to either add multiple new entries to their $PATH
variables (as each binary would be under a different directory) or call these binaries with cumbersome paths as in /usr/lib/postgresql/$major/extensions/pg_top/bin/pg_top
(vs simply pg_top
if it is on a standard system binary path).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I have a comment in here about optionally symlinking bin files. But I need to think more about this.
|
||
This RFC does not include or attempt to address the following issues: | ||
|
||
* How to manage third-party shared libraries. Making system dependencies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of an elephant in the room, and I definitely believe this RFC would be the place to address it --or else it won't be addressed again until a new further change to PG core for extensions, which may be a very distant future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a long discussion in this thread, but ISTM that the issue exists with or without the changes in this RFC. I'm also not sure it will require changes to core. So I think it makes sense to consider in a separate RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the same vein of what I mentioned upthread, I believe that if we need to propose changes to core to support this (and other RFCs) about extensions, I think that the best strategy is to put all the cards on the table once we have a complete vision of everything that is required.
I believe anything that would require changes to core Postgres will face a lot of discussion and comments. If then the can of worms is opened, then we should bring in all the changes required at once. Not only to evaluate everything with a holistic vision; but also because that can of worms is not easy/frequently opened.
Therefore, nothing against bringing this to another RFC. But I'd vouch to group all RFCs that represent all the potential changes required and the completeness of vision before they are brought to the table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me too, just somebody has to do the research, work out a proposal, and write it down. But in the absence of that (and yes, I will ask around for contributions), I want to work on incremental change, so that the perfect isn't the enemy of the good, and because that's how things are often done in the project.
Hell, Peter E submitted a patch proposing just the extension path stuff and no reorganization at all. As long as it's a step in the right direction I'm all in favor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, maybe we could borrow something from Conda?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically this bit:
Conda-build uses @loader_path on macOS and $ORIGIN on Linux because we install into a common root directory and can assume that other libraries are also installed into that root. The use of the variables allows you to build relocatable binaries that can be built on one system and sent everywhere.
Which I'll be looking into when I get to building binary packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, sure. What I take it from there is that dependencies can reside on a lib/
subfolder relative to the extension. And this idea is already part of the RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well this RFC's lib
folder is for pkglibdir
files, but yeah.
@theory I have left quite a few comments after doing a review of the RFC. Some are minor potential adjustments, but some other are proposes for deeper changes. And last but not least a couple others are about areas that I believe should be addressed as part of the proposal, and they are hard topics. The amount of comments from my side doesn't mean even for a second that I feel the RFC is not good or not appropriate. On the contrary: I really appreciate the effort of starting the threads in the mailing list, capturing the essence of those discussions and structuring it all into a document that can be reviewed here. I think it's a great first step towards a wider and deeper discussion about the topic. Thank you! 🙏 |
Thank you for the thorough review, @ahachete. I made a number of changes in response to your comments, but have punted on the bigger issues. In particular, I think we should collectively discuss whether separate core/vendor/site directories are desirable (and if so, how one can set all three to a single destination to eliminate the distinction), but that the third-party dependency issue ought to remain separate — though I completely agree it's important to address. |
baf0fae
to
b776520
Compare
52c90cf
to
5228f3f
Compare
Lots of input and feedback on this post, thanks to #7. See also the acknowledgements.
No description provided.