Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft RFC for Postgres extension directory structure and search path #7

Closed
wants to merge 0 commits into from

Conversation

theory
Copy link
Owner

@theory theory commented Oct 30, 2024

No description provided.

@theory theory self-assigned this Oct 30, 2024
@theory theory force-pushed the extpath-rfc branch 2 times, most recently from fb122e5 to 1af71d7 Compare November 4, 2024 22:16
### Extension Directories

First, when an extension is installed, all of its files will live in a single
directory named for the extension. The contents include:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a new problem as of today, but I believe this RFC would be the place to solve it: how do we ensure extension names --and hence directory names-- are unique? TBH I don't have a good answer today, but I believe this should be discussed and addressed given the potential window of opportunity of changes in the extension system at the Postgres core.

Quick initial thoughts, probably not good ideas anyway: since there's no "central authority", possibly the only reasonable option is to use a reverse DNS name (e.g. com.vendor.ourextension) or, to make it easier, something like $extname-$fqdn. The create extension command could default to try to find by name only, and if there are no ambiguities, proceed without requiring the user to fully qualify it. The same would apply to the steps followed by Postgres to load the objects from the extensions.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of a DNS-based directory format; I went on at length about it back in February. I decided it could be addressed later, or at least outside this proposal, since it has implications for more SQL syntax to be able to load one foo extension vs. another. Probably C namespacing, too.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it could be part of another proposal, since it's slightly tangential. However, I'd consider then proposing both RFCs together, as any change in core should in my opinion address "all the relevant issues as of today", or else we will end up with a half-baked solution that won't be touched again in, possibly, years. Let's use the momentum we may have to propose changes to consider all that we may need.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor, but I think someone with much more experience with C extensions, compiler options, and whatnot needs to work on it. Any suggestions?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think many/most PG hackers may fit in here. Not sure about availability/interest. WDYT asking in the mailing list for volunteers?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL, I've generally had more luck working directly with an interested person or likely candidate than soliciting volunteers. Or just doing it myself, which I'd eventually be willing to do. But it will be some time before I have that bandwidth.

* The Control file that describes extension
* Subdirectories for SQL, shared modules, docs, binaries

Subdirectories roughly correspond to the `pg_config --*dir` options:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd avoid bringing any files into extension's directories that are not strictly required for Postgres to load and run that extension. That implies that documentation, man pages, etc, are strictly kept out of this directory. If desired, they can be placed in other system-level directories (e.g. /usr/share/doc). This also helps bringing into the directories where Postgres loads stuff the bare minimum amount of files required, which I consider a good thing. Similarly, I'd also keep independent executables out of this directory. Not only they are not to be loaded by Postgres; but they will end up on a path quite unusable for the users --which are the ones expected to run such binaries.

Copy link
Owner Author

@theory theory Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's useful to have a complete picture of what all is considered part of an extension, and that includes documentation, license, bill of materials, etc. In other words, it's made for humans and machines.

No reason why one couldn't build layers an OCI image that include such files, and use a layer (tag) without them if you want.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's useful to have a complete picture of what all is considered part of an extension, and that includes documentation, license, bill of materials, etc. In other words, it's made for humans and machines.

I agree... but that doesn't mean they all need to be in the same directory. For instance, Debian packages put extension files where they are expected by Postgres (/usr/lib/postgresql/$major/lib) and documentation in the usual docs folder (/usr/share/doc). I'd do the same.

No reason why one couldn't build layers an OCI image that include such files, and use a layer (tag) without them if you want.

That's certainly true. However, there's a very practical reason not to abuse layering: certain container implementations have a limit in the number of layers that can be used (e.g. Docker at 127).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a hard time imagining needing more than 3-4 layers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a hard time imagining needing more than 3-4 layers.

Just imagine you split extensions into two layers (one for the files that PG needs to load to work with the extension; and another one with the docs and other files). Actually you may take this further and also use different layers for each of the extension's dependencies (and that may make sense, as the dependency may not change in a while across extension's versions). But for the sake of the argument let's say 2-3 per extension.

Then if I load, say, 15 extensions, I'm already consuming 30-45 layers.

Add to this the Postgres layer(s) itself --which in many cases is built as several layers, not just one--. And then the "base OS layer", which is normally multiple layers too.

In total you could be at 50-80 layers easily. And while this is what you may think "the maximum under reasonable use cases" the problem is that people will always find a reason to go beyond this "reasonable maximum" and would hit the limit.

All in all, I'd favor "extreme layering", but this unfortunate limitation of some systems like Docker makes it unpractical.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then if I load, say, 15 extensions, I'm already consuming 30-45 layers.

Oh, you didn't mean per-container? Damn limitations.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's per image. Actually the image doesn't have any limit, it is a limit of some container runtime (like Docker) than when downloading an image with more than 127 layers will refuse to run that container image.

Which may happen if you pack in the image the base OS, Postgres and a few dozen extensions (with every extension having two or more layers).

Yes, you cannot imagine how much I hate this limitation. It's nuts how some limits where imposed by the original Docker designs. That even then made no sense (well, a bit, while aufs was used to construct the runtime image, but that time is long gone). There should be always limits (that's defensive programming) but some are arbitrarily and unnecessary low.

To allow extensions to be added to a Docker container and to persist beyond
its lifetime, one or more [volumes] could be used. A couple of options:

* Mount the `--extdir-site` and/or `--extdir-vendor` directories as

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach works for single node. For multiple nodes you need to repeat the operation for all nodes. While doable (e.g. this is what StackGres does) it's quite a bit of an effort, and may be mentioned here.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you use a Kubernetes manifest that defines the shared volume and all your pods use it, yes? I must be misunderstanding, though; what's the extra effort?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, volumes are not shared. If so, that would require the use of a distributed filesystem (e.g. NFS) which almost nobody does (for known reasons).

Most of the setups I know (and definitely ours in StackGres) have one volume per pod. Those volumes are different, and PGDATA is obvously replicated via streaming replication. So anything else you put on top (e.g. extension's content unpackage from elsewhere) needs to be done per volume (repeated).

Which also means that if, for example, a pod dies and it's re-created, the same operation needs to happen again.

And this also means that if a new extension is added or one is removed, the changes required need to be applied in all volumes again!

In StackGres we wrote a "cluster controller", a sidecar that implements all this logic.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That casts @gbartolini's desire to use the (currently-alpha) Kubernetes image volume feature in a new light. In theory, using that pattern, you just need to update your manifest to mount the image volumes for each extension.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

I think it still requires the extensions to be placed on a unique directory per extension (as per this RFC) because I presume you cannot mount images in overlapping paths.

I'm also afraid this mechanism works at startup, so changing the extensions would require a pod restart.

But it's still much simpler than the "cluster controller" (though this mechanism works now and the image volume one possibly only with newer Postgres versions and newer K8s versions).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna spend some time on Peter's patch and think about implications for the future. Perhaps we limit it to the control file, SQL files, and shared modules. It still complicates things a bit for extensions that also ship with scripts/CLIs/whatever, but maybe that's a problem for another time.

└── pair--1.1.sql
```

A binary application like [pg_top] would live in the `pg_top` directory,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistently with that I've said before, and despite understanding that PGXN does currently support not only "extensions" but also "packages", I disagree it's a good thing to distribute something like pg_top "as an extension". It is not, and IMO should not be put into the extensions directory.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? No reason not to encompass and build packaging designs for a whole array of Postgres ecosystem tooling, including not just extensions and dynamic modules, but also CLIs, web applications, backup utilities, etc. Why exclude them?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the PGXN Meta v2 RFC proposes that its contents denotes different types of extensions: CREATE EXTENSION extensions, shared modules, and apps.

Copy link

@ahachete ahachete Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no objection in packaging other tools "in a similar --or the same-- way" (e.g. as OCI images or whatever it takes). My objection is from considering them "like extensions" and living in the same directory (extension_path). I'd just place them anywhere else (e.g. on /usr/local/bin).

For the reasons I stated before, I'd not be happy placing a file (including here binaries and potentially their shared libraries) in a place where Postgres would be looking for files to load into Postgres memory.

Besides, the UX for end users may not be good if now they need to either add multiple new entries to their $PATH variables (as each binary would be under a different directory) or call these binaries with cumbersome paths as in /usr/lib/postgresql/$major/extensions/pg_top/bin/pg_top (vs simply pg_top if it is on a standard system binary path).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I have a comment in here about optionally symlinking bin files. But I need to think more about this.


This RFC does not include or attempt to address the following issues:

* How to manage third-party shared libraries. Making system dependencies

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of an elephant in the room, and I definitely believe this RFC would be the place to address it --or else it won't be addressed again until a new further change to PG core for extensions, which may be a very distant future.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a long discussion in this thread, but ISTM that the issue exists with or without the changes in this RFC. I'm also not sure it will require changes to core. So I think it makes sense to consider in a separate RFC.

Copy link

@ahachete ahachete Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the same vein of what I mentioned upthread, I believe that if we need to propose changes to core to support this (and other RFCs) about extensions, I think that the best strategy is to put all the cards on the table once we have a complete vision of everything that is required.

I believe anything that would require changes to core Postgres will face a lot of discussion and comments. If then the can of worms is opened, then we should bring in all the changes required at once. Not only to evaluate everything with a holistic vision; but also because that can of worms is not easy/frequently opened.

Therefore, nothing against bringing this to another RFC. But I'd vouch to group all RFCs that represent all the potential changes required and the completeness of vision before they are brought to the table.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me too, just somebody has to do the research, work out a proposal, and write it down. But in the absence of that (and yes, I will ask around for contributions), I want to work on incremental change, so that the perfect isn't the enemy of the good, and because that's how things are often done in the project.

Hell, Peter E submitted a patch proposing just the extension path stuff and no reorganization at all. As long as it's a step in the right direction I'm all in favor.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough!

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, maybe we could borrow something from Conda?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically this bit:

Conda-build uses @loader_path on macOS and $ORIGIN on Linux because we install into a common root directory and can assume that other libraries are also installed into that root. The use of the variables allows you to build relocatable binaries that can be built on one system and sent everywhere.

Which I'll be looking into when I get to building binary packages.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, sure. What I take it from there is that dependencies can reside on a lib/ subfolder relative to the extension. And this idea is already part of the RFC.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this RFC's lib folder is for pkglibdir files, but yeah.

@ahachete
Copy link

@theory I have left quite a few comments after doing a review of the RFC. Some are minor potential adjustments, but some other are proposes for deeper changes. And last but not least a couple others are about areas that I believe should be addressed as part of the proposal, and they are hard topics.

The amount of comments from my side doesn't mean even for a second that I feel the RFC is not good or not appropriate. On the contrary: I really appreciate the effort of starting the threads in the mailing list, capturing the essence of those discussions and structuring it all into a document that can be reviewed here. I think it's a great first step towards a wider and deeper discussion about the topic. Thank you! 🙏

@theory
Copy link
Owner Author

theory commented Nov 11, 2024

Thank you for the thorough review, @ahachete. I made a number of changes in response to your comments, but have punted on the bigger issues. In particular, I think we should collectively discuss whether separate core/vendor/site directories are desirable (and if so, how one can set all three to a single destination to eliminate the distinction), but that the third-party dependency issue ought to remain separate — though I completely agree it's important to address.

@theory theory force-pushed the extpath-rfc branch 3 times, most recently from baf0fae to b776520 Compare November 18, 2024 18:58
@theory theory force-pushed the extpath-rfc branch 3 times, most recently from 52c90cf to 5228f3f Compare November 24, 2024 23:57
@theory theory changed the title Start drafting extdir RFC Draft RFC for Postgres extension directory structure and search path Nov 25, 2024
theory added a commit that referenced this pull request Nov 25, 2024
Lots of input and feedback on this post, thanks to #7. See also the
acknowledgements.
@theory theory closed this Nov 25, 2024
@theory theory deleted the extpath-rfc branch November 25, 2024 19:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants