Draft RFC for Postgres extension directory structure and search path #7

theory · 2024-10-30T22:53:38Z

No description provided.

content/post/postgres/rfc-extension-packaging-lookup.md

ahachete · 2024-11-10T01:47:19Z

content/post/postgres/rfc-extension-packaging-lookup.md

+### Extension Directories
+
+First, when an extension is installed, all of its files will live in a single
+directory named for the extension. The contents include:


This is not a new problem as of today, but I believe this RFC would be the place to solve it: how do we ensure extension names --and hence directory names-- are unique? TBH I don't have a good answer today, but I believe this should be discussed and addressed given the potential window of opportunity of changes in the extension system at the Postgres core.

Quick initial thoughts, probably not good ideas anyway: since there's no "central authority", possibly the only reasonable option is to use a reverse DNS name (e.g. com.vendor.ourextension) or, to make it easier, something like $extname-$fqdn. The create extension command could default to try to find by name only, and if there are no ambiguities, proceed without requiring the user to fully qualify it. The same would apply to the steps followed by Postgres to load the objects from the extensions.

I like the idea of a DNS-based directory format; I went on at length about it back in February. I decided it could be addressed later, or at least outside this proposal, since it has implications for more SQL syntax to be able to load one foo extension vs. another. Probably C namespacing, too.

I agree it could be part of another proposal, since it's slightly tangential. However, I'd consider then proposing both RFCs together, as any change in core should in my opinion address "all the relevant issues as of today", or else we will end up with a half-baked solution that won't be touched again in, possibly, years. Let's use the momentum we may have to propose changes to consider all that we may need.

I'm in favor, but I think someone with much more experience with C extensions, compiler options, and whatnot needs to work on it. Any suggestions?

I think many/most PG hackers may fit in here. Not sure about availability/interest. WDYT asking in the mailing list for volunteers?

LOL, I've generally had more luck working directly with an interested person or likely candidate than soliciting volunteers. Or just doing it myself, which I'd eventually be willing to do. But it will be some time before I have that bandwidth.

ahachete · 2024-11-10T01:48:41Z

content/post/postgres/rfc-extension-packaging-lookup.md

+*   The Control file that describes extension
+*   Subdirectories for SQL, shared modules, docs, binaries
+
+Subdirectories roughly correspond to the `pg_config --*dir` options:


I'd avoid bringing any files into extension's directories that are not strictly required for Postgres to load and run that extension. That implies that documentation, man pages, etc, are strictly kept out of this directory. If desired, they can be placed in other system-level directories (e.g. /usr/share/doc). This also helps bringing into the directories where Postgres loads stuff the bare minimum amount of files required, which I consider a good thing. Similarly, I'd also keep independent executables out of this directory. Not only they are not to be loaded by Postgres; but they will end up on a path quite unusable for the users --which are the ones expected to run such binaries.

I think it's useful to have a complete picture of what all is considered part of an extension, and that includes documentation, license, bill of materials, etc. In other words, it's made for humans and machines.

No reason why one couldn't build layers an OCI image that include such files, and use a layer (tag) without them if you want.

I think it's useful to have a complete picture of what all is considered part of an extension, and that includes documentation, license, bill of materials, etc. In other words, it's made for humans and machines.

I agree... but that doesn't mean they all need to be in the same directory. For instance, Debian packages put extension files where they are expected by Postgres (/usr/lib/postgresql/$major/lib) and documentation in the usual docs folder (/usr/share/doc). I'd do the same.

No reason why one couldn't build layers an OCI image that include such files, and use a layer (tag) without them if you want.

That's certainly true. However, there's a very practical reason not to abuse layering: certain container implementations have a limit in the number of layers that can be used (e.g. Docker at 127).

I have a hard time imagining needing more than 3-4 layers.

I have a hard time imagining needing more than 3-4 layers.

Just imagine you split extensions into two layers (one for the files that PG needs to load to work with the extension; and another one with the docs and other files). Actually you may take this further and also use different layers for each of the extension's dependencies (and that may make sense, as the dependency may not change in a while across extension's versions). But for the sake of the argument let's say 2-3 per extension.

Then if I load, say, 15 extensions, I'm already consuming 30-45 layers.

Add to this the Postgres layer(s) itself --which in many cases is built as several layers, not just one--. And then the "base OS layer", which is normally multiple layers too.

In total you could be at 50-80 layers easily. And while this is what you may think "the maximum under reasonable use cases" the problem is that people will always find a reason to go beyond this "reasonable maximum" and would hit the limit.

All in all, I'd favor "extreme layering", but this unfortunate limitation of some systems like Docker makes it unpractical.

Then if I load, say, 15 extensions, I'm already consuming 30-45 layers.

Oh, you didn't mean per-container? Damn limitations.

It's per image. Actually the image doesn't have any limit, it is a limit of some container runtime (like Docker) than when downloading an image with more than 127 layers will refuse to run that container image.

Which may happen if you pack in the image the base OS, Postgres and a few dozen extensions (with every extension having two or more layers).

Yes, you cannot imagine how much I hate this limitation. It's nuts how some limits where imposed by the original Docker designs. That even then made no sense (well, a bit, while aufs was used to construct the runtime image, but that time is long gone). There should be always limits (that's defensive programming) but some are arbitrarily and unnecessary low.

content/post/postgres/rfc-extension-packaging-lookup.md

ahachete · 2024-11-10T01:49:35Z

content/post/postgres/rfc-extension-packaging-lookup.md

+To allow extensions to be added to a Docker container and to persist beyond
+its lifetime, one or more [volumes] could be used. A couple of options:
+
+*   Mount the `--extdir-site` and/or `--extdir-vendor` directories as


This approach works for single node. For multiple nodes you need to repeat the operation for all nodes. While doable (e.g. this is what StackGres does) it's quite a bit of an effort, and may be mentioned here.

I assume you use a Kubernetes manifest that defines the shared volume and all your pods use it, yes? I must be misunderstanding, though; what's the extra effort?

No, volumes are not shared. If so, that would require the use of a distributed filesystem (e.g. NFS) which almost nobody does (for known reasons).

Most of the setups I know (and definitely ours in StackGres) have one volume per pod. Those volumes are different, and PGDATA is obvously replicated via streaming replication. So anything else you put on top (e.g. extension's content unpackage from elsewhere) needs to be done per volume (repeated).

Which also means that if, for example, a pod dies and it's re-created, the same operation needs to happen again.

And this also means that if a new extension is added or one is removed, the changes required need to be applied in all volumes again!

In StackGres we wrote a "cluster controller", a sidecar that implements all this logic.

I see. That casts @gbartolini's desire to use the (currently-alpha) Kubernetes image volume feature in a new light. In theory, using that pattern, you just need to update your manifest to mount the image volumes for each extension.

Yes.

I think it still requires the extensions to be placed on a unique directory per extension (as per this RFC) because I presume you cannot mount images in overlapping paths.

I'm also afraid this mechanism works at startup, so changing the extensions would require a pod restart.

But it's still much simpler than the "cluster controller" (though this mechanism works now and the image volume one possibly only with newer Postgres versions and newer K8s versions).

I'm gonna spend some time on Peter's patch and think about implications for the future. Perhaps we limit it to the control file, SQL files, and shared modules. It still complicates things a bit for extensions that also ship with scripts/CLIs/whatever, but maybe that's a problem for another time.

content/post/postgres/rfc-extension-packaging-lookup.md

ahachete · 2024-11-10T01:50:46Z

content/post/postgres/rfc-extension-packaging-lookup.md

+    └── pair--1.1.sql
+```
+
+A binary application like [pg_top] would live in the `pg_top` directory,


Consistently with that I've said before, and despite understanding that PGXN does currently support not only "extensions" but also "packages", I disagree it's a good thing to distribute something like pg_top "as an extension". It is not, and IMO should not be put into the extensions directory.

Why not? No reason not to encompass and build packaging designs for a whole array of Postgres ecosystem tooling, including not just extensions and dynamic modules, but also CLIs, web applications, backup utilities, etc. Why exclude them?

FWIW, the PGXN Meta v2 RFC proposes that its contents denotes different types of extensions: CREATE EXTENSION extensions, shared modules, and apps.

I have no objection in packaging other tools "in a similar --or the same-- way" (e.g. as OCI images or whatever it takes). My objection is from considering them "like extensions" and living in the same directory (extension_path). I'd just place them anywhere else (e.g. on /usr/local/bin).

For the reasons I stated before, I'd not be happy placing a file (including here binaries and potentially their shared libraries) in a place where Postgres would be looking for files to load into Postgres memory.

Besides, the UX for end users may not be good if now they need to either add multiple new entries to their $PATH variables (as each binary would be under a different directory) or call these binaries with cumbersome paths as in /usr/lib/postgresql/$major/extensions/pg_top/bin/pg_top (vs simply pg_top if it is on a standard system binary path).

Yeah, I have a comment in here about optionally symlinking bin files. But I need to think more about this.

content/post/postgres/rfc-extension-packaging-lookup.md

ahachete · 2024-11-10T01:52:04Z

content/post/postgres/rfc-extension-packaging-lookup.md

+
+This RFC does not include or attempt to address the following issues:
+
+*   How to manage third-party shared libraries. Making system dependencies


This is kind of an elephant in the room, and I definitely believe this RFC would be the place to address it --or else it won't be addressed again until a new further change to PG core for extensions, which may be a very distant future.

There was a long discussion in this thread, but ISTM that the issue exists with or without the changes in this RFC. I'm also not sure it will require changes to core. So I think it makes sense to consider in a separate RFC.

In the same vein of what I mentioned upthread, I believe that if we need to propose changes to core to support this (and other RFCs) about extensions, I think that the best strategy is to put all the cards on the table once we have a complete vision of everything that is required.

I believe anything that would require changes to core Postgres will face a lot of discussion and comments. If then the can of worms is opened, then we should bring in all the changes required at once. Not only to evaluate everything with a holistic vision; but also because that can of worms is not easy/frequently opened.

Therefore, nothing against bringing this to another RFC. But I'd vouch to group all RFCs that represent all the potential changes required and the completeness of vision before they are brought to the table.

Me too, just somebody has to do the research, work out a proposal, and write it down. But in the absence of that (and yes, I will ask around for contributions), I want to work on incremental change, so that the perfect isn't the enemy of the good, and because that's how things are often done in the project.

Hell, Peter E submitted a patch proposing just the extension path stuff and no reorganization at all. As long as it's a step in the right direction I'm all in favor.

Fair enough!

That said, maybe we could borrow something from Conda?

Specifically this bit:

Conda-build uses @loader_path on macOS and $ORIGIN on Linux because we install into a common root directory and can assume that other libraries are also installed into that root. The use of the variables allows you to build relocatable binaries that can be built on one system and sent everywhere.

Which I'll be looking into when I get to building binary packages.

Maybe, sure. What I take it from there is that dependencies can reside on a lib/ subfolder relative to the extension. And this idea is already part of the RFC.

Well this RFC's lib folder is for pkglibdir files, but yeah.

ahachete · 2024-11-10T01:57:19Z

@theory I have left quite a few comments after doing a review of the RFC. Some are minor potential adjustments, but some other are proposes for deeper changes. And last but not least a couple others are about areas that I believe should be addressed as part of the proposal, and they are hard topics.

The amount of comments from my side doesn't mean even for a second that I feel the RFC is not good or not appropriate. On the contrary: I really appreciate the effort of starting the threads in the mailing list, capturing the essence of those discussions and structuring it all into a document that can be reviewed here. I think it's a great first step towards a wider and deeper discussion about the topic. Thank you! 🙏

theory · 2024-11-11T18:48:11Z

Thank you for the thorough review, @ahachete. I made a number of changes in response to your comments, but have punted on the bigger issues. In particular, I think we should collectively discuss whether separate core/vendor/site directories are desirable (and if so, how one can set all three to a single destination to eliminate the distinction), but that the third-party dependency issue ought to remain separate — though I completely agree it's important to address.

Lots of input and feedback on this post, thanks to #7. See also the acknowledgements.

theory self-assigned this Oct 30, 2024

theory force-pushed the extpath-rfc branch 2 times, most recently from fb122e5 to 1af71d7 Compare November 4, 2024 22:16

bonesmoses reviewed Nov 4, 2024

View reviewed changes

content/post/postgres/rfc-extension-packaging-lookup.md Outdated Show resolved Hide resolved

keithf4 suggested changes Nov 5, 2024

View reviewed changes

content/post/postgres/rfc-extension-packaging-lookup.md Outdated Show resolved Hide resolved

content/post/postgres/rfc-extension-packaging-lookup.md Outdated Show resolved Hide resolved

theory force-pushed the extpath-rfc branch from 1af71d7 to e94b15e Compare November 5, 2024 16:59

hunleyd reviewed Nov 5, 2024

View reviewed changes

content/post/postgres/rfc-extension-packaging-lookup.md Outdated Show resolved Hide resolved

theory force-pushed the extpath-rfc branch from e94b15e to ff4474d Compare November 5, 2024 17:49

ahachete reviewed Nov 10, 2024

View reviewed changes

content/post/postgres/rfc-extension-packaging-lookup.md Outdated Show resolved Hide resolved

ahachete reviewed Nov 10, 2024

View reviewed changes

content/post/postgres/rfc-extension-packaging-lookup.md Outdated Show resolved Hide resolved

ahachete reviewed Nov 10, 2024

View reviewed changes

content/post/postgres/rfc-extension-packaging-lookup.md Outdated Show resolved Hide resolved

ahachete reviewed Nov 10, 2024

View reviewed changes

content/post/postgres/rfc-extension-packaging-lookup.md Outdated Show resolved Hide resolved

ahachete reviewed Nov 10, 2024

View reviewed changes

content/post/postgres/rfc-extension-packaging-lookup.md Outdated Show resolved Hide resolved

ahachete reviewed Nov 10, 2024

View reviewed changes

content/post/postgres/rfc-extension-packaging-lookup.md Outdated Show resolved Hide resolved

ahachete reviewed Nov 10, 2024

View reviewed changes

theory force-pushed the extpath-rfc branch from 6a68dda to 0fedcd9 Compare November 14, 2024 23:25

theory force-pushed the main branch from 8f81f00 to e1930d8 Compare November 14, 2024 23:26

theory force-pushed the extpath-rfc branch 3 times, most recently from baf0fae to b776520 Compare November 18, 2024 18:58

theory force-pushed the extpath-rfc branch 3 times, most recently from 52c90cf to 5228f3f Compare November 24, 2024 23:57

theory changed the title ~~Start drafting extdir RFC~~ Draft RFC for Postgres extension directory structure and search path Nov 25, 2024

theory force-pushed the extpath-rfc branch from 5228f3f to c82bcbb Compare November 25, 2024 19:01

theory added a commit that referenced this pull request Nov 25, 2024

Add RFC post on extension packaging and lookup

0715350

Lots of input and feedback on this post, thanks to #7. See also the acknowledgements.

theory closed this Nov 25, 2024

theory force-pushed the extpath-rfc branch from c82bcbb to 0715350 Compare November 25, 2024 19:03

theory deleted the extpath-rfc branch November 25, 2024 19:04


		This RFC does not include or attempt to address the following issues:

		* How to manage third-party shared libraries. Making system dependencies

Draft RFC for Postgres extension directory structure and search path #7

Draft RFC for Postgres extension directory structure and search path #7

Conversation

theory commented Oct 30, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

theory Nov 11, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ahachete Nov 12, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ahachete Nov 12, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ahachete commented Nov 10, 2024

theory commented Nov 11, 2024

theory Nov 11, 2024 •

edited

Loading

ahachete Nov 12, 2024 •

edited

Loading

ahachete Nov 12, 2024 •

edited

Loading