---
title: "RFC: Extension Packaging & Lookup"
slug: rfc-extension-packaging-lookup
date: 2024-11-04T19:07:44Z
lastMod: 2024-11-04T19:07:44Z
description: |
  A proposal to modify the PostgreSQL core so that all files required for an
  extension live in a directory named for the extension, along with a search
  path to find extension directories.
tags: [Postgres, Extensions, RFC, Packaging, Kubernetes, Docker, Postgres.app]
type: post
draft: true
---

A few weeks ago, I started [a pgsql-hackers thread] proposing a new extension
file organization and a search path [GUC] for finding extensions. The
[discussion] of [Christoph Berg]'s [`extension_destdir` patch][destdir]
inspired this proposal. These threads cover quite a lot of territory, so I
thought it would be useful to pull together a more unified, public proposal.

## The Problem

Extension users in various configurations face a number of challenges,
thanks to the way the Postgres core organizes extension files. The common
thread among them is the need to add extensions without changing the Postgres
installation itself.

### Packager Testing

On Debian systems, the user account that creates extension packages lacks
permission to add files to the Postgres install. But testing extensions
requires installing the extension files where Postgres can find them.
Furthermore, extensions should ideally be built against a clean Postgres
install; adding an extension in order to run `make installcheck` would
pollute it.

[Christoph's patch][destdir] solves these problems by adding a second lookup
path for extensions and dynamic modules, so that Postgres can load them
directly from the package build directory.
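
The following minimal Python sketch shows what that second lookup path amounts to. It assumes the patch simply concatenates the `extension_destdir` value with each absolute `pg_config` directory, DESTDIR-style. The `destdir_path` helper is hypothetical and is not code from the patch.

```python
import posixpath


def destdir_path(destdir: str, pg_config_dir: str) -> str:
    """Prefix an absolute pg_config directory with extension_destdir.

    Hypothetical helper illustrating DESTDIR-style prefixing; it is not
    taken from the extension_destdir patch.
    """
    if not posixpath.isabs(pg_config_dir):
        raise ValueError(f"expected an absolute path, got {pg_config_dir!r}")
    # posixpath.join() would discard destdir because pg_config_dir is
    # absolute, so concatenate the two strings instead.
    return posixpath.normpath(destdir.rstrip("/") + pg_config_dir)


if __name__ == "__main__":
    # Suppose `pg_config --sharedir` outputs /opt/share.
    sharedir = destdir_path("/tmp/build/myext", "/opt/share")
    print(sharedir)  # /tmp/build/myext/opt/share
    # Control and SQL files live in the `extension` subdirectory.
    print(posixpath.join(sharedir, "extension"))  # /tmp/build/myext/opt/share/extension
```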

Alas, the patch isn't ideal, because it simply specifies a prefix and appends
the full `pg_config` directory paths to it. For example, if `--sharedir`
outputs `/opt/share` and the `extension_destdir` GUC is set to
`/tmp/build/myext`, the patch will search in `/tmp/build/myext/opt/share`.
This approach works for the packaging use case, which explicitly uses full
paths with a prefix, but would be weird for other use cases.

### Docker Immutability

Docker images are immutable. To install persistent extensions in a running
Docker container, one must create a persistent volume, map it to
`SHAREDIR/extension`, and copy over all the core extensions (or muck with
[symlink magic]). Then do it again for shared object libraries (`PKGLIBDIR`),
and perhaps also for other `pg_config` directories, like `--bindir`. Once it's
all set up, one can install a new extension and its files will be distributed
to the relevant persistent volumes.

This pattern makes upgrades tricky, because the core extensions are mixed in
with third-party extensions. Worse, the number of directories that must be
mounted into volumes depends on the features of an extension, increasing
deployment configuration complexity. It would be preferable to have all the
files for an extension in one place, rather than scattered across multiple
persistent volumes.

### Postgres.app Immutability

The macOS [Postgres.app] supports extensions. But installing one into
`SHAREDIR/extension` changes the contents of the Postgres.app bundle,
breaking Apple-required signature validation. The OS will no longer be able
to validate that the app is legit and will refuse to start it.

## Solution

To address these issues, this RFC proposes to change file organization and
lookup patterns for PostgreSQL extensions.

### Extension Directories

First, when an extension is installed, all of its files will live in a single
directory named for the extension.
The contents include:

* The control file that describes the extension
* Subdirectories for SQL, shared modules, docs, and binaries

Subdirectories roughly correspond to the `pg_config --*dir` options:

* `bin`: Executables
* `doc`: Documentation files
* `html`: HTML documentation files
* `lib`: Dynamically loadable modules
* `locale`: Locale support files
* `man`: Manual pages
* `share`: SQL and other architecture-independent support files

This layout reduces the cognitive overhead for understanding what files belong
to what extension. Want to know what's included in the `widget` extension?
Everything is in the `widget` directory.

### Configuration Parameters

Add three new `pg_config` options:

```
  --extdir-core         show location of core extensions
  --extdir-vendor       show location of vendor extensions
  --extdir-site         show location of site extensions
```

All core-distributed extensions, including contrib extensions, will be
installed in the `--extdir-core` directory, each in its own directory as
described [above](#extension-directories). Its contents would look something
like this:

``` console
❯ ls -1 "$(pg_config --extdir-core)"
auto_explain
bloom
intagg
isn
plperl
plpgsql
xml2
```

OS vendors and packaging systems will install non-core extensions into
`--extdir-vendor`, while user-installed extensions will go into
`--extdir-site`.

Like all other `pg_config` options, these values can be customized at compile
time. By default, they'll point to different directories, so that core,
vendor, and end-user extensions are always kept separate. Perhaps they could
default to:

```
PG_INSTALL_ROOT/extensions/(core|site|vendor)
```

### Extension Path

Add an extension lookup path GUC akin to [`dynamic_library_path`], called
`extension_path`. It lists all the directories that Postgres will search for
extensions and their files.
The default value for this GUC will be:

``` ini
extension_path = '$extdir_site,$extdir_vendor,$extdir_core'
```

The special values `$extdir_site`, `$extdir_vendor`, and `$extdir_core`
correspond to the `pg_config` options `--extdir-site`, `--extdir-vendor`, and
`--extdir-core`, respectively. They work exactly as `$libdir` does for the
`dynamic_library_path` GUC, substituting the appropriate values.

### Lookup Execution

Update PostgreSQL's `CREATE EXTENSION` command to search the directories in
`extension_path` for an extension. For each directory in the list, it will
look for the extension control file in a directory named for the extension:

``` sh
$dir/$extension/$extension.control
```

The first match will be considered the canonical location for the extension.
For example, if Postgres finds the control file for the `pair` extension at
`/opt/pg17/ext/pair/pair.control`, then it will load files only from the
appropriate subdirectories, e.g.:

* SQL files from `/opt/pg17/ext/pair/share`
* Shared module files from `/opt/pg17/ext/pair/lib`

### PGXS

Update the extension installation behavior of [PGXS] to install extension
files into the new locations. A new variable, `EXTDIR`, will define the
directory into which to install an extension, and will default to
`--extdir-site`. It can be set to the values `$extdir_site`, `$extdir_vendor`,
or `$extdir_core`, or to any literal path.

The `EXTENSION` variable will be changed to allow only one extension name.
If it's set, the installation behavior will change for the following
variables:

* `EXTENSION`: Creates `$EXTDIR/$EXTENSION`, installs
  `$EXTDIR/$EXTENSION/$EXTENSION.control`
* `MODULES` and `MODULE_big`: Installed into `$EXTDIR/$EXTENSION/lib`
* `MODULEDIR`: Removed
* `DATA`: Installed into `$EXTDIR/$EXTENSION/share`
* `DATA_built`: Installed into `$EXTDIR/$EXTENSION/share`
* `DATA_TSEARCH`: Installed into `$EXTDIR/$EXTENSION/share/tsearch_data`
* `DOCS`: Installed into `$EXTDIR/$EXTENSION/doc`
* `PROGRAM`, `SCRIPTS` and `SCRIPTS_built`: Installed into
  `$EXTDIR/$EXTENSION/bin`

Another new variable, `LINKBINS`, will default to true. When true, the
installer symlinks the files in `$EXTDIR/$EXTENSION/bin` into
`pg_config --bindir`. Installers can set it to false to skip the symlinking,
e.g., for immutable Postgres installs.

> [!NOTE]
> External projects that install extensions without using PGXS, like
> [pgrx], must also be updated to either follow the same pattern or to
> delegate installation to [PGXS].

### MODULE_PATHNAME

Update the installer to replace `MODULE_PATHNAME` in SQL scripts with the new
install path for shared modules, `$EXTDIR/$EXTENSION/lib`.

### Control File

The `directory` and `module_pathname` control file variables will be
deprecated and ignored.

## Use Cases

Here’s how the proposed file layout and `extension_path` GUC address the [use
cases](#the-problem) that inspired this RFC.

### Packager Testing

A packager who wants to run tests without modifying a PostgreSQL install would
follow these steps:

* Set the `extension_path` GUC to search the vendor extension directory under
  the packaging install. Something like
  `$RPM_BUILD_ROOT/$(pg_config --extdir-vendor)`
* Install the extension into that directory:
  `make install EXTDIR=$RPM_BUILD_ROOT/$(pg_config --extdir-vendor)`
* Run `make installcheck`

This will allow PostgreSQL to find and load the extension during the tests.
The Postgres installation will not have been modified; only the
`extension_path` will have changed.

### Docker/Kubernetes

To allow extensions to be added to a Docker container and to persist beyond
its lifetime, one or more [volumes] could be used. A couple of options:

* Mount the `--extdir-site` and/or `--extdir-vendor` directories as
  persistent volumes (or one volume and a subdirectory for each). Then any
  extensions installed into them will persist. Files for any one extension
  will live on a single volume. When a new container spins up, it can access
  the same extensions as long as it uses the same persistent volume(s).

* Create a separate image for each extension, and then "install" them by
  using the [Kubernetes image volume feature] to mount them as read-only
  volumes in the appropriate subdirectory of `--extdir-site` or
  `--extdir-vendor`. Thereafter, any new containers would simply have to
  mount the same extension image volumes to persistently provide the same
  extensions to all containers.

### Postgres.app

To allow extension installation without invalidating the Postgres.app bundle's
signature, the app could be compiled to have `--extdir-site` and
`--extdir-vendor` point to subdirectories of well-known directories outside
the app bundle, such as `/Library/Application Support/Postgres`.

Any vendor or user extensions installed would be placed in those
subdirectories, without changing the contents of the Postgres.app bundle.
Postgres.app would know to find extensions in those locations thanks to the
inclusion of `$extdir_site` and `$extdir_vendor` in the `extension_path` GUC.
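
The lookup described in [Lookup Execution](#lookup-execution) is simple enough to sketch, and it is what lets each of these use cases work. The Python below is a minimal illustration, not PostgreSQL code, and rests on three assumptions:

* `extension_path` entries are comma-separated, as in the default value above.
* Each `$extdir_*` macro is expanded only when it begins an entry, much as `$libdir` is in `dynamic_library_path`.
* The directory values stand in for real `pg_config --extdir-*` output and are made up.

All function names are hypothetical.

```python
from collections.abc import Mapping
from pathlib import Path
from typing import Optional

# Hypothetical stand-ins for `pg_config --extdir-site/--extdir-vendor/--extdir-core`.
EXTDIR_MACROS = {
    "$extdir_site": "/usr/local/pgsql/extensions/site",
    "$extdir_vendor": "/usr/local/pgsql/extensions/vendor",
    "$extdir_core": "/usr/local/pgsql/extensions/core",
}


def expand_entry(entry: str, macros: Mapping[str, str]) -> str:
    """Replace a leading $extdir_* macro with its directory."""
    for macro, directory in macros.items():
        if entry == macro or entry.startswith(macro + "/"):
            return directory + entry[len(macro):]
    return entry  # Literal paths pass through unchanged.


def search_dirs(extension_path: str, macros: Mapping[str, str]) -> list[Path]:
    """Split a comma-separated extension_path and expand its macros, in order."""
    return [
        Path(expand_entry(entry.strip(), macros))
        for entry in extension_path.split(",")
        if entry.strip()
    ]


def find_extension(name: str, dirs: list[Path]) -> Optional[Path]:
    """Return the first $dir/$name directory that contains $name.control."""
    for directory in dirs:
        if (directory / name / f"{name}.control").is_file():
            return directory / name
    return None


if __name__ == "__main__":
    dirs = search_dirs("$extdir_site,$extdir_vendor,$extdir_core", EXTDIR_MACROS)
    print(find_extension("pair", dirs))
```

Suppose a control file exists at `<site>/pair/pair.control`. Then `find_extension("pair", ...)` returns `<site>/pair`, and the SQL and module files come from its `share` and `lib` subdirectories. Because the site directory comes first in the default path, an extension installed there shadows a same-named one under vendor or core.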

## Extension Directory Examples

A core extension, like [citext], would live in
`$(pg_config --extdir-core)/citext` and have a structure such as:

``` tree
citext
├── citext.control
├── lib
│   ├── citext.dylib
│   └── bitcode
│       ├── citext
│       │   └── citext.bc
│       └── citext.index.bc
└── share
    ├── citext--1.0--1.1.sql
    ├── citext--1.1--1.2.sql
    ├── citext--1.2--1.3.sql
    ├── citext--1.3--1.4.sql
    ├── citext--1.4--1.5.sql
    ├── citext--1.4.sql
    └── citext--1.5--1.6.sql
```

A pure SQL extension named `pair` would live in a directory named `pair` that
looks something like this:

``` tree
pair
├── LICENSE.md
├── README.md
├── pair.control
├── doc
│   ├── html
│   │   └── pair.html
│   └── pair.md
└── share
    ├── pair--1.0--1.1.sql
    └── pair--1.1.sql
```

A binary application like [pg_top] would live in the `pg_top` directory,
structured something like:

``` tree
pg_top
├── HISTORY.rst
├── INSTALL.rst
├── LICENSE
├── README.rst
├── bin
│   └── pg_top
└── doc
    └── man
        └── man3
            └── pg_top.3
```

And a C extension like [semver] would live in the `semver` directory and be
structured something like:

``` tree
semver
├── LICENSE
├── README.md
├── semver.control
├── doc
│   └── semver.md
├── lib
│   ├── semver.dylib
│   └── bitcode
│       ├── semver
│       │   └── semver.bc
│       └── semver.index.bc
└── share
    ├── semver--1.0--1.1.sql
    └── semver--1.1.sql
```

## Phase Two: Preloading

The above-proposed [solution](#solution) does not allow shared modules
distributed with extensions to be loaded via [shared library preloading] as
they are today, because extension modules will no longer live in the
[`dynamic_library_path`]. Users can specify full paths, however.
For example, instead of:

``` ini
shared_preload_libraries = 'pg_partman_bgw'
```

One could use:

``` ini
shared_preload_libraries = '/opt/postgres/extensions/pg_partman/lib/pg_partman_bgw'
```

But users will likely find this pattern cumbersome, especially for extensions
with multiple shared modules. Perhaps some special syntax could be added, for
example:

``` ini
shared_preload_libraries = '$extension_path:pg_partman_bgw'
```

But this overloads the semantics of `shared_preload_libraries` and friends
rather heavily, not to mention the [`LOAD`] command.

Therefore, as a follow-up to the [solution](#solution) proposed above, this
RFC proposes additional changes to PostgreSQL.

### Extension Preloading

Add new GUCs that complement [shared library preloading], but for *extension*
module preloading:

* `shared_preload_extensions`
* `session_preload_extensions`
* `local_preload_extensions`

Each takes a list of extensions for which to preload shared modules. In
addition, another new GUC, `local_plugins`, will contain a list of
administrator-approved extensions that users are allowed to include in
`local_preload_extensions`. This GUC complements [`local_preload_libraries`]'s
use of a `plugins` directory.

Then modify the preloading code to also preload these files. For each
extension in a list, it would:

* Search each directory in `extension_path` for the extension.
* When found, load all the shared libraries in `$extension/lib`.

For example, to load all shared modules in the `pg_partman` extension, set:

``` ini
shared_preload_extensions = 'pg_partman'
```

To load a single shared module from an extension, give its name after the
extension name and a slash. This example will load only the `pg_partman_bgw`
shared module from the `pg_partman` extension:

``` ini
shared_preload_extensions = 'pg_partman/pg_partman_bgw'
```

Adopting these GUCs will require a one-time update to existing preload
configurations on upgrade.
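
Resolving these `*_preload_extensions` values can also be sketched in a few lines of Python. This is a minimal illustration, not PostgreSQL's preloading code, and it makes four assumptions:

* Entries are comma-separated, as in the existing `*_preload_libraries` GUCs.
* The caller passes search directories already expanded from `extension_path`.
* Shared modules use a platform suffix that the caller supplies: `.so` by default here, `.dylib` in the macOS trees shown earlier.
* The first directory containing the extension's control file wins.

The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def parse_preload_extensions(setting: str) -> list[tuple[str, Optional[str]]]:
    """Split 'ext' and 'ext/module' entries into (extension, module) pairs."""
    entries = []
    for raw in setting.split(","):
        item = raw.strip()
        if not item:
            continue
        extension, slash, module = item.partition("/")
        # A bare extension name (no slash) means "load every module".
        entries.append((extension, module if slash else None))
    return entries


def modules_to_preload(
    setting: str, dirs: list[Path], suffix: str = ".so"
) -> list[Path]:
    """Map a *_preload_extensions value to the shared module files to load."""
    paths: list[Path] = []
    for extension, module in parse_preload_extensions(setting):
        # First directory containing $extension/$extension.control wins.
        ext_dir = next(
            (d / extension for d in dirs
             if (d / extension / f"{extension}.control").is_file()),
            None,
        )
        if ext_dir is None:
            raise FileNotFoundError(
                f'extension "{extension}" not found in extension_path'
            )
        lib = ext_dir / "lib"
        if module is None:
            # glob() is non-recursive, so files under lib/bitcode are skipped.
            paths.extend(sorted(lib.glob(f"*{suffix}")))
        else:
            paths.append(lib / f"{module}{suffix}")
    return paths
```

Using the examples above, `'pg_partman'` would yield every top-level module in `pg_partman/lib`. `'pg_partman/pg_partman_bgw'` would yield only `pg_partman/lib/pg_partman_bgw.so`.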

## Future: Deprecate LOAD

For a future change, consider modifying `CREATE EXTENSION` to support shared
module-only extensions. This would allow extensions with no SQL component,
such as [auto_explain], to be handled like any other extension; it would live
in `--extdir-core` with a directory structure like this:

``` tree
auto_explain
├── auto_explain.control
└── lib
    ├── auto_explain.dylib
    └── bitcode
        ├── auto_explain
        │   └── auto_explain.bc
        └── auto_explain.index.bc
```

Note the `auto_explain.control` file. We would need to add a variable to
indicate that the extension includes no SQL files, so `CREATE EXTENSION` and
related commands wouldn't try to find them.

With these changes, extensions could become the primary, recommended interface
for extending PostgreSQL. Perhaps the `LOAD` command could be deprecated, and
the `*_preload_libraries` GUCs along with it.

## Compatibility Issues

* The `directory` and `module_pathname` control file variables and the
  `MODULEDIR` PGXS variable would be deprecated and ignored.
* `*_preload_libraries` would no longer be used to find extension modules
  without full paths. Administrators would have to remove module names from
  these GUCs and add the relevant extension names to the
  `*_preload_extensions` variables. To ease upgrades, we might consider
  adding a PGXS variable that, when true, would symlink shared modules into
  `--pkglibdir`.
* `LOAD` would no longer be able to find shared modules included with
  extensions, unless we add a PGXS variable that, when true, would symlink
  shared modules into `--pkglibdir`.
* The `EXTENSION` PGXS variable will no longer support multiple extension
  names.
* The change in extension installation locations must also be adopted by
  projects that don't use PGXS for installation, like [pgrx]. Or perhaps
  they could be modified to also use PGXS.
  Long term, it might be useful to replace the `Makefile`-based PGXS with
  another installation system, perhaps a CLI.

## Out of Scope

This RFC does not include or attempt to address the following issues:

* How to manage third-party shared libraries. Making system dependencies
  consistent in a Docker/Kubernetes environment or for non-system binary
  packaging patterns presents its own challenges, though they're not
  specific to PostgreSQL or the patterns described here. Research into
  potential solutions is ongoing; they will be addressed elsewhere.

  [a pgsql-hackers thread]: https://postgr.es/m/2CAD6FA7-DC25-48FC-80F2-8F203DECAE6A%40justatheory.com
  [GUC]: https://pgpedia.info/g/guc.html "GUC - Grand Unified Configuration"
  [discussion]: https://postgr.es/m/E7C7BFFB-8857-48D4-A71F-88B359FADCFD@justatheory.com
  [Christoph Berg]: https://www.df7cb.de
  [destdir]: https://commitfest.postgresql.org/50/4913/
  [symlink magic]: https://speakerdeck.com/ongres/postgres-extensions-in-kubernetes?slide=14
    "Postgres Extensions in Kubernetes: StackGres"
  [Kubernetes image volume feature]: https://kubernetes.io/docs/tasks/configure-pod-container/image-volumes/
    "Kubernetes Docs: Use an Image Volume With a Pod"
  [Postgres.app]: https://postgresapp.com
    "Postgres.app: The easiest way to get started with PostgreSQL on the Mac"
  [PGXS]: https://www.postgresql.org/docs/current/extend-pgxs.html
    "PostgreSQL Docs: Extension Building Infrastructure"
  [`dynamic_library_path`]: https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-DYNAMIC-LIBRARY-PATH
  [pgrx]: https://github.com/pgcentralfoundation/pgrx "pgrx: Build Postgres Extensions with Rust!"
  [citext]: https://www.postgresql.org/docs/17/citext.html
    "PostgreSQL Docs: citext — a case-insensitive character string type"
  [pg_top]: https://pgxn.org/dist/pg_top/ "PGXN: pg_top"
  [semver]: https://pgxn.org/dist/semver/ "PGXN: semver"
  [volumes]: https://docs.docker.com/engine/storage/volumes/
    "Docker Docs: Volumes"
  [shared library preloading]: https://www.postgresql.org/docs/current/runtime-config-client.html#RUNTIME-CONFIG-CLIENT-PRELOAD
    "PostgreSQL Docs: Shared Library Preloading"
  [`local_preload_libraries`]: https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-LOCAL-PRELOAD-LIBRARIES
    "PostgreSQL Docs: local_preload_libraries"
  [`LOAD`]: https://www.postgresql.org/docs/17/sql-load.html
    "PostgreSQL Docs: LOAD"
  [auto_explain]: https://www.postgresql.org/docs/current/auto-explain.html
    "PostgreSQL Docs: auto_explain — log execution plans of slow queries"