From 7d5f1f0056f7fa94a545ff760aa900839b59d212 Mon Sep 17 00:00:00 2001 From: David Li Date: Tue, 26 Mar 2024 17:23:26 -0400 Subject: [PATCH] docs: describe how drivers/driver manager relate (#1655) Fixes #1651. --- docs/mermaid.makefile | 8 +- docs/source/_static/css/custom.css | 11 ++ docs/source/faq.rst | 11 ++ docs/source/format/DriverAlias.mmd | 32 +++ docs/source/format/DriverAlias.mmd.svg | 19 ++ docs/source/format/DriverDirectLink.mmd | 29 +++ docs/source/format/DriverDirectLink.mmd.svg | 19 ++ docs/source/format/DriverManagerUse.mmd | 53 +++++ docs/source/format/DriverManagerUse.mmd.svg | 19 ++ docs/source/format/DriverTableLoad.mmd | 46 +++++ docs/source/format/DriverTableLoad.mmd.svg | 19 ++ docs/source/format/DriverTableUse.mmd | 45 +++++ docs/source/format/DriverTableUse.mmd.svg | 19 ++ docs/source/format/how_manager.rst | 207 ++++++++++++++++++++ docs/source/index.rst | 1 + 15 files changed, 537 insertions(+), 1 deletion(-) create mode 100644 docs/source/format/DriverAlias.mmd create mode 100644 docs/source/format/DriverAlias.mmd.svg create mode 100644 docs/source/format/DriverDirectLink.mmd create mode 100644 docs/source/format/DriverDirectLink.mmd.svg create mode 100644 docs/source/format/DriverManagerUse.mmd create mode 100644 docs/source/format/DriverManagerUse.mmd.svg create mode 100644 docs/source/format/DriverTableLoad.mmd create mode 100644 docs/source/format/DriverTableLoad.mmd.svg create mode 100644 docs/source/format/DriverTableUse.mmd create mode 100644 docs/source/format/DriverTableUse.mmd.svg create mode 100644 docs/source/format/how_manager.rst diff --git a/docs/mermaid.makefile b/docs/mermaid.makefile index 9b46cf35e1..cf967f5d7a 100644 --- a/docs/mermaid.makefile +++ b/docs/mermaid.makefile @@ -18,6 +18,9 @@ # Generate Mermaid diagrams statically. Sphinx has a mermaid # extension, but this causes issues with the page shifting during # load. 
+# First: npm install -g @mermaid-js/mermaid-cli +# (if you are using Conda, this will not be "global" but rather install to +# your Conda prefix) # Use as: make -f mermaid.makefile -j all MERMAID := $(shell find source/ -type f -name '*.mmd') @@ -27,7 +30,10 @@ define LICENSE endef %.mmd.svg : %.mmd - mmdc --input $< --output $@ +# XXX: mermaid doesn't properly handle comments in all layouts (the parser is +# written entirely from scratch each time, it looks like), so strip them +# manually + grep -E -v "^%" $< | mmdc --input - --output $@ # Prepend the license header mv $@ $@.tmp echo " SqliteStatementExecuteQuery diff --git a/docs/source/format/DriverAlias.mmd.svg b/docs/source/format/DriverAlias.mmd.svg new file mode 100644 index 0000000000..5c270b121d --- /dev/null +++ b/docs/source/format/DriverAlias.mmd.svg @@ -0,0 +1,19 @@ + +
[SVG text labels: Application; Driver; AdbcStatementExecuteQuery; AdbcStatementSetSqlQuery; ...; SqliteStatementExecuteQuery; SqliteStatementSetSqlQuery]
diff --git a/docs/source/format/DriverDirectLink.mmd b/docs/source/format/DriverDirectLink.mmd new file mode 100644 index 0000000000..7a9974fab0 --- /dev/null +++ b/docs/source/format/DriverDirectLink.mmd @@ -0,0 +1,29 @@ +%% Licensed to the Apache Software Foundation (ASF) under one +%% or more contributor license agreements. See the NOTICE file +%% distributed with this work for additional information +%% regarding copyright ownership. The ASF licenses this file +%% to you under the Apache License, Version 2.0 (the +%% "License"); you may not use this file except in compliance +%% with the License. You may obtain a copy of the License at +%% +%% http://www.apache.org/licenses/LICENSE-2.0 +%% +%% Unless required by applicable law or agreed to in writing, +%% software distributed under the License is distributed on an +%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +%% KIND, either express or implied. See the License for the +%% specific language governing permissions and limitations +%% under the License. + +block-beta + columns 3 + app["Application"]:3 + + space:3 + + driver["Driver"]:3 + AdbcStatementExecuteQuery + AdbcStatementSetSqlQuery + ... + + app --> AdbcStatementExecuteQuery diff --git a/docs/source/format/DriverDirectLink.mmd.svg b/docs/source/format/DriverDirectLink.mmd.svg new file mode 100644 index 0000000000..c5471a0229 --- /dev/null +++ b/docs/source/format/DriverDirectLink.mmd.svg @@ -0,0 +1,19 @@ + +
[SVG text labels: Application; Driver; AdbcStatementExecuteQuery; AdbcStatementSetSqlQuery; ...]
diff --git a/docs/source/format/DriverManagerUse.mmd b/docs/source/format/DriverManagerUse.mmd new file mode 100644 index 0000000000..c0abc7adc3 --- /dev/null +++ b/docs/source/format/DriverManagerUse.mmd @@ -0,0 +1,53 @@ +%% Licensed to the Apache Software Foundation (ASF) under one +%% or more contributor license agreements. See the NOTICE file +%% distributed with this work for additional information +%% regarding copyright ownership. The ASF licenses this file +%% to you under the Apache License, Version 2.0 (the +%% "License"); you may not use this file except in compliance +%% with the License. You may obtain a copy of the License at +%% +%% http://www.apache.org/licenses/LICENSE-2.0 +%% +%% Unless required by applicable law or agreed to in writing, +%% software distributed under the License is distributed on an +%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +%% KIND, either express or implied. See the License for the +%% specific language governing permissions and limitations +%% under the License. + +block-beta + columns 3 + app["Application"]:3 + + space:3 + + drivermanager["ADBC Driver Manager"]:3 + + space + dm_execute_query["AdbcStatementExecuteQuery"] + dm_ellipsis["..."] + + space:3 + + space + + AdbcDriver["struct AdbcDriver"]:2 + + space + + execute_query + ellipsis["..."] + + space:3 + + driver["Driver"]:3 + AdbcDriverInit + AdbcStatementExecuteQuery + ... + space + SqliteStatementExecuteQuery + ellipsis2["..."] + + app --> dm_execute_query + dm_execute_query --> execute_query + execute_query --> SqliteStatementExecuteQuery diff --git a/docs/source/format/DriverManagerUse.mmd.svg b/docs/source/format/DriverManagerUse.mmd.svg new file mode 100644 index 0000000000..c658cf2b18 --- /dev/null +++ b/docs/source/format/DriverManagerUse.mmd.svg @@ -0,0 +1,19 @@ + +
[SVG text labels: Application; ADBC Driver Manager (AdbcStatementExecuteQuery, ...); struct AdbcDriver (execute_query, ...); Driver (AdbcDriverInit, AdbcStatementExecuteQuery, ..., SqliteStatementExecuteQuery, ...)]
diff --git a/docs/source/format/DriverTableLoad.mmd b/docs/source/format/DriverTableLoad.mmd new file mode 100644 index 0000000000..44f2d4130b --- /dev/null +++ b/docs/source/format/DriverTableLoad.mmd @@ -0,0 +1,46 @@ +%% Licensed to the Apache Software Foundation (ASF) under one +%% or more contributor license agreements. See the NOTICE file +%% distributed with this work for additional information +%% regarding copyright ownership. The ASF licenses this file +%% to you under the Apache License, Version 2.0 (the +%% "License"); you may not use this file except in compliance +%% with the License. You may obtain a copy of the License at +%% +%% http://www.apache.org/licenses/LICENSE-2.0 +%% +%% Unless required by applicable law or agreed to in writing, +%% software distributed under the License is distributed on an +%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +%% KIND, either express or implied. See the License for the +%% specific language governing permissions and limitations +%% under the License. + +block-beta + columns 3 + app["Application"]:3 + + space:3 + + space + + AdbcDriver["struct AdbcDriver"]:2 + + space + + execute_query + ellipsis["..."] + + space:3 + + driver["Driver"]:3 + AdbcDriverInit + AdbcStatementExecuteQuery + ... + space + SqliteStatementExecuteQuery + ellipsis2["..."] + + app --> AdbcDriverInit + AdbcDriverInit --> AdbcDriver + SqliteStatementExecuteQuery --> execute_query + ellipsis2 --> ellipsis diff --git a/docs/source/format/DriverTableLoad.mmd.svg b/docs/source/format/DriverTableLoad.mmd.svg new file mode 100644 index 0000000000..2b762f68e9 --- /dev/null +++ b/docs/source/format/DriverTableLoad.mmd.svg @@ -0,0 +1,19 @@ + +
[SVG text labels: Application; struct AdbcDriver (execute_query, ...); Driver (AdbcDriverInit, AdbcStatementExecuteQuery, ..., SqliteStatementExecuteQuery, ...)]
diff --git a/docs/source/format/DriverTableUse.mmd b/docs/source/format/DriverTableUse.mmd new file mode 100644 index 0000000000..84717b6ab3 --- /dev/null +++ b/docs/source/format/DriverTableUse.mmd @@ -0,0 +1,45 @@ +%% Licensed to the Apache Software Foundation (ASF) under one +%% or more contributor license agreements. See the NOTICE file +%% distributed with this work for additional information +%% regarding copyright ownership. The ASF licenses this file +%% to you under the Apache License, Version 2.0 (the +%% "License"); you may not use this file except in compliance +%% with the License. You may obtain a copy of the License at +%% +%% http://www.apache.org/licenses/LICENSE-2.0 +%% +%% Unless required by applicable law or agreed to in writing, +%% software distributed under the License is distributed on an +%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +%% KIND, either express or implied. See the License for the +%% specific language governing permissions and limitations +%% under the License. + +block-beta + columns 3 + app["Application"]:3 + + space:3 + + space + + AdbcDriver["struct AdbcDriver"]:2 + + space + + execute_query + ellipsis["..."] + + space:3 + + driver["Driver"]:3 + AdbcDriverInit + AdbcStatementExecuteQuery + ... + space + SqliteStatementExecuteQuery + ellipsis2["..."] + + app --> execute_query + + execute_query --> SqliteStatementExecuteQuery diff --git a/docs/source/format/DriverTableUse.mmd.svg b/docs/source/format/DriverTableUse.mmd.svg new file mode 100644 index 0000000000..6521dfe93b --- /dev/null +++ b/docs/source/format/DriverTableUse.mmd.svg @@ -0,0 +1,19 @@ + +
[SVG text labels: Application; struct AdbcDriver (execute_query, ...); Driver (AdbcDriverInit, AdbcStatementExecuteQuery, ..., SqliteStatementExecuteQuery, ...)]
diff --git a/docs/source/format/how_manager.rst b/docs/source/format/how_manager.rst new file mode 100644 index 0000000000..2c02182522 --- /dev/null +++ b/docs/source/format/how_manager.rst @@ -0,0 +1,207 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +================================================ +How Drivers and the Driver Manager Work Together +================================================ + +.. note:: This document focuses on drivers/applications that implement or + consume the C API definitions in adbc.h. That includes C/C++, + Python, and Ruby; and possibly C#, Go, and Rust (when implementing + or consuming drivers via FFI). + +When an application calls a function like +:cpp:func:`AdbcStatementExecuteQuery`, how does it "know" what function in +which driver to actually call? + +This can happen in a few ways. In the simplest case, the application links to +a single driver, and directly calls ADBC functions explicitly defined by the +driver: + +.. figure:: DriverDirectLink.mmd.svg + + In the simplest case, an application directly links to the driver and calls + ADBC functions. 
+ +This doesn't work with multiple drivers, or applications that don't/can't link +directly to drivers (think dynamic loading, perhaps in a language like +Python). For this case, ADBC provides a table of function pointers +(:cpp:struct:`AdbcDriver`), and a way to request this table from a driver. +Then, the application proceeds in two steps. First, it dynamically loads a +driver and calls an entrypoint function to get the function table: + +.. figure:: DriverTableLoad.mmd.svg + + Now, the application asks the driver for a table of functions to call. + +Then, the application uses the driver by calling the functions in the table: + +.. figure:: DriverTableUse.mmd.svg + + The application uses the table to call driver functions. This approach + scales to multiple drivers. + +Dealing with the table, however, is messy. So the overall recommended +approach is to use the ADBC driver manager. This is a library that pretends +to be a single driver that can be linked to and used "like normal". +Internally, it loads the table of function pointers and tracks which +database/connection/statement objects need which "actual" driver, making it +easy to dynamically load drivers at runtime and use multiple drivers from the +same application: + +.. figure:: DriverManagerUse.mmd.svg + + The application uses the driver manager to "feel like" it's just using a single + driver. The driver manager handles the details behind the scenes. + +In More Detail +============== + +The `adbc.h`_ header ties everything together. It is the abstract API +definition, akin to interface/trait/protocol definitions in other languages. +C being C, however, all it consists of is a bunch of function prototypes and +struct definitions without any implementation. + +.. _adbc.h: https://github.com/apache/arrow/blob/main/format/adbc.h + +A driver, at its core, is just a library that implements those function +prototypes in adbc.h. 
Those functions may be implemented in C, or they can be +implemented in a different language and exported through language-specific FFI +mechanisms. For example, the Go and C# implementations of ADBC can both +export drivers to consumers who expect the C API definitions. As long as the +definitions in adbc.h are implemented somehow, then the application is +generally none the wiser when it comes to what's actually underneath. + +How does an application call these functions, though? Here, there are several +options. + +Again, the simplest case is as follows: if (1) the application links directly +to the driver, and (2) the driver exposes the ADBC functions *under the same +name* as in adbc.h, then the application can just ``#include <adbc.h>`` and +call ``AdbcStatementExecuteQuery(...)`` directly. Here, the application and +driver have a relationship no different than any other C library. + +.. figure:: DriverDirectLink.mmd.svg + + In the simplest case, an application directly links to the driver and calls + ADBC functions. When the application calls ``AdbcStatementExecuteQuery``, that + is directly provided by the driver that it links against. + +Unfortunately, this doesn't work as well in other scenarios. For example, if +an application wishes to use multiple ADBC drivers, this no longer works: both +drivers define the same functions (the ones in adbc.h), and when the +application links both of them, the linker has no way of telling which +driver's function is meant when the application calls an ADBC function. On +top of that, this violates the `One Definition Rule`_. + +In this case, the driver can provide driver-specific aliases that applications +can use, say ``PostgresqlStatementExecuteQuery`` or +``FlightSqlStatementExecuteQuery``. Then, the application can link both +drivers, ignore the ``Adbc…`` functions (and ignore the technical violation of +the One Definition Rule there), and use the aliases instead. + +.. 
figure:: DriverAlias.mmd.svg + + To get around the One Definition Rule, we can provide aliases of the ADBC + APIs instead. + +This is rather inconvenient for the application, though. Additionally, this +sort of defeats the point of using ADBC, since now the application has a +separate API for each driver, even if they're technically all clones of the +same API. And this doesn't solve the problem for applications that want to +load drivers dynamically. For example, a Python script would want to load the +driver at runtime. In that case, it would need to know which functions from +the driver correspond to which functions in the ADBC API definitions, without +having to hardcode this knowledge. + +ADBC anticipated this, and defined :cpp:struct:`AdbcDriver`. This is just a +table of function pointers with one entry per ADBC function. That way, an +application can dynamically load a driver and call an entrypoint function that +returns this table of function pointers. (It does have to hardcode or guess +the name of the entrypoint; the ADBC spec lists a set of names it can try, +based on the name of the driver library itself.) + +.. figure:: DriverTableLoad.mmd.svg + + The application first loads a table of function pointers from the driver. + +Then, it can use the driver by calling functions in that table: + +.. figure:: DriverTableUse.mmd.svg + + The application uses the table to call driver functions. This approach + scales to multiple drivers. + +Of course, calling all functions by jumping through a giant table of function +pointers is inconvenient. So ADBC provides the "driver manager", a library +that *pretends* to be a simple driver and implements all the ADBC functions. +Internally, it loads drivers dynamically, requests the tables of function +pointers, and keeps track of which connections are using which drivers. The +application only needs to call the standard ADBC functions, just like in the +simplest case we started out with: + +.. 
figure:: DriverManagerUse.mmd.svg + + The application uses the driver manager to "feel like" it's just using a single + driver. The driver manager handles the details behind the scenes. + +So to recap, a driver should provide these three things: + +#. An implementation of each ADBC function, +#. A thin wrapper around each implementation function that exports the ADBC + name for each function, and +#. An entrypoint function that returns an :cpp:struct:`AdbcDriver` table, + containing the functions from (1). + +Then, an application has these choices of ways to use a driver: + +- Link the driver directly and call ``Adbc…`` functions (only in the simplest + cases) using (2) above, +- Link the driver directly/dynamically, load the :cpp:struct:`AdbcDriver` + via (3) above, and call ADBC functions through function pointers (generally + not recommended), +- Link the ADBC driver manager, call ``Adbc…`` functions, and let the driver + manager deal with (3) above (what most applications will want to do). + +In other words, it's usually easiest to just always use the driver manager. +But the magic it pulls isn't required or all that complex. + +.. note:: You may ask: when we have :cpp:struct:`AdbcDriver`, why do we bother + defining both ``AdbcStatementExecuteQuery`` and + ``SqliteStatementExecuteQuery`` (i.e., why do both (1) and (2) + above)? Can't we just define the ``Adbc…`` version, and put it into + the function table when requested? + + Here, implementation constraints come in. At runtime, when the + driver looks up the address of (say) ``AdbcStatementExecuteQuery`` + to put it into the table, the dynamic linker will come into play to + figure out where this function is. Unfortunately, it will probably + find it *in the driver manager*. This is a problem, since then the + driver manager will end up in an infinite loop when it goes to call + the "driver's" version of the function! 
+ + By having a seemingly redundant copy of the function, we can then + hide the "real implementation" from the dynamic linker and avoid + this behavior. + + The driver manager could try to solve this by loading the drivers + with ``RTLD_DEEPBIND``. This, however, is not portable, and causes + problems if we also want to use things like AddressSanitizer during + development. The driver could also build with flags like + ``-Bsymbolic-functions``. + +.. _One Definition Rule: https://en.cppreference.com/w/cpp/language/definition#One_Definition_Rule diff --git a/docs/source/index.rst b/docs/source/index.rst index 494ccfc711..e29d26c482 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -244,6 +244,7 @@ Why ADBC? format/specification format/versioning format/comparison + format/how_manager .. toctree:: :maxdepth: 1