From 7d5f1f0056f7fa94a545ff760aa900839b59d212 Mon Sep 17 00:00:00 2001
From: David Li
Date: Tue, 26 Mar 2024 17:23:26 -0400
Subject: [PATCH] docs: describe how drivers/driver manager relate (#1655)
Fixes #1651.
---
docs/mermaid.makefile | 8 +-
docs/source/_static/css/custom.css | 11 ++
docs/source/faq.rst | 11 ++
docs/source/format/DriverAlias.mmd | 32 +++
docs/source/format/DriverAlias.mmd.svg | 19 ++
docs/source/format/DriverDirectLink.mmd | 29 +++
docs/source/format/DriverDirectLink.mmd.svg | 19 ++
docs/source/format/DriverManagerUse.mmd | 53 +++++
docs/source/format/DriverManagerUse.mmd.svg | 19 ++
docs/source/format/DriverTableLoad.mmd | 46 +++++
docs/source/format/DriverTableLoad.mmd.svg | 19 ++
docs/source/format/DriverTableUse.mmd | 45 +++++
docs/source/format/DriverTableUse.mmd.svg | 19 ++
docs/source/format/how_manager.rst | 207 ++++++++++++++++++++
docs/source/index.rst | 1 +
15 files changed, 537 insertions(+), 1 deletion(-)
create mode 100644 docs/source/format/DriverAlias.mmd
create mode 100644 docs/source/format/DriverAlias.mmd.svg
create mode 100644 docs/source/format/DriverDirectLink.mmd
create mode 100644 docs/source/format/DriverDirectLink.mmd.svg
create mode 100644 docs/source/format/DriverManagerUse.mmd
create mode 100644 docs/source/format/DriverManagerUse.mmd.svg
create mode 100644 docs/source/format/DriverTableLoad.mmd
create mode 100644 docs/source/format/DriverTableLoad.mmd.svg
create mode 100644 docs/source/format/DriverTableUse.mmd
create mode 100644 docs/source/format/DriverTableUse.mmd.svg
create mode 100644 docs/source/format/how_manager.rst
diff --git a/docs/mermaid.makefile b/docs/mermaid.makefile
index 9b46cf35e1..cf967f5d7a 100644
--- a/docs/mermaid.makefile
+++ b/docs/mermaid.makefile
@@ -18,6 +18,9 @@
# Generate Mermaid diagrams statically. Sphinx has a mermaid
# extension, but this causes issues with the page shifting during
# load.
+# First: npm install -g @mermaid-js/mermaid-cli
+# (if you are using Conda, this will not be "global" but rather install to
+# your Conda prefix)
# Use as: make -f mermaid.makefile -j all
MERMAID := $(shell find source/ -type f -name '*.mmd')
@@ -27,7 +30,10 @@ define LICENSE
endef
%.mmd.svg : %.mmd
- mmdc --input $< --output $@
+# XXX: mermaid doesn't properly handle comments in all layouts (the parser is
+# written entirely from scratch each time, it looks like), so strip them
+# manually
+ grep -E -v "^%" $< | mmdc --input - --output $@
# Prepend the license header
mv $@ $@.tmp
echo " SqliteStatementExecuteQuery
diff --git a/docs/source/format/DriverAlias.mmd.svg b/docs/source/format/DriverAlias.mmd.svg
new file mode 100644
index 0000000000..5c270b121d
--- /dev/null
+++ b/docs/source/format/DriverAlias.mmd.svg
@@ -0,0 +1,19 @@
+
+
diff --git a/docs/source/format/DriverDirectLink.mmd b/docs/source/format/DriverDirectLink.mmd
new file mode 100644
index 0000000000..7a9974fab0
--- /dev/null
+++ b/docs/source/format/DriverDirectLink.mmd
@@ -0,0 +1,29 @@
+%% Licensed to the Apache Software Foundation (ASF) under one
+%% or more contributor license agreements. See the NOTICE file
+%% distributed with this work for additional information
+%% regarding copyright ownership. The ASF licenses this file
+%% to you under the Apache License, Version 2.0 (the
+%% "License"); you may not use this file except in compliance
+%% with the License. You may obtain a copy of the License at
+%%
+%% http://www.apache.org/licenses/LICENSE-2.0
+%%
+%% Unless required by applicable law or agreed to in writing,
+%% software distributed under the License is distributed on an
+%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+%% KIND, either express or implied. See the License for the
+%% specific language governing permissions and limitations
+%% under the License.
+
+block-beta
+ columns 3
+ app["Application"]:3
+
+ space:3
+
+ driver["Driver"]:3
+ AdbcStatementExecuteQuery
+ AdbcStatementSetSqlQuery
+ ...
+
+ app --> AdbcStatementExecuteQuery
diff --git a/docs/source/format/DriverDirectLink.mmd.svg b/docs/source/format/DriverDirectLink.mmd.svg
new file mode 100644
index 0000000000..c5471a0229
--- /dev/null
+++ b/docs/source/format/DriverDirectLink.mmd.svg
@@ -0,0 +1,19 @@
+
+
diff --git a/docs/source/format/DriverManagerUse.mmd b/docs/source/format/DriverManagerUse.mmd
new file mode 100644
index 0000000000..c0abc7adc3
--- /dev/null
+++ b/docs/source/format/DriverManagerUse.mmd
@@ -0,0 +1,53 @@
+%% Licensed to the Apache Software Foundation (ASF) under one
+%% or more contributor license agreements. See the NOTICE file
+%% distributed with this work for additional information
+%% regarding copyright ownership. The ASF licenses this file
+%% to you under the Apache License, Version 2.0 (the
+%% "License"); you may not use this file except in compliance
+%% with the License. You may obtain a copy of the License at
+%%
+%% http://www.apache.org/licenses/LICENSE-2.0
+%%
+%% Unless required by applicable law or agreed to in writing,
+%% software distributed under the License is distributed on an
+%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+%% KIND, either express or implied. See the License for the
+%% specific language governing permissions and limitations
+%% under the License.
+
+block-beta
+ columns 3
+ app["Application"]:3
+
+ space:3
+
+ drivermanager["ADBC Driver Manager"]:3
+
+ space
+ dm_execute_query["AdbcStatementExecuteQuery"]
+ dm_ellipsis["..."]
+
+ space:3
+
+ space
+
+ AdbcDriver["struct AdbcDriver"]:2
+
+ space
+
+ execute_query
+ ellipsis["..."]
+
+ space:3
+
+ driver["Driver"]:3
+ AdbcDriverInit
+ AdbcStatementExecuteQuery
+ ...
+ space
+ SqliteStatementExecuteQuery
+ ellipsis2["..."]
+
+ app --> dm_execute_query
+ dm_execute_query --> execute_query
+ execute_query --> SqliteStatementExecuteQuery
diff --git a/docs/source/format/DriverManagerUse.mmd.svg b/docs/source/format/DriverManagerUse.mmd.svg
new file mode 100644
index 0000000000..c658cf2b18
--- /dev/null
+++ b/docs/source/format/DriverManagerUse.mmd.svg
@@ -0,0 +1,19 @@
+
+
diff --git a/docs/source/format/DriverTableLoad.mmd b/docs/source/format/DriverTableLoad.mmd
new file mode 100644
index 0000000000..44f2d4130b
--- /dev/null
+++ b/docs/source/format/DriverTableLoad.mmd
@@ -0,0 +1,46 @@
+%% Licensed to the Apache Software Foundation (ASF) under one
+%% or more contributor license agreements. See the NOTICE file
+%% distributed with this work for additional information
+%% regarding copyright ownership. The ASF licenses this file
+%% to you under the Apache License, Version 2.0 (the
+%% "License"); you may not use this file except in compliance
+%% with the License. You may obtain a copy of the License at
+%%
+%% http://www.apache.org/licenses/LICENSE-2.0
+%%
+%% Unless required by applicable law or agreed to in writing,
+%% software distributed under the License is distributed on an
+%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+%% KIND, either express or implied. See the License for the
+%% specific language governing permissions and limitations
+%% under the License.
+
+block-beta
+ columns 3
+ app["Application"]:3
+
+ space:3
+
+ space
+
+ AdbcDriver["struct AdbcDriver"]:2
+
+ space
+
+ execute_query
+ ellipsis["..."]
+
+ space:3
+
+ driver["Driver"]:3
+ AdbcDriverInit
+ AdbcStatementExecuteQuery
+ ...
+ space
+ SqliteStatementExecuteQuery
+ ellipsis2["..."]
+
+ app --> AdbcDriverInit
+ AdbcDriverInit --> AdbcDriver
+ SqliteStatementExecuteQuery --> execute_query
+ ellipsis2 --> ellipsis
diff --git a/docs/source/format/DriverTableLoad.mmd.svg b/docs/source/format/DriverTableLoad.mmd.svg
new file mode 100644
index 0000000000..2b762f68e9
--- /dev/null
+++ b/docs/source/format/DriverTableLoad.mmd.svg
@@ -0,0 +1,19 @@
+
+
diff --git a/docs/source/format/DriverTableUse.mmd b/docs/source/format/DriverTableUse.mmd
new file mode 100644
index 0000000000..84717b6ab3
--- /dev/null
+++ b/docs/source/format/DriverTableUse.mmd
@@ -0,0 +1,45 @@
+%% Licensed to the Apache Software Foundation (ASF) under one
+%% or more contributor license agreements. See the NOTICE file
+%% distributed with this work for additional information
+%% regarding copyright ownership. The ASF licenses this file
+%% to you under the Apache License, Version 2.0 (the
+%% "License"); you may not use this file except in compliance
+%% with the License. You may obtain a copy of the License at
+%%
+%% http://www.apache.org/licenses/LICENSE-2.0
+%%
+%% Unless required by applicable law or agreed to in writing,
+%% software distributed under the License is distributed on an
+%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+%% KIND, either express or implied. See the License for the
+%% specific language governing permissions and limitations
+%% under the License.
+
+block-beta
+ columns 3
+ app["Application"]:3
+
+ space:3
+
+ space
+
+ AdbcDriver["struct AdbcDriver"]:2
+
+ space
+
+ execute_query
+ ellipsis["..."]
+
+ space:3
+
+ driver["Driver"]:3
+ AdbcDriverInit
+ AdbcStatementExecuteQuery
+ ...
+ space
+ SqliteStatementExecuteQuery
+ ellipsis2["..."]
+
+ app --> execute_query
+
+ execute_query --> SqliteStatementExecuteQuery
diff --git a/docs/source/format/DriverTableUse.mmd.svg b/docs/source/format/DriverTableUse.mmd.svg
new file mode 100644
index 0000000000..6521dfe93b
--- /dev/null
+++ b/docs/source/format/DriverTableUse.mmd.svg
@@ -0,0 +1,19 @@
+
+
diff --git a/docs/source/format/how_manager.rst b/docs/source/format/how_manager.rst
new file mode 100644
index 0000000000..2c02182522
--- /dev/null
+++ b/docs/source/format/how_manager.rst
@@ -0,0 +1,207 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+..
+.. http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+================================================
+How Drivers and the Driver Manager Work Together
+================================================
+
+.. note:: This document focuses on drivers/applications that implement or
+ consume the C API definitions in adbc.h. That includes C/C++,
+ Python, and Ruby; and possibly C#, Go, and Rust (when implementing
+ or consuming drivers via FFI).
+
+When an application calls a function like
+:cpp:func:`AdbcStatementExecuteQuery`, how does it "know" what function in
+which driver to actually call?
+
+This can happen in a few ways. In the simplest case, the application links to
+a single driver, and directly calls ADBC functions explicitly defined by the
+driver:
+
+.. figure:: DriverDirectLink.mmd.svg
+
+ In the simplest case, an application directly links to the driver and calls
+ ADBC functions.
+
+This doesn't work with multiple drivers, or applications that don't/can't link
+directly to drivers (think dynamic loading, perhaps in a language like
+Python). For this case, ADBC provides a table of function pointers
+(:cpp:struct:`AdbcDriver`), and a way to request this table from a driver.
+Then, the application proceeds in two steps. First, it dynamically loads a
+driver and calls an entrypoint function to get the function table:
+
+.. figure:: DriverTableLoad.mmd.svg
+
+ Now, the application asks the driver for a table of functions to call.
+
+Then, the application uses the driver by calling the functions in the table:
+
+.. figure:: DriverTableUse.mmd.svg
+
+ The application uses the table to call driver functions. This approach
+ scales to multiple drivers.
+
+Dealing with the table, however, is messy. So the overall recommended
+approach is to use the ADBC driver manager. This is a library that pretends
+to be a single driver that can be linked to and used "like normal".
+Internally, it loads the table of function pointers and tracks which
+database/connection/statement objects need which "actual" driver, making it
+easy to dynamically load drivers at runtime and use multiple drivers from the
+same application:
+
+.. figure:: DriverManagerUse.mmd.svg
+
+ The application uses the driver manager to "feel like" it's just using a
+ single driver. The driver manager handles the details behind the scenes.
+
+In More Detail
+==============
+
+The `adbc.h`_ header ties everything together. It is the abstract API
+definition, akin to interface/trait/protocol definitions in other languages.
+C being C, however, all it consists of is a bunch of function prototypes and
+struct definitions without any implementation.
+
+.. _adbc.h: https://github.com/apache/arrow/blob/main/format/adbc.h
+
+A driver, at its core, is just a library that implements those function
+prototypes in adbc.h. Those functions may be implemented in C, or they can be
+implemented in a different language and exported through language-specific FFI
+mechanisms. For example, the Go and C# implementations of ADBC can both
+export drivers to consumers who expect the C API definitions. As long as the
+definitions in adbc.h are implemented somehow, then the application is
+generally none the wiser when it comes to what's actually underneath.
+
+How does an application call these functions, though? Here, there are several
+options.
+
+Again, the simplest case is as follows: if (1) the application links directly
+to the driver, and (2) the driver exposes the ADBC functions *under the same
+name* as in adbc.h, then the application can just ``#include <adbc.h>`` and
+call ``AdbcStatementExecuteQuery(...)`` directly. Here, the application and
+driver have a relationship no different than any other C library.
+
+.. figure:: DriverDirectLink.mmd.svg
+
+ In the simplest case, an application directly links to the driver and calls
+ ADBC functions. When the application calls ``AdbcStatementExecuteQuery``,
+ that function is provided directly by the driver that it links against.
+
+Unfortunately, this doesn't work as well in other scenarios. For example, if
+an application wishes to use multiple ADBC drivers, this no longer works: both
+drivers define the same functions (the ones in adbc.h), and when the
+application links both of them, the linker has no way of telling which
+driver's function is meant when the application calls an ADBC function. On
+top of that, this violates the `One Definition Rule`_.
+
+In this case, the driver can provide driver-specific aliases that applications
+can use, say ``PostgresqlStatementExecuteQuery`` or
+``FlightSqlStatementExecuteQuery``. Then, the application can link both
+drivers, ignore the ``Adbc…`` functions (and ignore the technical violation of
+the One Definition Rule there), and use the aliases instead.
+
+.. figure:: DriverAlias.mmd.svg
+
+ To get around the One Definition Rule, we can provide aliases of the ADBC
+ APIs instead.
+
+This is rather inconvenient for the application, though. Additionally, this
+sort of defeats the point of using ADBC, since now the application has a
+separate API for each driver, even if they're technically all clones of the
+same API. And this doesn't solve the problem for applications that want to
+load drivers dynamically. For example, a Python script would want to load the
+driver at runtime. In that case, it would need to know which functions from
+the driver correspond to which functions in the ADBC API definitions, without
+having to hardcode this knowledge.
+
+ADBC anticipated this, and defined :cpp:struct:`AdbcDriver`. This is just a
+table of function pointers with one entry per ADBC function. That way, an
+application can dynamically load a driver and call an entrypoint function that
+returns this table of function pointers. (It does have to hardcode or guess
+the name of the entrypoint; the ADBC spec lists a set of names it can try,
+based on the name of the driver library itself.)
+
+.. figure:: DriverTableLoad.mmd.svg
+
+ The application first loads a table of function pointers from the driver.
+
+Then, it can use the driver by calling functions in that table:
+
+.. figure:: DriverTableUse.mmd.svg
+
+ The application uses the table to call driver functions. This approach
+ scales to multiple drivers.
+
+Of course, calling all functions by jumping through a giant table of function
+pointers is inconvenient. So ADBC provides the "driver manager", a library
+that *pretends* to be a simple driver and implements all the ADBC functions.
+Internally, it loads drivers dynamically, requests the tables of function
+pointers, and keeps track of which connections are using which drivers. The
+application only needs to call the standard ADBC functions, just like in the
+simplest case we started out with:
+
+.. figure:: DriverManagerUse.mmd.svg
+
+ The application uses the driver manager to "feel like" it's just using a
+ single driver. The driver manager handles the details behind the scenes.
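
The forwarding described above can be sketched in C. This is a minimal illustration, not the driver manager's actual code: ``ToyDriver``, ``ToyStatement``, the two stand-in drivers, and the ``Manager...`` functions are all hypothetical names, and the signatures are far simpler than those in adbc.h. The sketch only shows the idea of remembering which function table each statement belongs to and forwarding calls through it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified function table (the real one is AdbcDriver). */
struct ToyDriver {
  int (*execute_query)(int input);
};

/* Hypothetical statement handle that remembers which driver it belongs to. */
struct ToyStatement {
  const struct ToyDriver* driver;
};

/* Two stand-in "drivers". In practice each table would be obtained by
   dynamically loading a library and calling its entrypoint. */
int FirstExecuteQuery(int input) { return input + 1; }
int SecondExecuteQuery(int input) { return input * 2; }
const struct ToyDriver kFirstDriver = {FirstExecuteQuery};
const struct ToyDriver kSecondDriver = {SecondExecuteQuery};

/* Manager side: bind a statement to one driver's function table. */
void ManagerStatementNew(struct ToyStatement* statement,
                         const struct ToyDriver* driver) {
  assert(statement != NULL && driver != NULL);
  statement->driver = driver;
}

/* Manager side: the single function the application calls. It forwards
   through whichever table the statement was bound to. */
int ManagerStatementExecuteQuery(struct ToyStatement* statement, int input) {
  assert(statement != NULL && statement->driver != NULL);
  return statement->driver->execute_query(input);
}
```

An application would bind one statement to each table with ``ManagerStatementNew`` and then call ``ManagerStatementExecuteQuery`` on both; each call ends up in a different driver even though the application only ever names one function.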
+
+So to recap, a driver should provide these three things:
+
+#. An implementation of each ADBC function,
+#. A thin wrapper around each implementation function that exports the ADBC
+ name for each function, and
+#. An entrypoint function that returns a :cpp:struct:`AdbcDriver` table,
+ containing the functions from (1).
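
For a toy driver, the three pieces listed above might look roughly like this. It is a sketch under stated assumptions, not the real API: ``ToyStatement``, ``ToyDriver``, and ``ToyDriverInit`` are hypothetical stand-ins, and the signatures are simplified (the real :cpp:struct:`AdbcDriver` has one entry per ADBC function, and the real functions take more arguments). Only the names ``AdbcStatementExecuteQuery`` and ``SqliteStatementExecuteQuery`` are taken from the figures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the types in adbc.h. */
struct ToyStatement {
  int rows_affected;
};

struct ToyDriver {
  /* One entry per ADBC function; only one is shown here. */
  int (*execute_query)(struct ToyStatement* statement);
};

/* (1) The real implementation, under a driver-specific name. */
int SqliteStatementExecuteQuery(struct ToyStatement* statement) {
  assert(statement != NULL);
  statement->rows_affected = 1;
  return 0; /* 0 stands in for a success status */
}

/* (2) A thin wrapper that exports the ADBC name (signature simplified). */
int AdbcStatementExecuteQuery(struct ToyStatement* statement) {
  return SqliteStatementExecuteQuery(statement);
}

/* (3) An entrypoint that fills in the table with the functions from (1),
   not the wrappers from (2). */
int ToyDriverInit(struct ToyDriver* driver) {
  assert(driver != NULL);
  driver->execute_query = SqliteStatementExecuteQuery;
  return 0;
}
```

Note that the table entry points at the driver-specific function rather than the ``Adbc...`` wrapper.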
+
+Then, an application can use a driver in one of these ways:
+
+- Link the driver directly and call ``Adbc…`` functions (only in the simplest
+ cases) using (2) above,
+- Link the driver directly or load it dynamically, get the
+ :cpp:struct:`AdbcDriver` table via (3) above, and call ADBC functions through
+ function pointers (generally not recommended),
+- Link the ADBC driver manager, call ``Adbc…`` functions, and let the driver
+ manager deal with (3) above (what most applications will want to do).
+
+In other words, it's usually easiest to just always use the driver manager.
+But the magic it pulls isn't required or all that complex.
+
+.. note:: You may ask: when we have :cpp:struct:`AdbcDriver`, why do we bother
+ defining both ``AdbcStatementExecuteQuery`` and
+ ``SqliteStatementExecuteQuery`` (i.e., why do both (1) and (2)
+ above)? Can't we just define the ``Adbc…`` version, and put it into
+ the function table when requested?
+
+ Here, implementation constraints come in. At runtime, when the
+ driver looks up the address of (say) ``AdbcStatementExecuteQuery``
+ to put it into the table, the dynamic linker will come into play to
+ figure out where this function is. Unfortunately, it will probably
+ find it *in the driver manager*. This is a problem, since then the
+ driver manager will end up in an infinite loop when it goes to call
+ the "driver's" version of the function!
+
+ By having a seemingly redundant copy of the function, we can then
+ hide the "real implementation" from the dynamic linker and avoid
+ this behavior.
+
+ The driver manager could try to solve this by loading the drivers
+ with ``RTLD_DEEPBIND``. This, however, is not portable, and causes
+ problems if we also want to use things like AddressSanitizer during
+ development. The driver could also build with flags like
+ ``-Bsymbolic-functions``.
+
+.. _One Definition Rule: https://en.cppreference.com/w/cpp/language/definition#One_Definition_Rule
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 494ccfc711..e29d26c482 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -244,6 +244,7 @@ Why ADBC?
format/specification
format/versioning
format/comparison
+ format/how_manager
.. toctree::
:maxdepth: 1