Initial development motivation
This page explains the initial choices that went into the goals and roadmap for Norgberg.
The baseline proposal for Neorg was an SQLite database. SQLite is lightweight and very common as an embedded relational database, which makes it a straightforward technology candidate.
The goals for the database module are scattered across several documents: the general Roadmap of Neorg, the big Zettelkasten Neorg tracking issue, and the Groen Roadmap. Common themes are caching note data in a way that allows fast, efficient querying, resolving links and backlinks for files, and synchronisation services between devices. The goal is to replace constant whole-workspace parses with faster, more efficient database accesses.
Not all of the big solutions currently in widespread use have open-source codebases, so the ways they solve the caching problem and make querying fast and efficient are not all available for assessment.
Even so, the open-source community provides plenty of older examples. The tool zk (sirupsen/zk, "Zettelkasten on the command-line") is a classic CLI example that also caches information in SQLite. Org-roam is another, using SQLite to cache information related to note connections and metadata.
Logseq is one of the larger open-source linked-notes applications that comes as a whole stack. It is notable for its use of the Datalog graph query language, implemented on the DataScript database engine.
Graph databases are interesting for their ability to optimize the representation and indexing strategy of the graph structure, with its edges and nodes, on disk and in memory for efficient query execution and matching of graph structures. (Many of the more advanced queries look for specific structures of edges and nodes that match sets of constraints in order to produce results.)
Graph databases are also typically schemaless, which makes them very interesting for dealing with the nature of fluid note-taking involving structured data: people will add data without following definitive schemas all the time. In Neorg this is even more powerful due to the tag extensions defined in the language standard. These let users strew in semi-structured data at any point in the norg file, for use by any other system such as the built-in macro system or other data-manipulating modules. Applications like semantic links or the creation of object definitions with properties for ad-hoc databases along the lines of Notion or Dataview thus become possible. On the other hand, graphs of links are also interesting for the presentation of matters like second-order links (what other files link to a file behind a link?), citation analysis, etc.
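As a rough illustration of the second-order link question, here is a minimal sketch in Python using the standard-library sqlite3 module. The `links` table, its columns, the file names, and the `second_order_links` helper are all assumptions made for this example; none of them are taken from Norgberg's actual schema.

```python
import sqlite3

# Hypothetical link table: one row per "source file links to target file".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE links ("
    " source TEXT NOT NULL,"
    " target TEXT NOT NULL,"
    " PRIMARY KEY (source, target))"
)
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [
        ("index.norg", "rust.norg"),
        ("index.norg", "sqlite.norg"),
        ("journal.norg", "rust.norg"),
        ("todo.norg", "sqlite.norg"),
        ("todo.norg", "neovim.norg"),
    ],
)


def second_order_links(conn, path):
    """For every file that `path` links to, list the other files linking there."""
    rows = conn.execute(
        """
        SELECT outgoing.target, incoming.source
        FROM links AS outgoing
        JOIN links AS incoming ON incoming.target = outgoing.target
        WHERE outgoing.source = ?
          AND incoming.source <> outgoing.source
        ORDER BY outgoing.target, incoming.source
        """,
        (path,),
    )
    return rows.fetchall()


print(second_order_links(conn, "index.norg"))
# [('rust.norg', 'journal.norg'), ('sqlite.norg', 'todo.norg')]
```

A single self-join answers the question for one hop. Each further hop needs another join, which is the kind of query a dedicated graph query language would express more directly.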
All such applications can be empowered by shipping a graph database, which should be more performant and easier to query in these matters thanks to its purposeful design. Graph databases also generally take care of ACID guarantees, persistence to disk, indexing, query optimization, etc., unlike pure in-memory graph formats. Proposals along this line have been discussed before, see "Graph backend for notes (-> zettlekasten)", Issue #36 in nvim-neorg/neorg.
The issue here is one of time and evaluation. Other projects in the Neorg ecosystem depend on having a basic database and link resolution service.
There are many ways to realize graph representations.
Graph modelling can be done in SQL itself with various schemas:
- https://github.com/dpapathanasiou/simple-graph
- https://www.slideshare.net/billkarwin/models-for-hierarchical-data
However, such modelling comes with caveats for advanced operations like graph traversals, where SQL queries become inefficient. The use of indices to speed up the search can also mean that graph edge updates trigger large index recomputations, slowing down edge inserts and updates.
There are also graph databases, which implement their own mappings and expose a more purposeful query language and data modelling schema. Graph databases use a "storage engine" which is relational or key-value, but can hide the optimizations needed for efficient searching and updating from the user.
Since all the other modules are being developed in Rust, we apply the same constraint here. (Most large graph databases run on the JVM, and pose their own related challenges as a consequence.)
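To make the traversal caveat concrete, the following is a minimal sketch of a node and edge schema in SQLite with a recursive query for reachability, written in Python with the standard-library sqlite3 module. The table names, the index, the sample graph, and the `reachable` helper are assumptions for illustration only. They are not the schema used by the projects linked above, nor by Norgberg.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nodes (id TEXT PRIMARY KEY);
    CREATE TABLE edges (
        source TEXT NOT NULL REFERENCES nodes(id),
        target TEXT NOT NULL REFERENCES nodes(id),
        PRIMARY KEY (source, target)
    );
    -- Speeds up reverse lookups, but must be maintained on every edge write.
    CREATE INDEX edges_by_target ON edges(target);
    """
)
conn.executemany("INSERT INTO nodes VALUES (?)", [("a",), ("b",), ("c",), ("d",)])
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")],  # a -> b -> c -> a is a cycle
)


def reachable(conn, start):
    """Return every node reachable from `start` by following edges forward."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(id) AS (
            SELECT target FROM edges WHERE source = ?
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT e.target FROM edges AS e JOIN reach AS r ON e.source = r.id
        )
        SELECT id FROM reach ORDER BY id
        """,
        (start,),
    )
    return [row[0] for row in rows]


print(reachable(conn, "a"))  # ['a', 'b', 'c', 'd'] ('a' is reached again via the cycle)
print(reachable(conn, "d"))  # []
```

The query works, but the engine evaluates it as repeated joins against the edge table. Anything beyond plain reachability (path constraints, pattern matching) has to be encoded by hand in the recursive step.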
We have four offerings.
SurrealDB is a combined schemaless-schemaful document-graph database with optional table-like schemas. Graph edges can themselves store key-value sets and objects. Queries can run against both. The developers are indicating support for geographic data. The underlying storage engine for embedded use is RocksDB.
IndraDB is a pure graph database that uses either PostgreSQL or sled as the storage engine. PostgreSQL is a larger engine and very slow, while sled has not yet stabilized and may require manual database migrations. The documentation is also lackluster.
Oxigraph is a SPARQL-standard graph database working with RDF triples. That means some enhanced portability, because we have a standardized serialization format and query language. However, query optimization is presently not available. The storage engine is RocksDB.
RocksDB has been noted to leave log files around and to be a comparatively large install: 2.5 MB versus 500 kB for sled. https://github.com/surrealdb/surrealdb/issues/1445?cmdid=MI3DW0N55YOLWZ
CozoDB has Rust bindings, implements relational-graph-vector combined schemas, and uses Datalog as its query language. The selection of storage engines is large, and includes RocksDB and SQLite as options.
All of these options come with uncertainties. Many of them are also not powerful enough in basic design or current implementation to meet all of the demands of a relational database. Sometimes, relational databases are just the right tool for the job, such as for file synchronisation or subsets of application state storage.
So while many further applications surrounding note graphs can profit from the technological specificity of graph databases and their optimizations of query performance, there are technological readiness and capability concerns that rule out a direct choice of any single graph database at this time. We are under immediate pressure to provide basic capabilities so that other projects can begin implementing in earnest.
As a consequence, we consider rolling out a graph database a long-term goal, and for the moment will use SQLite schemas to model basic graph structure primitives. This will be sufficient for first-order services.
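What "first-order services" on top of an SQLite schema could look like is sketched below in Python with the standard-library sqlite3 module. This is only an illustration under stated assumptions: the `links` table and the helpers `open_cache`, `replace_links`, `links_from`, and `backlinks_to` are hypothetical names, not Norgberg's API. The idea is that re-parsing one changed file replaces only that file's rows, so link and backlink lookups never need a whole-workspace parse.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the hypothetical link cache."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS links (
            source TEXT NOT NULL,
            target TEXT NOT NULL,
            PRIMARY KEY (source, target)
        );
        -- Backlink lookups filter on target, so index that column too.
        CREATE INDEX IF NOT EXISTS links_by_target ON links(target);
        """
    )
    return conn


def replace_links(conn, source, targets):
    """Swap in the freshly parsed outgoing links of a single file, atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM links WHERE source = ?", (source,))
        conn.executemany(
            "INSERT OR IGNORE INTO links VALUES (?, ?)",
            [(source, target) for target in targets],
        )


def links_from(conn, source):
    rows = conn.execute(
        "SELECT target FROM links WHERE source = ? ORDER BY target", (source,)
    )
    return [row[0] for row in rows]


def backlinks_to(conn, target):
    rows = conn.execute(
        "SELECT source FROM links WHERE target = ? ORDER BY source", (target,)
    )
    return [row[0] for row in rows]


cache = open_cache()
replace_links(cache, "a.norg", ["b.norg", "c.norg"])
replace_links(cache, "d.norg", ["b.norg"])
print(links_from(cache, "a.norg"))    # ['b.norg', 'c.norg']
print(backlinks_to(cache, "b.norg"))  # ['a.norg', 'd.norg']
```

Keeping the delete and the inserts in one transaction means a reader never sees a file with its links half-updated.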