Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ingestion of list types for SQLite #2212

Open
danielballan opened this issue Oct 2, 2024 · 8 comments
Open

Implement ingestion of list types for SQLite #2212

danielballan opened this issue Oct 2, 2024 · 8 comments
Labels
Type: enhancement New feature or request

Comments

@danielballan
Copy link

What feature or improvement would you like to see?

In #2066 @skarakuzu and I outlined a use case for streaming LIST data into PostgreSQL and SQLite via ADBC. Support for PostgreSQL was recently added. (Thanks, @paleolimbot!)

Would it be justifiable to add LIST support for SQLite as well? Our use case is embedded scientific data collection scenarios, where it is not feasible to deploy PostgreSQL but we'd like a compatible-as-possible data model. In #2066 had floated an idea of using a SQLite JSON or JSONB column for this, as SQLite is generally loose about types and lacks any array/list type.

We would be happy to try to contribute this if that is feasible.

@danielballan danielballan added the Type: enhancement New feature or request label Oct 2, 2024
@WillAyd
Copy link
Contributor

WillAyd commented Oct 2, 2024

Does SQLite actually have a JSON type or just functions for handling JSON strings/blobs? I see the latter in the documentation but not the former

https://sqlite.org/datatype3.html
https://sqlite.org/json1.html

Unless I am overlooking something I think these would need to be stored as strings (?)

@danielballan
Copy link
Author

Right. Evidently it does have a distinct internal data structure (JSONB) which is uses for some functionality but the data would be stored as a string.

@paleolimbot
Copy link
Member

I think the main issue here is that the infrastructure for serializing an arbitrary list to JSON doesn't currently exist (in ADBC or nanoarrow). We probably don't need a JSON library to write simple things (e.g., lists of integers), but we do need to be sure that bit is well-tested (and that we want that scope to exist in ADBC).

@lidavidm
Copy link
Member

lidavidm commented Oct 7, 2024

Hmm, lists of integers or other simple things should be fine (though, JSON doesn't have integers!) but I'd worry that could scope creep into half a JSON writer. I wonder if we need to start considering some sort of plugin mechanism so we can optionally ship bits like this? (Alternatively, is there a JSON library light enough that we could embed it and not worry about the dependency?)

@danielballan
Copy link
Author

Actually, I wonder SQLite has the functionality we need in its C API. Does json_array solve the serialization problem?

@paleolimbot
Copy link
Member

Alternatively, is there a JSON library light enough that we could embed it and not worry about the dependency?

I have never had problems with nlohmann_json in nanoarrow (https://github.com/nlohmann/json), although we might not need it provided we can use to_chars() for locale-independent floating-point-to-string conversion (strings require escaping and binary require some special handling: https://github.com/apache/arrow-nanoarrow/blob/7c1fb36029a92ee039d2235be07c70d8726b2a7f/src/nanoarrow/testing/testing.cc#L54-L69 )

Does json_array solve the serialization problem?

I suppose we could generate a query with parameters like SELECT json_array(?, ?, ?, ?), bind the values, and grab the result? Unless json_array() is in the C API?

@lidavidm
Copy link
Member

lidavidm commented Oct 8, 2024

Yeah, I think it'd have to be a query. I glanced around and nothing seems to be in the header.

@danielballan
Copy link
Author

Ah, got it. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants