[RFE] Optional backend proxy to enable bigger data sources #51
Comments
You might be able to get most of what you want with a database function which takes in a URL and POST parameters, then returns a JSON document which is parsed as a table with the JSON1 extension, using roapi-http. It would likely need the ability to externalize database functions in Sqliteviz, or possibly in your sql.js build script. For larger datasets you might want to store the JSON document and then process it.
I don't think SQLite is needed in between in this case. Yes, then there's an issue with inconsistency of the SQL dialect(s) in the UI (DataFusion's in case of roapi). But also backends should be swappable, once Sqliteviz establishes the expected HTTP API. So making one for a particular SQL-like source with the HTTP server technology of choice should be a very simple task.
It took me a second to understand the thought process. Are you expecting sqliteviz to be a producer of data to roapi and then have one search interface? Or to be the interface for roapi? If you have sqliteviz with local data, how do you query it together with the data over HTTP? When I talked about a database function, I meant within JavaScript, not a C program compiled with SQLite.
That's reciprocal ;-) But I'm curious to understand your idea. How sqliteviz works now (and should continue) -- that's the main use case.
What this RFE seeks is an exploratory/experimental use case.
What I am saying is that the effective SQL engine can be different, but that's okay (even though it would complicate the UI). If the experiment succeeds, later we can think how to combine the two (say, a VFS for …).
I agree; in that scenario I am not sure how you would standardize all the functions and operations across multiple platforms. If you pick JSON as the interchange format, sql.js at the worker level lets you define JavaScript database functions. You can then make API calls through the browser JavaScript and return the JSON in table format using the JSON1 extension, much like generate_series is a table-valued function. Make sense? A couple of benefits to this approach are the choice between static and real-time data, and the ability to automate multiple sources at once. Of course CORS is an issue if you don't have access to the endpoint, but if it's centralized and configured for access I think this may meet your use case. This would be easiest on JSON arrays; access to manipulate hierarchical JSON would be achieved through the SQLite JSON1 extension.
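A minimal sketch of such a function, assuming a sql.js build that exposes `create_function` from the worker (`fetch_sync` is a hypothetical name; synchronous XHR is used because sql.js callbacks cannot await promises, and it is permitted inside a Web Worker):

```js
// Register a hypothetical fetch_sync(url) scalar SQL function in the
// sql.js worker. It blocks on a synchronous XMLHttpRequest and returns
// the raw JSON text, which SQL can unpack with json_each()/json_extract().
db.create_function('fetch_sync', function (url) {
  const xhr = new XMLHttpRequest()
  xhr.open('GET', url, false) // false = synchronous; allowed in workers
  xhr.send()
  return xhr.responseText
})
```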
I guess there has already been some standardisation going on around ISO/IEC 9075 (the SQL standard), and we can just build on its subset.
Yes, now I understand. We touched it before, and I still stand by #56 (comment). For this use case, I don't think it would work. It either requires you to load all the data to aggregate it in the browser (which defeats the whole point of this RFE -- the data doesn't fit in browser/RAM):

```sql
SELECT json_extract(value, '$.name') name, COUNT(*) cnt
FROM json_each(fetch_sync('http://localhost/some_huge_json_array.json'))
GROUP BY 1
```

Or, in case of a SQL-speaking backend, to aggregate it elsewhere prior to visualisation in sqliteviz:

```sql
SELECT json_extract(value, '$.name') name, json_extract(value, '$.cnt') cnt
FROM json_each(
  fetch_sync_post(
    'http://localhost:8080/api/sql',
    'SELECT name, COUNT(*) cnt FROM some_huge_csv_file GROUP BY 1'
  )
)
```

Where the suggested alternative is filling a form in the UI to point to the host-port and writing just:

```sql
SELECT name, COUNT(*) cnt
FROM some_huge_csv_file
GROUP BY 1
```

Plus there will be schema and auto-completion as it is now.
I think I get it now: you want to HTTP-request flat JSON from a website through your browser, convert and feed the resulting JavaScript array to the sqliteviz table, and visualize the data without using SQLite, right? Not sure how multiple tabs would work; maybe that's per-tab functionality. My comments were based on collecting aggregated data from external systems and aggregating it with data in the browser's SQLite; it sounds like your use case is different. It would require another extension (statement vtab), but it wouldn't be hard for a user to define a custom function to abstract the URL/JSON1 queries and internally call the fetch_sync_post function above:
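A rough sketch of such a wrapper, assuming the hypothetical fetch_sync_post from the examples above is registered the same way (`remote_rows` and the endpoint URL are illustrative; a true table-valued wrapper would indeed need the statement vtab extension):

```js
// Hypothetical wrapper registered in the sql.js worker: hides the endpoint
// URL and POST plumbing so queries only pass the remote SQL text.
db.create_function('remote_rows', function (sql) {
  const xhr = new XMLHttpRequest()
  xhr.open('POST', 'http://localhost:8080/api/sql', false) // synchronous, worker-only
  xhr.send(sql)
  return xhr.responseText // JSON array text, for json_each()/json_extract()
})
```

Usage would then shrink to `SELECT ... FROM json_each(remote_rows('SELECT ...'))`.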
For me this adds a bit of transparency about where the data comes from, in a complete, savable, repeatable inquiry file. In my experience, even at one million records the JSON1 extension is fast enough not to need tables. But I digress; different use case.
As a user of Sqliteviz,
In order to simplify my workflow with bigger databases,
I want to be able to query them directly in sqliteviz w/o the need to pre-aggregate data for it.
A local SQLite database file can easily be 1-3 orders of magnitude bigger than what fits (or is allowed to fit) into a browser tab's memory. Using it directly, without producing pre-aggregated samples for visualisation, can simplify the workflow a lot (or make sqliteviz attractive to another audience). It's likely not about the volume of the result set (even what fits into the browser may not make sense to pass to Plotly -- there is much less pixel space there, even taking zooming into account), but about the span of aggregates that is accessible.
UI
Code-wise, sql.js interactions are abstracted by a promise-based API, so it should not make much difference for the core logic.
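A sketch of the seam this implies; the class and method names are illustrative, not sqliteviz's actual internals, and the endpoint path is borrowed from the /api/sql examples in this thread:

```js
// An HTTP backend could implement the same promise-based surface the
// sql.js worker wrapper already exposes, making the two swappable.
class HttpDatabase {
  constructor (baseUrl) {
    this.baseUrl = baseUrl
  }

  async exec (sql) {
    const resp = await fetch(`${this.baseUrl}/api/sql`, { method: 'POST', body: sql })
    if (!resp.ok) {
      throw new Error(`Backend responded with ${resp.status}`)
    }
    return resp.json() // resolve with rows, like the sql.js wrapper does
  }
}
```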
HTTP API
Points:
- No persistent connection on the backend -- everything is request-response
- Distribution -- just a script(s) (preferably w/o or with few dependencies) in the repo
- Database candidates on the surface (e.g. for a Python module):
  - stdlib sqlite3 and/or apsw (or apsw-wheels), and/or rqlite HTTP API directly
  - DuckDB -- just pip install duckdb away, much faster for OLAP queries than SQLite
  - can be SQLAlchemy for anything else as the central interface
- Complying with the API, any other SQL-speaking source can potentially be added if need be

Endpoints:
- Execute a SQL query script (a possible shape is sketched after this list):
  - Source DSN/URI (e.g. sqlite3.connect('file:path/to/database?mode=ro', uri=True) in Python). Credentials also fit here.
  - SQL query itself
  - Result set: list of dicts in JSON
  - Cursor meta?
- Auto-completion for database schema/hierarchy, functions, etc. Upfront or on-demand? E.g. SQLite has the COMPLETION table-valued function. TBD.
- OpenAPI endpoint
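For illustration, what the query endpoint's request and response could look like per the points above; every field name here is a placeholder, nothing is decided:

```js
// Illustrative request body for the "execute a SQL query" endpoint.
const request = {
  dsn: 'file:path/to/database?mode=ro', // source DSN/URI; credentials fit here too
  sql: 'SELECT name, COUNT(*) cnt FROM some_huge_csv_file GROUP BY 1'
}

// Illustrative response: the result set as a list of dicts in JSON,
// plus the open question of cursor meta from the list above.
const response = {
  rows: [
    { name: 'foo', cnt: 42 },
    { name: 'bar', cnt: 7 }
  ],
  meta: { columns: ['name', 'cnt'] }
}
```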
https://github.com/roapi/roapi is nearly perfect for the task:
- Installed via pip or Docker
- Simple HTTP API: point it to a file, then query it (see the sketch after this list)
- Outputs JSON (and also supports CSV and Apache Arrow serialisation)
- Schema is available at /api/schema
- Input transport: local filesystem, HTTP, S3
- Input formats: SQLite, CSV, (ND)JSON, parquet, MySQL, DeltaLake, Google spreadsheet
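A rough sketch of such a query, assuming a local roapi instance already serving a table named some_huge_csv_file on port 8080 (both names reused from the earlier examples):

```js
// In an async context: POST a SQL statement to roapi's /api/sql
// endpoint and read the result set back as JSON rows.
const rows = await fetch('http://localhost:8080/api/sql', {
  method: 'POST',
  body: 'SELECT name, COUNT(*) cnt FROM some_huge_csv_file GROUP BY 1'
}).then(r => r.json())

// The table schema is served at /api/schema, as noted above.
const schema = await fetch('http://localhost:8080/api/schema')
  .then(r => r.json())
```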
Some interesting functionality is in roapi's to-do list.