
The output seems slow to me. Am I doing something wrong? #9

Open
aborruso opened this issue Jul 31, 2022 · 5 comments

@aborruso

aborruso commented Jul 31, 2022

Hi,
I'm using lines0.so v0.1.1.

If I run this command on this input CSV file:

sqlite3  tmp.sqlite "select count(*) from lines_read('input.csv');"

I get the output in 0.229 seconds.

If I run the same query on your example NDJSON file:

sqlite3  tmp.sqlite "select count(*) from lines_read('calendar.ndjson');"

I get the output in 51.493 seconds. Probably I'm doing something wrong.
Counting lines with wc (<calendar.ndjson wc -l) takes 1.053 seconds.

Thank you
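For reference, the wc -l baseline amounts to reading the file in large chunks and counting newline bytes. A minimal Python sketch of that baseline, assuming only the standard library, makes it easy to time the raw read cost of the same file independently of SQLite. count_newlines is a hypothetical helper, not part of sqlite-lines, and the 1 MiB chunk size is an arbitrary choice.

```python
import sys
import time
from typing import BinaryIO


def count_newlines(f: BinaryIO, chunk_size: int = 1 << 20) -> int:
    """Count newline bytes in a binary stream, reading fixed-size chunks (like wc -l)."""
    total = 0
    # iter() with a sentinel keeps calling read() until it returns b"" (end of file).
    for chunk in iter(lambda: f.read(chunk_size), b""):
        total += chunk.count(b"\n")
    return total


# Only runs when a file path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # e.g. calendar.ndjson
    start = time.perf_counter()
    with open(path, "rb") as fh:
        n = count_newlines(fh)
    elapsed = time.perf_counter() - start
    print(f"{n} newlines in {elapsed:.3f} s")
```

Run it as, e.g., python count_newlines.py calendar.ndjson (the script name is illustrative). Like wc -l, it counts newline characters, so a final line without a trailing newline is not counted.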

@asg017
Owner

asg017 commented Jul 31, 2022

Hey @aborruso, that does seem odd. Can you save the following script as a file called debug.sql?

.bail on
.timer on

.load ./lines0

select lines_debug();

select count(*) from lines_read('calendar.ndjson');

Then run it with sqlite3 :memory: '.read debug.sql'. Here's my output:

$ sqlite3 :memory: '.read debug.sql'
Version: v0.1.1
Date: 2022-06-22T14:35:59Z+0000
Source: 37c8d2dde4c97395b6af837ec7bd6f7af639e79f
Run Time: real 0.000 user 0.000000 sys 0.000138
321981
Run Time: real 0.126 user 0.066360 sys 0.059196

It runs in ~100 ms for me on a DigitalOcean droplet with 8 GB of RAM.

@aborruso
Author

Thank you @asg017.
I used your script, but I removed .load ./lines0 because my ~/.sqliterc file already contains .load /home/aborruso/library/lines0.so.

This is the result:

-- Loading resources from /home/aborruso/.sqliterc
Version: v0.1.1
Date: 2022-06-22T14:35:59Z+0000
Source: 37c8d2dde4c97395b6af837ec7bd6f7af639e79f
Run Time: real 0.000 user 0.000000 sys 0.000172
321981
Run Time: real 52.250 user 0.617902 sys 6.183273
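Comparing the two Run Time lines is informative: the slow query used only about 0.62 s of user CPU and 6.18 s of system CPU, yet took 52.25 s of wall-clock time. Assuming the query runs on a single thread, real time far above user + sys usually means the process spent most of that time waiting (for example, blocked on I/O) rather than computing; this is a heuristic, not a diagnosis. A small Python sketch that parses the sqlite3 .timer output makes the gap explicit. unaccounted_seconds and TIMER_RE are hypothetical names.

```python
import re

# Matches the line printed by the sqlite3 shell after `.timer on`, e.g.
# "Run Time: real 52.250 user 0.617902 sys 6.183273"
TIMER_RE = re.compile(r"Run Time: real ([\d.]+) user ([\d.]+) sys ([\d.]+)")


def unaccounted_seconds(line: str) -> float:
    """Wall-clock time not spent on CPU: real - (user + sys)."""
    m = TIMER_RE.search(line)
    if m is None:
        raise ValueError(f"not a sqlite3 timer line: {line!r}")
    real, user, sys_ = (float(g) for g in m.groups())
    return real - (user + sys_)


# The two timings reported in this thread.
for label, line in [
    ("reported slow run", "Run Time: real 52.250 user 0.617902 sys 6.183273"),
    ("maintainer's run", "Run Time: real 0.126 user 0.066360 sys 0.059196"),
]:
    print(f"{label}: {unaccounted_seconds(line):.3f} s off-CPU")
```

For the slow run this prints 45.449 s off-CPU, versus 0.000 s for the maintainer's 0.126 s run.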

@aborruso
Author

I'm using the lines0 build from lines0-linux-amd64.zip.

@asg017
Owner

asg017 commented Jul 31, 2022

Whoa, that's pretty wild to see... Could you try compiling the library yourself to see if that changes anything? It should be straightforward; there are no external dependencies:

git clone git@github.com:asg017/sqlite-lines.git
cd sqlite-lines
mkdir -p dist/
make loadable
sqlite3 :memory: '.timer on' '.load dist/lines0' "select count(*) from lines_read('calendar.ndjson');"

I'm guessing the pre-compiled library might be way slower for some reason, and compiling it yourself might fix it? Weird that it only seems to get tripped up on calendar.ndjson, too.

@aborruso
Author

aborruso commented Aug 1, 2022

> Could you try compiling the library yourself and see if that changes anything? It should be straightforward to do, no external deps

It's the same :(

