Stream database file line by line to avoid string length limit in Node.js #5
Conversation
…ase file content; Passes all browser tests
@tex0l Just in case you might have missed this PR. The original pull request mentions that streaming files line by line allows loading large databases "without running into the string length limit for node", which is reportedly 256MB. So, it seems like a useful feature to support larger file sizes. (Ideally it'd be best to generate a huge file for testing, but I have not done this.)
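For context, the technique in question can be illustrated with Node's built-in `readline` module — a minimal sketch of the idea, not the PR's actual code (the PR uses a `byline`-style stream in `lib/storage.js`):

```js
const fs = require('fs')
const readline = require('readline')

// Read a datafile one line at a time, so the whole file never has to be
// held as a single string (which is what hits V8's string length limit).
function readFileByLine (filename, onLine, onDone) {
  const input = fs.createReadStream(filename, { encoding: 'utf8' })
  const rl = readline.createInterface({ input, crlfDelay: Infinity })
  rl.on('line', onLine) // one call per newline-terminated document
  rl.on('close', () => onDone(null))
  input.on('error', onDone)
}

// Each non-empty line of a NeDB datafile is one serialized document
// (plain JSON.parse here; NeDB's own deserializer handles more cases).
readFileByLine('datafile.db', line => {
  if (line.length > 0) console.log(JSON.parse(line)._id)
}, err => { if (err) throw err })
```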
Hi @eliot-akira,
First of all, thank you for porting the PR to @seald-io/nedb, and sorry for the delay in reviewing this PR.
I've reviewed it thoroughly and explained everything line by line in the files changed, but the TL;DR is:
- The PR is not properly linted; you can run `npm run lint` to check the linting status, and if you want to auto-fix small issues, you can run `npx standard --fix` (it will fix anything that doesn't change runtime);
- I'd rather not import `byline`: the package is 150 lines and hasn't been updated in 5 years (https://github.com/jahewson/node-byline/blob/master/lib/byline.js). I think we should either find a maintained replacement, or include the code and its tests in nedb and reformat them to this package's standards;
- I'm not comfortable with the behavior around `corruptAlertThreshold`: I'd rather strictly adhere to the Node convention of having at least `err` or `res` be `null`. What I recommend is, when the `corruptAlertThreshold` is reached, to stop reading from the stream (and clean it up properly without leaving it hanging in the process) and throw an error; see the sketch after this list. We can open a new issue to return the `corruptItems` count if this information is useful to anyone.
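Concretely, the recommended behavior could look something like this rough sketch — apart from the real `corruptAlertThreshold` option, the function name, error message, and `JSON.parse` stand-in are illustrative:

```js
// On failure, cb is called with only an error (res stays undefined),
// per Node convention; on success, with (null, docs).
function treatLineStream (lineStream, corruptAlertThreshold, cb) {
  const docs = []
  let corruptItems = 0
  let total = 0
  let failed = false

  lineStream.on('data', line => {
    if (failed) return
    total += 1
    try {
      docs.push(JSON.parse(line)) // stand-in for nedb's deserialization
    } catch (e) {
      corruptItems += 1
    }
    if (corruptItems / total > corruptAlertThreshold) {
      failed = true
      lineStream.destroy() // stop reading; don't leave the stream hanging
      cb(new Error('More than ' + Math.floor(100 * corruptAlertThreshold) + '% of the data file is corrupt'))
    }
  })
  lineStream.on('end', () => { if (!failed) cb(null, docs) })
}
```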
@arantes555 do you share the same opinion?
I won't have time to implement the modifications though; can you make them, @eliot-akira, please? I'll try to be quicker in my next review.
…or this; Buffer.from instead of new Buffer
Thank you for a detailed review of the PR. I ran the linter, made the suggested improvements, and resolved the associated comments. There's still a question about …
Thank you for implementing the fixes so quickly! For the behavior of `treatRawStream` when there are `corruptedItems`, I made a suggestion in the comments; tell me what you think. Once both items are fixed, the next step will be publishing a pre-version we'll test internally at Seald to check it doesn't break everything.
Sounds good. If I find the time, I'd like to actually create a database larger than 256MB, and test the "string length limit for node" that they were talking about. It would be nice to confirm that this PR really solves it.
Done. I've improved the function `treatRawStream` and its test; then moved over and adapted the tests from `byline`.
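For reference, the core of what `byline` does is small enough to inline; a minimal line-splitting `Transform` might look like this (simplified — the real module also handles `\r\n` and other edge cases, such as multi-byte characters falling across chunk boundaries):

```js
const { Transform } = require('stream')

// Minimal line splitter in the spirit of byline: buffer any partial line
// across chunks and emit complete lines as individual objects.
class LineStream extends Transform {
  constructor (options) {
    super({ ...options, readableObjectMode: true })
    this._remainder = ''
  }

  _transform (chunk, encoding, callback) {
    const lines = (this._remainder + chunk.toString('utf8')).split('\n')
    this._remainder = lines.pop() // last element is a partial line (or '')
    for (const line of lines) this.push(line)
    callback()
  }

  _flush (callback) {
    if (this._remainder.length > 0) this.push(this._remainder)
    callback()
  }
}

// Usage: fs.createReadStream('datafile.db').pipe(new LineStream())
```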
Hi @eliot-akira, once it passes this test cycle, I'll merge the branch, release a 2.1.0, and close the PR :). By the way, I took the liberty of adding you to the contributors and naming you in the changelog, if that's OK with you.
Nice, I didn't realize it was …
Hi @eliot-akira,
I learned more about the string length limit in Node.js. In the V8 source (…)
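For what it's worth, Node exposes the limit at runtime, so it can be checked without digging through the V8 source:

```js
const { constants } = require('buffer')

// V8's maximum string length, in UTF-16 code units. The exact value
// depends on the Node/V8 version — older V8 capped strings around
// 2**28 code units (the "256MB" figure mentioned above), newer versions
// allow more.
console.log(constants.MAX_STRING_LENGTH)

// Reading a larger file as one string fails, e.g.:
// fs.readFileSync('big.db', 'utf8') // throws ERR_STRING_TOO_LONG
```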
I created a test script to generate a database filled with large strings, and was able to confirm that there is in fact such a limit. With the original …
Using this PR branch, I found that it does load the database but still ends with an error.
This is in the function …. So, it looks like the PR is not complete yet. When I find the time, I will see how to adapt these two functions.
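A generator along the lines described above might look like this (illustrative — the actual test script isn't shown in this thread):

```js
const fs = require('fs')
const { constants } = require('buffer')

// Append ~1 MB documents until the datafile is just past V8's maximum
// string length, then confirm that a whole-file read throws.
const TARGET_BYTES = constants.MAX_STRING_LENGTH + 1024 * 1024
const payload = 'a'.repeat(1024 * 1024)
const out = fs.createWriteStream('big.db')
let written = 0
let i = 0

function writeMore () {
  let ok = true
  while (ok && written < TARGET_BYTES) {
    const line = JSON.stringify({ _id: 'doc' + i++, payload }) + '\n'
    written += Buffer.byteLength(line)
    ok = out.write(line) // respect backpressure
  }
  if (written < TARGET_BYTES) out.once('drain', writeMore)
  else out.end()
}
writeMore()

out.on('finish', () => {
  try {
    fs.readFileSync('big.db', 'utf8') // reading it all at once...
  } catch (err) {
    console.log(err.code) // ...fails, e.g. with ERR_STRING_TOO_LONG
  }
})
```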
OK, in commit fc1d0e0, I changed it so the file is written line by line when persisting the cached database. Great, now I've confirmed that this PR does allow loading large databases beyond the string length limit.
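In sketch form, the write side becomes something like the following; the exact code in fc1d0e0 may differ, and backpressure handling is omitted for brevity:

```js
const fs = require('fs')

// Persist documents one line at a time through a write stream, so the
// serialized database never exists as one giant in-memory string.
function writeFileLines (filename, serializedDocs, callback) {
  const stream = fs.createWriteStream(filename)
  stream.on('error', callback)
  stream.on('finish', () => callback(null))
  for (const line of serializedDocs) stream.write(line + '\n')
  stream.end()
}
```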
lib/storage.js
Outdated
```js
stream.close(cb)
})
readable.on('error', () => cb)
stream.on('error', () => cb)
```
Shouldn't this call `cb` with an error?

```js
readable.on('error', err => cb(err))
stream.on('error', err => cb(err))
```

…or maybe simpler:

```js
readable.on('error', cb)
stream.on('error', cb)
```
Good catch, that's what I wanted to write 🤦
@tex0l just noticed a last detail left to deal with: for the … I'll try doing that now, and then we should be good to publish :)
My bad, you already took care of that ^^
It seems all good to me. We'll do some tests with this pre-version on our internal projects, and release! Excellent job.
Actually, I'll push another commit with a minor detail: I'm extracting …
Hi @arantes555 @eliot-akira,
@tex0l @arantes555
This pull request implements streaming the database file line by line, based on louischatriot#463. Solves issue #3.
It follows the original PR closely, excluding unnecessary formatting changes. I adapted the syntax to the new codebase, such as the use of `let` and `const` instead of `var`, and some arrow functions.

It passes all server-side tests, and in the second commit, I made the changes necessary to pass all browser tests. It turned out that localforage returns the entire database file content, so it was impractical to turn that into a stream. So, in the browser, it reads the file the same way as before.
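In sketch form, the resulting split between the two environments looks like this (the function and parameter names are illustrative, not the PR's actual API):

```js
// Node: the datafile arrives as a line-emitting stream.
// Browser: localforage hands back the entire content as one string, so
// the same per-line handling is fed by a simple split instead.
function forEachLine (contentOrLineStream, onLine, onDone) {
  if (typeof contentOrLineStream === 'string') {
    // browser path: the whole file is already in memory anyway
    for (const line of contentOrLineStream.split('\n')) onLine(line)
    onDone(null)
  } else {
    // Node path: lines arrive one at a time, with bounded memory use
    contentOrLineStream.on('data', onLine)
    contentOrLineStream.on('end', () => onDone(null))
    contentOrLineStream.on('error', onDone)
  }
}
```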