Stream database file line by line to avoid string length limit in Node.js #5

Merged
merged 36 commits into from
Oct 19, 2021
Changes from 5 commits
Commits
a67622d
Stream files line by line when parsing to avoid memory limits
eliot-akira Aug 11, 2021
2a3afcb
Do not stream on browser side, since localforage returns entire datab…
eliot-akira Aug 11, 2021
b7e7f2b
Optimize browser build by replacing "byline" with empty object
eliot-akira Aug 11, 2021
a7ce6a7
Remove unused library "util" for browser side
eliot-akira Aug 11, 2021
f3f6252
Restore original tests for non-streaming method treatRawData
eliot-akira Aug 14, 2021
842f3e9
Lint: Formatting
eliot-akira Oct 3, 2021
84eee58
Use regular expression literal instead of RegExp
eliot-akira Oct 3, 2021
e6388c2
Test: Assert error is null
eliot-akira Oct 3, 2021
24dd632
On error, result is null; on result, error is null
eliot-akira Oct 3, 2021
3729d4c
Use arrow functions to preserve this
eliot-akira Oct 3, 2021
5b37cff
Use template literals for string interpolation
eliot-akira Oct 3, 2021
143a08e
Remove console.log from test
eliot-akira Oct 3, 2021
631d672
Return treated data even after corrupt threshold
eliot-akira Oct 3, 2021
ec5f3ff
Include fork of byline library
eliot-akira Oct 3, 2021
6884f23
Lint byline: Use strict equal; const instead of var; arrow function f…
eliot-akira Oct 3, 2021
1cd3381
Handle error: For this test, it is expected to be *not* null
eliot-akira Oct 3, 2021
af5c5d7
Remove dependency on byline
eliot-akira Oct 3, 2021
a826a9c
Align behavior of treatRawData and treatRawStream in handling last bl…
eliot-akira Oct 5, 2021
575f46c
Include tests for byline
eliot-akira Oct 5, 2021
225fe0f
Adapt tests from byline: Lint; Use assert from chai; Local paths for …
eliot-akira Oct 5, 2021
0af674d
Tests: Simplify getting local paths for test files
eliot-akira Oct 5, 2021
a610cf8
treatRawData and treatRawStream: Get values more efficiently with Obj…
eliot-akira Oct 7, 2021
823db2a
assert.deepStrictEqual instead of deepEqual
eliot-akira Oct 7, 2021
fc1d0e0
Write file line by line when persisting cached database
eliot-akira Oct 9, 2021
af1d64f
Write new line separately, instead of adding to line string
eliot-akira Oct 9, 2021
437a841
Lint
eliot-akira Oct 9, 2021
ba14f11
Add final new line to align behavior with Node.js version
eliot-akira Oct 12, 2021
7404ee1
Stream lines using setImmediate to ensure it doesn't block event loop
eliot-akira Oct 13, 2021
8c4f56a
streaming write with `Readable.from`
arantes555 Oct 15, 2021
148d7c8
fix error callbacks
arantes555 Oct 15, 2021
cbfd31f
changelog and contributors
Oct 5, 2021
3fe1df2
fix package.json browser field & update changelog
Oct 7, 2021
3d42f30
changelog
arantes555 Oct 15, 2021
6960424
2.1.0-2
arantes555 Oct 15, 2021
8b984f1
extract writeFileLines into its own function
arantes555 Oct 15, 2021
84e19ea
2.1.0-3
arantes555 Oct 15, 2021
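
For context on the limit named in the title: Node.js exposes its maximum string length as `buffer.constants.MAX_STRING_LENGTH`, and reading a whole data file into a single string can fail once the file grows past it. The snippet below is only an illustration, not code from this PR; `mayExceedStringLimit` is a hypothetical helper name.

```javascript
const { constants } = require('buffer')

// Largest string this Node.js build can hold, in UTF-16 code units.
// The exact value depends on the Node.js / V8 version.
const limit = constants.MAX_STRING_LENGTH

// Hypothetical helper: a UTF-8 file of `byteLength` bytes decodes to at most
// `byteLength` code units, so only files larger than the limit are at risk.
function mayExceedStringLimit (byteLength) {
  return byteLength > limit
}

console.log(`MAX_STRING_LENGTH on this build: ${limit}`)
```

Streaming the file line by line sidesteps the limit because only one line is held as a string at a time.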
1 change: 1 addition & 0 deletions browser-version/lib/byline.js
@@ -0,0 +1 @@
+module.exports = {}
78 changes: 71 additions & 7 deletions lib/persistence.js
@@ -6,6 +6,7 @@
  */
 const path = require('path')
 const async = require('async')
+const byline = require('byline')
 const customUtils = require('./customUtils.js')
 const Index = require('./indexes.js')
 const model = require('./model.js')
@@ -183,6 +184,55 @@ class Persistence {
     return { data: tdata, indexes: indexes }
   }
 
+  /**
+   * From a database's raw stream, return the corresponding
+   * machine understandable collection
+   */
+  treatRawStream (rawStream, cb) {
+    const dataById = {}
+    const tdata = []
+    const indexes = {}
+    let corruptItems = 0
+
+    const lineStream = byline(rawStream)
+    const that = this
+    let length = 0
+
+    lineStream.on('data', function(line) {
+
+      try {
+        const doc = model.deserialize(that.beforeDeserialization(line))
+        if (doc._id) {
+          if (doc.$$deleted === true) delete dataById[doc._id]
+          else dataById[doc._id] = doc
+        } else if (doc.$$indexCreated && doc.$$indexCreated.fieldName != null) indexes[doc.$$indexCreated.fieldName] = doc.$$indexCreated
+        else if (typeof doc.$$indexRemoved === 'string') delete indexes[doc.$$indexRemoved]
+      } catch (e) {
+        corruptItems += 1
+      }
+
+      length++
+    })
+
+    lineStream.on('end', function() {
+      // A bit lenient on corruption
+      let err
+      if (length > 0 && corruptItems / length > that.corruptAlertThreshold) {
+        err = new Error("More than " + Math.floor(100 * that.corruptAlertThreshold) + "% of the data file is corrupt, the wrong beforeDeserialization hook may be used. Cautiously refusing to start NeDB to prevent dataloss")
+      }
+
+      Object.keys(dataById).forEach(function (k) {
+        tdata.push(dataById[k])
+      })
+
+      cb(err, { data: tdata, indexes: indexes })
+    })
+
+    lineStream.on('error', function(err) {
+      cb(err)
+    })
+  }
+
   /**
    * Load the database
    * 1) Create all indexes
@@ -207,14 +257,9 @@ class Persistence {
           // eslint-disable-next-line node/handle-callback-err
           storage.ensureDatafileIntegrity(this.filename, err => {
             // TODO: handle error
-            storage.readFile(this.filename, 'utf8', (err, rawData) => {
+            const treatedDataCallback = (err, treatedData) => {
+
               if (err) return cb(err)
-              let treatedData
-              try {
-                treatedData = this.treatRawData(rawData)
-              } catch (e) {
-                return cb(e)
-              }
 
               // Recreate all indexes in the datafile
               Object.keys(treatedData.indexes).forEach(key => {
@@ -230,6 +275,25 @@ class Persistence {
               }
 
               this.db.persistence.persistCachedDatabase(cb)
+            }
+
+            if (storage.readFileStream) {
+              // Server side
+              const fileStream = storage.readFileStream(this.filename, { encoding : 'utf8' })
+              this.treatRawStream(fileStream, treatedDataCallback)
+              return
+            }
+
+            // Browser
+            storage.readFile(this.filename, 'utf8', (err, rawData) => {
+              if (err) return cb(err)
+
+              try {
+                const treatedData = this.treatRawData(rawData)
+                treatedDataCallback(null, treatedData)
+              } catch (e) {
+                return cb(e)
+              }
             })
           })
         })
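
For comparison, a minimal sketch of the same line-by-line load using only Node.js built-ins (`readline` in place of `byline`). This is not the PR's code: `parseLines` is a hypothetical name, `JSON.parse` stands in for `model.deserialize`, index records are ignored, the default threshold of 0.1 is an assumption, and the error is thrown rather than passed to a callback.

```javascript
const readline = require('readline')
const { Readable } = require('stream')

// Hypothetical helper, not part of the PR.
async function parseLines (rawStream, corruptAlertThreshold = 0.1) {
  const dataById = {}
  let length = 0
  let corruptItems = 0

  const lines = readline.createInterface({ input: rawStream, crlfDelay: Infinity })
  for await (const line of lines) {
    if (line === '') continue // assumption: blank lines are skipped, not counted
    length++
    try {
      const doc = JSON.parse(line) // stand-in for model.deserialize
      if (doc._id) {
        if (doc.$$deleted === true) delete dataById[doc._id]
        else dataById[doc._id] = doc
      }
    } catch (e) {
      corruptItems++
    }
  }

  // Same leniency rule as treatRawStream: fail only above the threshold
  if (length > 0 && corruptItems / length > corruptAlertThreshold) {
    throw new Error(`More than ${Math.floor(100 * corruptAlertThreshold)}% of the data file is corrupt`)
  }
  return Object.values(dataById)
}

// Usage with an in-memory stream; a file stream from fs.createReadStream works the same way
const demo = Readable.from(['{"_id":"a","x":1}\n{"_id":"b"}\n', '{"_id":"a","$$deleted":true}\n'])
parseLines(demo).then(docs => console.log(docs))
```

Only one line is ever held as a string, so the file size is no longer bounded by the maximum string length.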
1 change: 1 addition & 0 deletions lib/storage.js
@@ -18,6 +18,7 @@ storage.writeFile = fs.writeFile
 storage.unlink = fs.unlink
 storage.appendFile = fs.appendFile
 storage.readFile = fs.readFile
+storage.readFileStream = fs.createReadStream
 storage.mkdir = fs.mkdir
 
 /**
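
`fs.createReadStream` delivers chunks of arbitrary size, so a serialized document can straddle two chunks; that is why the stream is passed through a line splitter before deserialization. A minimal sketch of that buffering idea follows. `LineSplitter` is a hypothetical class for illustration only (byline's actual implementation may differ), and dropping empty lines is a choice made for this sketch.

```javascript
// Hypothetical helper, not part of the PR or of byline.
class LineSplitter {
  constructor () {
    this.remainder = '' // unfinished line carried over from the previous chunk
  }

  // Accept one chunk and return the complete, non-empty lines it finishes.
  push (chunk) {
    const parts = (this.remainder + chunk).split(/\r?\n/)
    this.remainder = parts.pop() // last piece may be an unfinished line
    return parts.filter(line => line !== '')
  }

  // Call once the stream ends to emit a final line that has no trailing newline.
  flush () {
    const last = this.remainder
    this.remainder = ''
    return last === '' ? [] : [last]
  }
}

const splitter = new LineSplitter()
console.log(splitter.push('{"_id":"a"}\n{"_i')) // first document only
console.log(splitter.push('d":"b"}\n'))         // second document, now complete
console.log(splitter.flush())                   // nothing left over
```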
56 changes: 36 additions & 20 deletions package-lock.json

Generated file; diff not rendered by default.

1 change: 1 addition & 0 deletions package.json
@@ -36,6 +36,7 @@
   "dependencies": {
     "@seald-io/binary-search-tree": "^1.0.2",
     "async": "0.2.10",
+    "byline": "^5.0.0",
     "localforage": "^1.9.0"
   },
   "devDependencies": {