feat: faster wrapping #63
Conversation
I should add that this PR is not ready to be merged as it's lacking tests and I'm sure the code can be improved. However, it is ready for a first round of feedback to better direct my efforts. Tagging @jeswr, @rubensworks and @RubenVerborgh to keep everyone in the loop. BTW, this would have to be rebased on …
I've also seen this pattern in the core of Comunica (with a few synchronous transforms in between), mainly in the context of handling iterators that are wrapped in a promise. It would be good to have benchmarks on that case as well, i.e.:

```js
let i = 0;
const arr = new Array(200_000).fill(true).map(() => i++);
const source = new ArrayIterator(arr);
const opts = { letIteratorThrough: false };
const then = Date.now();
wrap(Promise.resolve(wrap(Promise.resolve(wrap(Promise.resolve(wrap(Promise.resolve(source), opts)), opts)), opts)), opts)
  .on('data', () => {})
  .on('end', () => {
    console.log('elapsed', Date.now() - then, 'ms');
  });
```
@jeswr I've tweaked your test a bit to ignore the time spent establishing the chain, just in case it could make a difference (likely insignificant, but still):

```js
let i = 0;
const arr = new Array(200_000).fill(true).map(() => i++);
const source = new ArrayIterator(arr);
const opts = { /* letIteratorThrough: true */ };
const chain = wrap(Promise.resolve(wrap(Promise.resolve(wrap(Promise.resolve(wrap(Promise.resolve(source), opts)), opts)), opts)), opts);
chain.once('readable', () => {
  const then = Date.now();
  chain.on('data', () => {}).on('end', () => {
    console.log('elapsed', Date.now() - then, 'ms');
  });
});
```

This test runs in ~450 ms with current …
Not sure about what's going on with coveralls, but this is probably ready for review by @RubenVerborgh (in due time, after #59 is merged). I'll resume working on cloning.
@RubenVerborgh I have just rebased this one on …
Thanks, removed the second option in 16fcfb6 and will talk to Jesse over lunch about the other one!
Awesome!
The request for that was in relation to performance when wrapping N3.js. By default … The main consequence of removing this is that downstream clients like Comunica will sometimes need to write

```ts
if (isIterable(source) || isIterator(source))
  return fromIterable<T>(source);
else
  return wrap<T>(source);
```

rather than

```ts
return wrap(source, { prioritizeIterable: true });
```
asynciterator.ts (Outdated)
```ts
  this._source = isAsyncReadable(source) ? source : fromSource(source);
}
catch (error) {
  scheduleTask(() => this.emit('error', error));
```
Why does this need to be scheduled? (why can't this be called immediately?)
Happens during construction; consumer has had no time to attach listener yet.
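To illustrate the point with a minimal, standalone sketch: this uses Node's built-in EventEmitter and queueMicrotask purely for illustration, not AsyncIterator's actual scheduleTask helper.

```js
import { EventEmitter } from 'node:events';

class EagerFailing extends EventEmitter {
  constructor() {
    super();
    // Emitted synchronously during construction: no 'error' listener can exist
    // yet, so EventEmitter throws the error from emit() instead of delivering it.
    this.emit('error', new Error('boom'));
  }
}

class DeferredFailing extends EventEmitter {
  constructor() {
    super();
    // Deferring the emission gives the consumer a chance to subscribe first.
    queueMicrotask(() => this.emit('error', new Error('boom')));
  }
}

try { new EagerFailing(); } catch (err) { console.log('thrown synchronously:', err.message); }
new DeferredFailing().on('error', err => console.log('caught:', err.message));
```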
Thanks @jeswr; I'll take a look at the backwards-compat thing too then. But we should definitely put performance-friendly options first. (Also, this will be one of the last changes before semver.major, so then we can happily break.)
Thanks @jeswr for the refresher. I hadn't realized we already have a use case for the …
I like where this ended up! Thanks @RubenVerborgh, +1 from me to merge, and I'll try to rebase #78 post-merge as soon as possible so that it can also be merged, completing the work on wrapping.
Did a deeper review than last time and picked up on some potential bugs / race conditions. Apologies for not identifying them in the last review.
A major thank you to @jacoscaz as well as @jeswr for your efforts and patience. I realize it has taken me a lot of time to review and complete this PR, so I definitely wanted to emphasize my appreciation for helping out with open source. Because of its precise and high-performance nature, AsyncIterator is a special project in the sense that every PR tends to require one uninterrupted day for me to work on it. There's a lot of thinking involved in finding the code that makes it easiest to reason about correct behavior in a wide array of edge cases. So if you find me replacing or rewriting large chunks of code you have contributed, it's not you, it's me! Basically, the hard work is finding the initial code that works and covering it with an appropriate set of extensive tests, which is what you've done. Then it's my job to re-interpret the code so that it still satisfies those tests, and that's what tends to take me quite some time to get right. Thanks for bearing with me, and I look forward to getting to the next PRs!
Published as v3.6.0 🥳
@RubenVerborgh thank you for this library in the first place and all the effort that goes into maintaining it! I enjoy your reviews as they always lead to objectively better code from a correctness/performance standpoint. I also enjoy seeing the same code rewritten in a style that is not the one I started out with; there's always a great deal to learn.
This PR covers step 2 in the plan described in #44 (comment) and aims to 1) lower the overhead of the `wrap()` function and 2) expand support to additional types of sources (namely `Iterator` and `Iterable` objects).

The rationale for this PR is taking the onus of figuring out how to best wrap common types of data sources away from users of this library. Users should be able to expect `wrap()` to use the strategy that is most appropriate for any given source (as long as it's a supported source).
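A minimal sketch of that intended usage, assuming the expanded `wrap()` described in this PR (the specific sources shown here are illustrative, not taken from the PR itself):

```js
import { ArrayIterator, wrap } from 'asynciterator';

// All of these go through the same entry point; wrap() picks the strategy.
const fromAsyncIterator = wrap(new ArrayIterator([1, 2, 3]));  // an AsyncIterator
const fromIterable      = wrap(new Set([4, 5, 6]));            // a synchronous Iterable
const fromIterator      = wrap([7, 8, 9][Symbol.iterator]());  // a synchronous Iterator
const fromPromise       = wrap(Promise.resolve(fromIterable)); // a promised source

fromPromise.on('data', item => console.log(item));
```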
Performance-wise, this PR does two things:

- It introduces source-specific iterators that synchronously read from their sources, skipping the internal buffering done by `TransformIterator` (i.e. the class currently used for wrapping).
- It introduces a specific option that allows `AsyncIterator` instances to fall through as they are, with no wrapping applied.

The following test, admittedly naive, takes ~116 ms with the current `main` branch and ~8 ms with this PR (`letIteratorThrough` makes virtually no difference). This mostly highlights the difference made by skipping `TransformIterator`'s internal buffering.
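The original snippet is not reproduced in this thread; below is a sketch of what such a naive single-wrap benchmark could look like, modeled on the chained tests posted in the comments above (the exact code in the PR description may differ):

```js
import { ArrayIterator, wrap } from 'asynciterator';

let i = 0;
const arr = new Array(200_000).fill(true).map(() => i++);
const source = new ArrayIterator(arr);

const then = Date.now();
wrap(source).on('data', () => {}).on('end', () => {
  console.log('elapsed', Date.now() - then, 'ms');
});
```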
Things become much more interesting when we consider what happens with a chain established by `wrap()`ing:
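The chained snippet is also not reproduced in this part of the thread; it is presumably the same shape as the nested `wrap(Promise.resolve(...))` tests shown in the comments above, roughly:

```js
import { ArrayIterator, wrap } from 'asynciterator';

let i = 0;
const arr = new Array(200_000).fill(true).map(() => i++);
const source = new ArrayIterator(arr);
const opts = { letIteratorThrough: true }; // flip to false to compare

// Two levels of wrapping shown for brevity; the tests above nest four levels deep.
const chain = wrap(Promise.resolve(wrap(Promise.resolve(source), opts)), opts);

const then = Date.now();
chain.on('data', () => {}).on('end', () => {
  console.log('elapsed', Date.now() - then, 'ms');
});
```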
The above test takes ~250 ms with the current `main` branch, ~15 ms with this PR and `letIteratorThrough: false`, and ~8 ms with this PR and `letIteratorThrough: true`. I haven't seen consistent differences with or without warmups.

My use case is the optimization of the interface between quadstore, which implements the RDF/JS `Source` interface, and Comunica, through which the former can be queried via SPARQL: https://github.com/comunica/comunica/blob/a2f31270778730911bd0bdf8b8c7ade567153319/packages/actor-rdf-resolve-quad-pattern-rdfjs-source/lib/RdfJsQuadSource.ts#L29

As many algebraic operations (matching, joining, ...) happen in-memory, thus requiring the streaming of large amounts of quads from the persistence layer into the engine, removing even just one layer of indirection/buffering can make a significant difference.
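For context, a simplified sketch of what that interface boundary looks like on the engine side (function and variable names are illustrative; the real code is at the RdfJsQuadSource link above):

```js
import { wrap } from 'asynciterator';

// An RDF/JS Source exposes match(), which returns an RDF/JS Stream of quads.
// Pulling that stream into an AsyncIterator is exactly the wrap() call that
// this PR makes cheaper.
function quadsAsAsyncIterator(source, subject, predicate, object, graph) {
  return wrap(source.match(subject, predicate, object, graph));
}
```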