Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: faster wrapping #63

Merged
merged 5 commits into from
Jul 31, 2022
Merged

feat: faster wrapping #63

merged 5 commits into from
Jul 31, 2022

Conversation

jacoscaz
Copy link
Collaborator

@jacoscaz jacoscaz commented Apr 7, 2022

This PR covers step 2. in the plan described in #44 (comment) and aims to 1) lower the overhead of the wrap() function and 2) expand support to additional types of sources (namely Iterator and Iterable objects).

The rationale for this PR is taking the onus of figuring out how to best wrap common types of data sources away from users of this library. Users should be able to expect wrap() to use the strategy that is most appropriate for any given source (as long as it's a supported source).

Performance-wise, this PR does two things:

  1. It introduces source-specific iterators that synchronously read from their sources, skipping the internal buffering done by TransformIterator (i.e. the class currently used for wrapping)

  2. It introduces a specific option that allows AsyncIterator instances to fall-through as they are, with no wrapping applied

The following test, admittedly naive, takes ~116 ms with the current main branch and ~8 ms with this PR (letIteratorThrough makes virtually no difference). This mostly highlight the difference made by skipping TransformIterator's internal buffering.

let i = 0;
const arr = new Array(200_000).fill(true).map(() => i++);
const source = new ArrayIterator(arr);
const opts = { letIteratorThrough: false };
const then = Date.now();
wrap(source, opts).on('data', () => {}).on('end', () => {
  console.log('elapsed', Date.now() - then, 'ms');
});

Things become much more interesting when we consider what happens with a chain established by wrap()ing:

let i = 0;
const arr = new Array(200_000).fill(true).map(() => i++);
const source = new ArrayIterator(arr);
const opts = { letIteratorThrough: false };
const then = Date.now();
wrap(wrap(wrap(wrap(source, opts), opts), opts), opts).on('data', () => {}).on('end', () => {
  console.log('elapsed', Date.now() - then, 'ms');
});

The above test takes ~250 ms with the current main branch, ~ 15 ms with this PR and letIteratorThrough: false, ~ 8 ms with this PR and letIteratorThrough: true. I haven't seen consistent differences with or without warmups.

My use case is the optimization of the interface between quadstore, which implements the RDF/JS Source interface, and comunica, through which the former can be queried via SPARQL: https://github.com/comunica/comunica/blob/a2f31270778730911bd0bdf8b8c7ade567153319/packages/actor-rdf-resolve-quad-pattern-rdfjs-source/lib/RdfJsQuadSource.ts#L29 .

As many algebraic operation (matching, joining, ...) are happening in-memory, thus requiring the streaming of large amounts of quads from the persistence layer into the engine, removing even just one layer of indirection/buffering can make a significant difference.

@jacoscaz
Copy link
Collaborator Author

jacoscaz commented Apr 7, 2022

I should add that this PR is not ready to be merged as it's lacking tests and I'm sure the code can be improved. However, it is ready for a first round of feedbacks to better direct my efforts. Tagging @jeswr , @rubensworks and @RubenVerborgh to keep everyone in the loop. BTW, this would have to be rebased on main once #59 gets merged.

asynciterator.ts Outdated Show resolved Hide resolved
asynciterator.ts Outdated Show resolved Hide resolved
@jeswr
Copy link
Collaborator

jeswr commented Apr 7, 2022

Things become much more interesting when we consider what happens with a chain established by wrap()ing:

I've also seen this pattern in the core of Comunica (with a few synchronous transforms in between), mainly in the context of handling iterators that are wrapped in a promise. It would be good to have benchmarks on that case as well - i.e.

let i = 0;
const arr = new Array(200_000).fill(true).map(() => i++);
const source = new ArrayIterator(arr);
const opts = { letIteratorThrough: false };
const then = Date.now();
wrap(Promise.resolve(wrap(Promise.resolve(wrap(Promise.resolve(wrap(Promise.resolve(source), opts)), opts)), opts)), opts).on('data', () => {}).on('end', () => {
  console.log('elapsed', Date.now() - then, 'ms');
});

@jacoscaz
Copy link
Collaborator Author

jacoscaz commented Apr 8, 2022

@jeswr I've tweaked your test a bit to ignore the time spent establishing the chain, just in case it could make a difference (likely insignificant, but still).

let i = 0;
const arr = new Array(200_000).fill(true).map(() => i++);

const source = new ArrayIterator(arr);
const opts = { /* letIteratorThrough: true */ };

const chain = wrap(Promise.resolve(wrap(Promise.resolve(wrap(Promise.resolve(wrap(Promise.resolve(source), opts)), opts)), opts)), opts);

chain.once('readable', () => {
  const then = Date.now();
  chain.on('data', () => {}).on('end', () => {
    console.log('elapsed', Date.now() - then, 'ms');
  });
});

This test runs in ~450 ms with current main and ~12 ms with this PR. letIteratorThrough makes no difference as having to return an iterator synchronously forces the use of WrappingIterator even for AsyncIterator sources as we still need to wait for the promise to asynchronously resolve.

asynciterator.ts Outdated Show resolved Hide resolved
asynciterator.ts Outdated Show resolved Hide resolved
asynciterator.ts Outdated Show resolved Hide resolved
asynciterator.ts Outdated Show resolved Hide resolved
asynciterator.ts Outdated Show resolved Hide resolved
test/WrappingIterator-test.js Show resolved Hide resolved
@jacoscaz
Copy link
Collaborator Author

jacoscaz commented Apr 14, 2022

Not sure about what's going on with coveralls but this is probably ready for review by @RubenVerborgh (in due time, after #59 is merged). I'll resume working on cloning.

@jacoscaz
Copy link
Collaborator Author

jacoscaz commented Jul 4, 2022

@RubenVerborgh I have just rebased this one on main. I'm not sure as to what is happening with the failing test on Node 10.x but I would prefer if you were to take a look first before doing further work on this.

@RubenVerborgh RubenVerborgh self-assigned this Jul 27, 2022
@RubenVerborgh RubenVerborgh self-requested a review July 27, 2022 21:43
Repository owner deleted a comment from coveralls Jul 28, 2022
asynciterator.ts Outdated Show resolved Hide resolved
@RubenVerborgh
Copy link
Owner

RubenVerborgh commented Jul 28, 2022 via email

@jacoscaz
Copy link
Collaborator Author

Awesome!

asynciterator.ts Outdated Show resolved Hide resolved
@jeswr
Copy link
Collaborator

jeswr commented Jul 28, 2022

Thanks, removed the second option 16fcfb6 and will talk to Jesse over lunch about the other one!

The request for that was in relation to performance when wrapping N3.js.

By default wrap(store.match(...)) would end up using fromAsyncReadable which in turn uses buffering within the WrappingIterator (and all results get buffered into an array as a result of this). As opposed to just using the IterableWrapper which is much faster as it just pulls out quads on-demand. IIRC we cannot change the default prioritization order for backwards compatibility reasons.

The main consequence of removing this is that downstream clients like Comunica will sometimes need to write

 if ((isIterable(source) || isIterator(source)))
    return fromIterable<T>(source);
 else if
    return wrap<T>(source);

rather than

return wrap(source, { prioritizeIterable: true })

asynciterator.ts Outdated Show resolved Hide resolved
asynciterator.ts Outdated
this._source = isAsyncReadable(source) ? source : fromSource(source);
}
catch (error) {
scheduleTask(() => this.emit('error', error));
Copy link
Collaborator

@jeswr jeswr Jul 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be scheduled? (why can't this be called immediately?)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happens during construction; consumer has had no time to attach listener yet.

asynciterator.ts Outdated Show resolved Hide resolved
@RubenVerborgh
Copy link
Owner

Thanks @jeswr; I'll take a look at the backwards compat thing too then. But we should def put performance-friendly options first. (Also this will be one of the last changes before semver.major, so then we can happily break.)

@jacoscaz
Copy link
Collaborator Author

jacoscaz commented Jul 29, 2022

Thanks @jeswr; I'll take a look at the backwards compat thing too then. But we should def put performance-friendly options first. (Also this will be one of the last changes before semver.major, so then we can happily break.)

Thanks @jeswr for the refresher. I hadn't realized we already have a use case for the prioritizeIterable flag. Long-term, I suspect more such flags might follow (e.g. for native async iteration). Perhaps an explicit ordering might be better, here? Something like priority: ['iterable', 'asynciterable', 'array', ...]?

@RubenVerborgh RubenVerborgh removed their assignment Jul 30, 2022
@RubenVerborgh
Copy link
Owner

@jacoscaz @jeswr Done my in-depth review—what do you think?

@jacoscaz
Copy link
Collaborator Author

I like where this ended at! Thanks @RubenVerborgh , +1 from me to merge and I’ll try to rebase #78 post-merge as soon as possible so that it can also be merged, completing work on wrapping.

Copy link
Collaborator

@jeswr jeswr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a deeper review than last time and picked up on some potential bugs / race conditions. Apologies for not identifying them in the last review.

asynciterator.ts Outdated Show resolved Hide resolved
asynciterator.ts Outdated Show resolved Hide resolved
asynciterator.ts Show resolved Hide resolved
asynciterator.ts Outdated Show resolved Hide resolved
@RubenVerborgh RubenVerborgh requested a review from jeswr July 31, 2022 11:28
@RubenVerborgh
Copy link
Owner

RubenVerborgh commented Jul 31, 2022

A major thank you to @jacoscaz as well as @jeswr for your efforts and patience.

I realize it has taken me a lot of time to review and complete this PR, so I definitely wanted to emphasize my appreciation for helping out with open source.

Because of its precise and high-performance nature, AsyncIterator is a special project in the sense that every PR tends to require one uninterrupted day for me to work on it. There's a lot of thinking work on finding which code makes it easiest to reason about correct behavior in a wide array of edge cases. So if you find me replacing or rewriting large chunks of code you have contributed, it's not you—it's me! Basically the hard work is finding the initial code that works, and covering it with an appropriate set of extensive tests, which is what you've done. Then it's my job to re-interpret the code that satisfies that those tests, and that's what tends to take me quite some time to get right.

Thanks for bearing with me, and I look forward to getting to the next PRs!

@RubenVerborgh RubenVerborgh merged commit 0933827 into RubenVerborgh:main Jul 31, 2022
@RubenVerborgh
Copy link
Owner

Published as v3.6.0 🥳

@jacoscaz
Copy link
Collaborator Author

Because of its precise and high-performance nature, AsyncIterator is a special project in the sense that every PR tends to require one uninterrupted day for me to work on it.

@RubenVerborgh thank you for this library in the first place and all the effort that goes into maintaining it! I enjoy your reviews as they always lead to objectively better code from a correctness / performance standpoint. I also enjoy seeing the same code rewritten in a style that is not the one I started out with - there's always a great deal to learn.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants