
Improve performance of json-type by removing deep-clones of spec/data #823

Open · 1 task

puehringer opened this issue Oct 20, 2022 · 4 comments · May be fixed by datavisyn/gosling.js#1
Labels: enhancement (New feature or request)

Comments

puehringer commented Oct 20, 2022

When inlining long arrays (>10k objects) as values in a spec via `type: json`, zooming and panning become painfully slow. We identified the key reason to be the deep cloning (`structuredClone`) of the spec, which happens in several places. See this draft PR for the places where we removed the deep cloning without breaking any tests/demos: datavisyn#1

We know that the main point of using HiGlass is to avoid plotting that many points; however, the performance improvements are so drastic that refactoring toward fewer deep-cloning operations should still be considered.

It should also be noted that the sampleLength property has no influence on this, because the bottleneck really is the cloning of the entire values array, which makes Gosling slow even with sampleLength: 100.

Slow demo (v0.9.25)

Screencast.from.2022-10-20.09.53.14.webm

(ignore the title, it's the slow version)

Fast demo (without the deep-cloning, using the built version of datavisyn#1)

Screencast.from.2022-10-20.09.50.14.webm

Open questions:

puehringer added the enhancement label Oct 20, 2022
puehringer linked a pull request Oct 20, 2022 that will close this issue
sehilyi (Member) commented Oct 29, 2022

Thank you @puehringer for exploring this! The performance improvement is really interesting to see.

The only reason I was using deep copies so frequently was to safely preserve the original (or intermediate) specs. For example, this.specComplete copies the original spec so that any values added to it later (e.g., missing values like domain) are not added to the original spec.

this.specComplete = JSON.parse(JSON.stringify(spec));

this.generateCompleteSpec(this.specComplete);

Similarly, whenever the Gosling API exposes the spec to users, I am deep cloning it before sending it to users, so that when users edit the spec, it does not affect the spec used inside the GoslingComponent.

publish(eventType, {
    id: context.viewUid,
    spec: structuredClone(this.options.spec), // instead of `spec: this.options.spec,`
    shape: { x, y, width, height }
});

...
const data = gosRef.current.api.getTrackInfo('id');
data.spec.tracks = []; // does not affect the spec inside GoslingComponent.

But I did not sufficiently consider large JSON values at the time of implementing this, and perhaps some of the cloning can be removed.

Ultimately, if we want to support large JSON values, we could keep the values outside of the spec after compiling it (e.g., removing the values fields from the spec and keeping them separately in the gosling-track-model) so that cloning the spec afterward does not hurt performance.

sehilyi (Member) commented Oct 30, 2022

(Partly related to #508)

puehringer (Author) commented
@sehilyi I worked on a more concrete example (i.e., with inlined JSON) which can also be used in your editor. It is a simple alignment visualization with inlined data for multiple species. This should help reproduce the issue: json_demo.json. I think this is a pretty good benchmark, as it should be possible to make this visualization super smooth. Let me know if we can help with anything; I think datavisyn#1 is a starting point. I also get your point about passing copies around for immutability, and I agree that keeping the spec and the (inline JSON) data separate should improve performance drastically.

And here is a quick and easy way to check what is being cloned (it only overrides JSON.stringify, but there are also structuredClone calls). The demo shows how often it is called when zooming/panning, including the copying of the (large) original data array.

const orig = JSON.stringify;
JSON.stringify = (...args) => {
    console.log("STRINGIFY", args)
    return orig(...args);
}
stringify.webm

etowahadams (Contributor) commented
Hi @puehringer, thanks for the comment and for looking into this in more detail. The JSON example you provided will be very helpful. I've started to set up performance-testing infrastructure and will use this as my first example to optimize.

etowahadams added a commit that referenced this issue Dec 4, 2023
- Add Playwright
- Add a Playwright test which measures how long it takes to zoom in an example spec (from #823 (comment))
- Persist the resolved spec as a property of GoslingTrackClass so that it doesn't have to be regenerated every time
- Minimize the number of deep clones of the spec and the data