
Improve performance of json-type by removing deep-clones of spec/data #823

Open · 1 task

puehringer opened this issue Oct 20, 2022 · 4 comments · May be fixed by datavisyn/gosling.js#1
Labels: enhancement (New feature or request)

Comments

puehringer commented Oct 20, 2022

When inlining long arrays (>10k objects) as values in a spec via `type: json`, zooming and panning become painfully slow. We identified the key reason to be the deep cloning (`structuredClone`) of the spec, which happens in several places. See this draft PR for the places where we removed the deep cloning without breaking any tests/demos: datavisyn#1

We know that the main point of using HiGlass is to avoid plotting that many points; however, the performance improvements are so drastic that refactoring toward fewer deep-cloning operations should still be considered.

It should also be noted that the sampleLength property has no influence on this, because the bottleneck really is the cloning of the entire values array, which makes Gosling slow even with sampleLength: 100.

Slow demo (v0.9.25)

Screencast.from.2022-10-20.09.53.14.webm

(ignore the title, it's the slow version)

Fast demo (without the deep-cloning, using the built version of datavisyn#1)

Screencast.from.2022-10-20.09.50.14.webm

Open questions:

puehringer added the enhancement label Oct 20, 2022
puehringer linked a pull request Oct 20, 2022 that will close this issue
sehilyi (Member) commented Oct 29, 2022

Thank you @puehringer for exploring this! The performance improvement is really interesting to see.

The only reason I was using deep copies so frequently was to safely preserve the original (or intermediate) specs. For example, this.specComplete copies the original spec so that any values added to it later (e.g., missing values like domain) are not added to the original spec.

this.specComplete = JSON.parse(JSON.stringify(spec));

this.generateCompleteSpec(this.specComplete);

Similarly, whenever the Gosling API exposes the spec to users, I am deep cloning it before sending it to users, so that when users edit the spec, it does not affect the spec used inside the GoslingComponent.

publish(eventType, {
    id: context.viewUid,
    spec: structuredClone(this.options.spec), // instead of `spec: this.options.spec,`
    shape: { x, y, width, height }
});

...
const data = gosRef.current.api.getTrackInfo('id');
data.spec.tracks = []; // does not affect the spec inside GoslingComponent.

But I did not sufficiently consider large JSON values at the time of implementing this, and perhaps some of the cloning can be removed.

Ultimately, if we want to support large JSON values, we could keep the values outside of the spec after compiling it (e.g., removing the values fields from the spec and keeping them separately in the gosling-track-model) so that cloning the spec afterward does not hurt performance.

sehilyi (Member) commented Oct 30, 2022

(Partly related to #508)

puehringer (Author) commented
@sehilyi I worked on a more concrete example (i.e., with inlined JSON) which can also be used in your editor. It is a simple alignment visualization with inlined data for multiple species. This should help reproduce the issue: json_demo.json. I think this is a pretty good benchmark, as it should be possible to make this visualization super smooth. Let me know if we can help with anything; I think datavisyn#1 is a starting point. I also get your point about passing copies around for immutability, and I agree that keeping the spec and the (inline JSON) data separate should improve performance drastically.

And here is a quick and easy way to check what is being cloned (it only overrides JSON.stringify, but there are also structuredClone calls). The demo shows how often it is called when zooming/panning, including the copying of the (large) original data array.

const orig = JSON.stringify;
JSON.stringify = (...args) => {
    console.log("STRINGIFY", args)
    return orig(...args);
}
stringify.webm

etowahadams (Contributor) commented
Hi @puehringer, thanks for the comment and for looking into this in more detail. The JSON example you provided will be very helpful. I've started to set up performance-testing infrastructure and will use this as my first example to optimize.

etowahadams added a commit that referenced this issue Dec 4, 2023
- Add Playwright
- Add a Playwright test which measures how long it takes to zoom in an example spec (from #823 (comment))
- Persist the resolved spec as a property of GoslingTrackClass so that it doesn't have to be regenerated every time
- Minimize the number of deep clones of the spec and the data