Performance no entities spike #13

pietrop · 2020-03-12T18:32:30Z

Is your Pull Request related to another issue in this repository?

Fixes bbc#150 and addresses comment bbc#150 (comment)

Describe what the PR does

TL;DR: Seeing if removing entities can remove the performance issue (spoiler alert: it seems like it does 🎉).

At first glance, removing entities means that we'd lose the correspondence between words and timecodes. However, thanks to stt-align-node (original algo by @chrisbaume and integration with @bbc/react-transcript-editor by @murezzda), it seems like we have an inexpensive way to restore alignment between plain text and timecoded words.

This PR removes word-level entities, but keeps the data attribute at block (paragraph) level, which contains the word list for each paragraph in the draftJS JSON.

It introduces paragraph-level highlighting (vs the previous word-level highlighting) by adding the timecodes of words in previous blocks to each paragraph's HTML data attribute, so that CSS injection can work similarly to how current and previous words were displayed before.
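
A rough sketch of how the current paragraph could be picked for highlighting, assuming each block keeps its words in `data.words` with a numeric `start` time in seconds. The block shape and the `findCurrentBlockIndex` helper are illustrative assumptions, not code from this PR:

```javascript
// Hypothetical block shape: { text, data: { words: [{ start, end, text }] } }.
// Returns the index of the last block whose first word starts at or before
// currentTime, or -1 if playback has not reached the first block yet.
function findCurrentBlockIndex(blocks, currentTime) {
  let current = -1;
  for (let i = 0; i < blocks.length; i++) {
    const words = blocks[i].data.words;
    if (words.length === 0) continue; // skip blocks with no timed words
    // Compare numerically so that a start time of 0 is handled correctly.
    if (words[0].start <= currentTime) {
      current = i;
    } else {
      break; // blocks are assumed to be in chronological order
    }
  }
  return current;
}

const blocks = [
  { text: 'Hello there', data: { words: [{ start: 0, end: 0.5, text: 'Hello' }, { start: 0.5, end: 1, text: 'there' }] } },
  { text: 'Second paragraph', data: { words: [{ start: 4, end: 4.5, text: 'Second' }, { start: 4.5, end: 5, text: 'paragraph' }] } },
];

console.log(findCurrentBlockIndex(blocks, 4.2));
```

Every block up to the returned index can then be styled as "previous or current", which is one lookup per time update rather than one per word.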

I tested the performance by selecting the current text and replacing it with 2h42min.txt from bbc#150 (comment).

You can also test with anything over one hour, e.g. the Facebook CEO Mark Zuckerberg FULL testimony before U.S. senate (~5 hours), using electron-video-downloader to download the video.

This is the JSON of the transcription in digitalpaperedit format: Facebook CEO Mark Zuckerberg FULL testimony before U.S. senate-pXq-5L2ghhg.mp4.dpe.json.txt (remove .txt from the file extension after downloading to get the .json; GitHub doesn't allow uploading JSON files 🤷‍♂).

Or this is the JSON of the transcript in draftjs format:
Facebook CEO Mark Zuckerberg FULL testimony before U.S. senate-pXq-5L2ghhg.mp4.draftjs.json.txt

Test either with the whole 5 hours or a trim of 2 hours.

There might be further optimizations that can be done, such as realigning less often, e.g. only when exporting or saving content.
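
One way to realign less often is to only record that the text changed on each edit, and run the alignment when content is actually requested. This is a minimal sketch under assumptions: `createLazyAligner` is a hypothetical helper, and `align` is injected as a stand-in for whatever restores timecodes, so nothing here relies on the stt-align-node API:

```javascript
// Hypothetical wrapper around an injected `align(plainText, originalWords)`
// function, which is assumed to return a list of timecoded words.
function createLazyAligner(align) {
  let dirty = true; // nothing has been aligned yet
  let cached = null;
  return {
    // Called on every edit: cheap, only records that timings are stale.
    markEdited() {
      dirty = true;
    },
    // Called on save/export: runs the expensive alignment only if needed.
    getAlignedWords(plainText, originalWords) {
      if (dirty) {
        cached = align(plainText, originalWords);
        dirty = false;
      }
      return cached;
    },
  };
}

// Usage with a fake alignment function that counts how often it runs.
let alignCalls = 0;
const fakeAlign = (plainText) => {
  alignCalls += 1;
  return plainText.split(' ').map((text, index) => ({ text, start: index }));
};

const aligner = createLazyAligner(fakeAlign);
aligner.getAlignedWords('a b', []); // first request: aligns
aligner.getAlignedWords('a b', []); // no edits since: reuses the cached result
aligner.markEdited();
const latest = aligner.getAlignedWords('a b c', []); // edited: aligns again
console.log(alignCalls, latest.length);
```

The trade-off is that word timings are stale between an edit and the next save or export.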

State whether the PR is ready for review or whether it needs extra work

Looking for a review

I'd encourage you to clone and stress test locally, but I've also deployed this version on the GitHub pages of my fork to remove some friction.

Additional context

Removed entities by commenting out the function that creates them. Luckily it was a shared function across all adapters.
Removed generateConfidence in Word as it is not needed.
Removed Word altogether.
Added generatePreviousTimes to paragraphs, to do paragraph highlights.
Removed timecode indications at paragraph level, in favour of a fuzzy double click on a word to jump to that point in the media.
~~add sync 🔄 refresh button to resync transcript manually ?~~
Local storage auto save seems to work after removing entities. Tested in the demo after fixing it, with 2h40min worth of text, and it saved and retrieved fine. I am still of the opinion that it should stay in the demo app as an example, leaving it to individual implementations to decide whether to add it.
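
Since saving a long transcript can exceed the browser's storage quota, the save can be guarded so a failure does not crash the app. A minimal sketch: `safeLocalSave` is a hypothetical helper (not the demo's `localSave`), and the storage object is passed in so the same code works with `window.localStorage` or a stand-in:

```javascript
// `storage` is any object with the Web Storage `setItem(key, value)` method,
// for example window.localStorage in the browser.
function safeLocalSave(storage, key, data) {
  try {
    storage.setItem(key, JSON.stringify(data));
    return true;
  } catch (error) {
    // setItem throws when the quota is exceeded; report it and carry on.
    console.warn(`Could not save "${key}" to local storage: ${error.message}`);
    return false;
  }
}

// Usage with an in-memory stand-in for localStorage.
const memoryStorage = {
  items: {},
  setItem(key, value) {
    this.items[key] = value;
  },
};
const saved = safeLocalSave(memoryStorage, 'demo-transcript', { blocks: [] });
console.log(saved);
```

The boolean return value lets the caller decide whether to warn the user or fall back to a manual export.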

Issues

Some issues around cursor position jumping, during various edge cases.

Cursor position during delete

pbirsinger · 2020-03-15T02:52:38Z

You are a true hero and I cannot thank you enough for this fine and noble work.

Not to speak for the community, but our team would be 100% ok with going to paragraph level highlight for now (or even sacrificing significant other features) if it means getting rid of lag, as that essentially makes it fully usable again.

emettely · 2020-03-16T10:16:31Z

Happy to review it @pietrop 👍 Can you add me as a reviewer?

emettely · 2020-03-16T12:01:50Z

Trying it out in storybook - see GIF below, it hangs 😢

Just realised that the long file could be loaded in the demo - my bad - I just downloaded + reuploaded the file to my S3 bucket... 🤦‍♀ If you notice it's part of a different path in the Storybook, that's why.

The Storybook seems laggy and quite delayed in response, but maybe that's just the limitation of having a 5 hour long transcription service. If you pause and select text to jump to the right place, it seems to do it well, but not so much when you edit - it hangs (e.g. GIF above).

pietrop · 2020-03-16T14:24:29Z

added as option to pass in speaker and timecodes labels as attributes to transcript editor with defaults

… but also save state to reflect in the editor

pietrop · 2020-03-16T15:03:42Z

Another thing I am thinking about: should we make handleAutoSaveChanges optional? Would that improve performance?

Re the 🔄 btn: I tried to add an animation, so that if it takes longer to sync it starts spinning, and then stops when it's done. But it seems that when the alignment starts, it takes up a lot of memory and blocks the process, so you don't see the animation 🤷‍♂
Thinking about maybe offloading that computation to a service worker (if compatible with electron etc.), or finding out what the root cause is and whether there's a way to address it.

emettely · 2020-03-16T15:13:27Z

handleAutoSaveChanges() should probably be an option - by default on, but possible to set as off. Is there a way to cleverly do this - e.g. examining the length / size of transcription?

A shame about the animation, I'm sure there's a way.

CSS animations are handled by the browser’s compositor thread rather than the main thread responsible for painting and styling. Consequently, such animations are unaffected by the main thread’s more expensive tasks.

Not too familiar with animations generally, but I've found that quote here

pietrop · 2020-03-16T15:23:21Z

Re CSS animation: ok, fair, then I guess the question remains, why is it freezing...

And yeah, handleAutoSaveChanges could have an on/off at interface level, something to look into.

emettely

Looks good to me mostly - functionally there 👍 The code is generally simplified because the entity map is removed. More of a suggestion than an actual fix: I think revisiting getWordsBeforeBlock might be beneficial - it has comments to describe what it does, but it could probably be simplified or renamed to getWordsBeforeParagraphBlock.

emettely · 2020-03-16T17:31:16Z

demo/app.js

+    const DEMO_MEDIA_URL = demo.url;
+    const DEMO_TITLE = demo.title;
+    const DEMO_TYPE = demo.type
+
      if(this.state.useLocalStorage && isPresentInLocalStorage(DEMO_MEDIA_URL)){


* when testing the demo, uncheck `Use local storage` in demo options

This is a minor point, but when selecting a sample video in demo before I select the local storage checkbox, the video is automatically loaded and crashes. This might be something you want to uncheck in-code or add a load button, to allow the user to select all the options before loading the demo.

Yeah the flow would be: untick Use local storage, then click load demo.
Coz I have had people using the transcript editor page in conjunction with autoEdit to correct transcripts, and I didn't want to disable the default local storage auto save, just in case they were still using it.

But yeah, it's not ideal UX; unticking it by default would probably be better for development?

Yeah, seems sensible to untick by default.

packages/components/media-player/index.js

packages/components/media-player/src/PlayerControls/index.js

packages/components/timed-text-editor/CustomEditor.js

packages/components/transcript-editor/index.js

emettely · 2020-03-16T18:48:15Z

packages/components/transcript-editor/index.js

+      showTimecodes: typeof this.props.showTimecodes ==='boolean'? this.props.showTimecodes: true,
+      showSpeakers: typeof this.props.showSpeakers ==='boolean'? this.props.showSpeakers: true,


This seems like a hack - aren't they normally a boolean?

yeah, so basically if you do

showSpeakers: this.props.showSpeakers ? this.props.showSpeakers: true,

if this.props.showSpeakers is undefined it becomes true,

but if this.props.showSpeakers is false it also becomes true.
So I needed a way to check that it's not undefined, but without overriding the param, if that makes sense, so I thought type checking could be a fix?

showSpeakers: typeof this.props.showSpeakers ==='boolean'? this.props.showSpeakers: true,

I think maybe the right thing is make sure that in the parent props, it's always defined so you don't have to type check.

How do you do that if it’s an optional parameter?

Looks like your default option is true so I would've just set that in the parent
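
The difference between the options discussed above can be seen in isolation. This is an illustrative sketch using plain functions rather than the actual component; the three function names are made up for the comparison:

```javascript
// Three ways to default an optional boolean prop to true.

// Truthiness check: an explicit `false` is falsy, so it is overridden to true.
const withTernary = (props) => (props.showSpeakers ? props.showSpeakers : true);

// Type check: keeps an explicit `false`, defaults anything non-boolean to true.
const withTypeof = (props) =>
  (typeof props.showSpeakers === 'boolean' ? props.showSpeakers : true);

// Destructuring default: applies only when the value is `undefined`.
const withDefault = ({ showSpeakers = true }) => showSpeakers;

console.log(withTernary({ showSpeakers: false })); // true (the bug described above)
console.log(withTypeof({ showSpeakers: false })); // false
console.log(withDefault({ showSpeakers: false })); // false
console.log(withDefault({})); // true
```

On a React class component, declaring `defaultProps` with `showSpeakers: true` has the same effect as the destructuring default: the fallback is used only when the prop is `undefined`, so the constructor can read `this.props.showSpeakers` directly.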

packages/components/timed-text-editor/WrapperBlock.module.css

packages/components/timed-text-editor/WrapperBlock.js

pbirsinger · 2020-03-24T15:08:57Z

Happy to beta-test this with our team whenever things are stable

pbirsinger · 2020-04-06T14:58:45Z

What are the remaining todos here? Really looking forward to these improvements

pietrop · 2020-04-06T18:24:42Z

@pbirsinger I think it needs some testing and QA. Let us know if you'd have time to pull the branch locally and go through some of the QA steps, as a way to ensure we are not introducing new bugs, since it's a pretty major change/refactor.

You could also probably use the demo in GitHub pages on my fork if that helps save time on forking, cloning, etc.

pbirsinger · 2020-04-10T01:41:33Z

Pulled in the branch to try out and this definitely does wonders for the lag! Did an informal run through and things mostly seem to be working.

However, inside handleAutoSaveChanges, after setting isAutoSave to true, I printed out newAutoSaveData.data and noticed a discrepancy between the words in data and the text in blocks:

Note that I changed the first word "There" to "Thereyyyy", and the change was only reflected in the text in blocks but not in the data words. Is this intentional?

pietrop · 2020-04-10T12:37:56Z

Thanks for looking into it @pbirsinger and nice catch!

Yeah, I think this is because handleAutoSaveChanges does not run updateTimestampsForEditorState on the content before returning it.

One of the problems is that running the alignment has a performance cost, and triggering it on every auto save could slow everything down for longer files, unless it is optimized in some other way.

I haven't been using handleAutoSaveChanges, and have been thinking that the logic that returns the content for that function could be moved outside of the component. E.g. the function can still fire to let you know when there have been changes, but then outside of the component you decide how to handle that: do you run the alignment on the client side, or send it to the server/backend to do it?

E.g. in demo/app.js, handleAutoSaveChanges could be modified like this:

  handleAutoSaveChanges = newAutoSaveData => {
-   const { data, ext } = newAutoSaveData;
+   const { ext } = newAutoSaveData;
+   const { data } = this.transcriptEditorRef.current.getEditorContent(ext);
    this.setState({ autoSaveData: data, autoSaveExtension: ext });
    // Saving to local storage
    if (this.state.useLocalStorage) {
      localSave(this.state.mediaUrl, this.state.fileName, data);
    }
  };

This works because getEditorContent runs updateTimestampsForEditorState.

I haven't tested the code snippet above; it's more of a proof of concept.

Let me know what you think

pbirsinger · 2020-04-12T13:49:54Z

@pietrop an approach like that makes sense to me. Tested out those changes but got:

So I think that snippet is close to working but might need some more tweaks.

Thanks!!

pbirsinger · 2020-04-16T13:26:24Z

As a heads up, this is currently a major blocker for our team, along with other bugs that cause the editor to frequently crash. I think we're going to rebuild our own from scratch in the latest react, typescript, not laggy, etc since we can't really afford to keep waiting on this one unfortunately.

pietrop · 2020-04-16T14:19:49Z

@pbirsinger whatever works for you and your team. You are always welcome to fork this repo, and make an alternative TimedTextEditor if that makes it easier to try something new. (I did that for some of my projects where I use react-transcript-editor, and then I contribute back when I make a breakthrough on things, but that way I don't have to wait on the maintainers for some of those fixes to be incorporated in new releases etc... )

other bugs that cause the editor to frequently crash

It'd be good to log those and raise issues, so that they can be addressed.

Laurian · 2020-04-20T16:56:22Z

@pbirsinger if you're going to rebuild from scratch it would be great if you could open source it, or at least keep in touch with regard to the approach you will try; maybe we could improve this one later based on what you learn or provide feedback or be a sounding board on issues you may encounter.

pbirsinger · 2020-04-22T03:12:53Z

Sure guys I'm going to experiment and will keep you posted

pietrop · 2020-04-28T01:08:24Z

👋 I made an alternative version using SlateJs slate-transcript-editor

Also see

Some reasons for going from DraftJs to slateJS
Storybook for demo
My fork of digital paper edit demo for example in context. eg this transcript
Latest release of autoEdit 3/Digital Paper edit, in release section (above version 1.1.0) and section of user manual about transcript editor

I am closing this PR as I no longer have a need for it, but feel free to use any of it if useful.

Pietro Passarelli and others added 11 commits March 12, 2020 12:55

commented out auto save and entities-mto reduce some of the convertFr…

d974cfe

…omRaw function calls

saving progress

e9f62fc

Merge pull request #12 from pietrop/master

77ca865

update from master

saving progress

422391e

remove entitines from stt align

bc0018c

saving progress

09153fb

Added paragraph level highlight

8bdb216

Saved local storage

fed4bd2

clean up

4ebda51

brought back timecodes at speaker level

7234a3b

Removed forced re-render

b0cc657

pietrop mentioned this pull request Mar 13, 2020

Performance hit for media over 1 hour bbc/react-transcript-editor#150

Open

generall fixes and clean up

37dcb1a

pietrop marked this pull request as ready for review March 13, 2020 21:51

pietrop mentioned this pull request Mar 13, 2020

Performance no entities spike bbc/react-transcript-editor#226

Open

Pietro Passarelli added 3 commits March 15, 2020 22:50

added FB 5h demo

6dec5cb

added Facebook's Zuck Senate hearing 5 hours example to the demo app to stress test edge cases

error handling local storage

6e62def

added try catch block around local storage saving to avoid app crashing if it runs out of memory during save in demo app

added a 2 hour example + various twwaks

14d6397

removed auto sync for performance on longer files, also added sync btn to restore timestamps

pietrop assigned pietrop and emettely Mar 16, 2020

Pietro Passarelli added 2 commits March 16, 2020 10:50

show timecodes and speaker labels

2925925

added as option to pass in speaker and timecodes labels as attributes to transcript editor with defaults

exporting also save states-mexporting does an align before exporting,…

08a46e5

… but also save state to reflect in the editor

emettely reviewed Mar 16, 2020

View reviewed changes

pietrop mentioned this pull request Mar 18, 2020

chunking alignement to optimize? #14

Closed

pietrop mentioned this pull request Apr 23, 2020

Media player component in story book is broken bbc/react-transcript-editor#231

Open

fixing Media component in storybook

d05876f

added support for zero seconds

cf8a9f7

they where showing up as NAN, eg if a transcript time started at zero, edge case

pietrop closed this Apr 28, 2020

pietrop reopened this Apr 28, 2020

pietrop closed this Apr 28, 2020
