Performance no entities spike #13
…omRaw function calls
update from master
You are a true hero and I cannot thank you enough for this fine and noble work. Not to speak for the community, but our team would be 100% OK with going to paragraph-level highlighting for now (or even sacrificing significant other features) if it means getting rid of the lag, as that essentially makes it fully usable again.
added the 5-hour Facebook Zuckerberg Senate hearing example to the demo app, to stress-test edge cases
added a try/catch block around local storage saving in the demo app, to avoid the app crashing if it runs out of memory during a save
removed auto sync for performance on longer files; also added a sync button to restore timestamps
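For reference, the try/catch idea from the commit above can be sketched like this. It is a minimal sketch, not the demo app's actual `localSave(mediaUrl, fileName, data)`: `safeLocalSave` and the injected `storage` object (which would be `window.localStorage` in the browser) are hypothetical, and the injection is only there so the behaviour can be exercised outside a browser.

```javascript
// Hypothetical helper: write JSON to a Storage-like object without crashing the app.
// `storage` is expected to expose setItem(key, value), like window.localStorage.
function safeLocalSave(storage, key, data) {
  try {
    storage.setItem(key, JSON.stringify(data));
    return true;
  } catch (error) {
    // Browsers throw (typically a DOMException named "QuotaExceededError")
    // when the storage quota is full; log it and keep the editor running.
    console.warn(`Could not save "${key}" to local storage:`, error.name || error);
    return false;
  }
}
```

Returning a boolean lets the caller decide whether to warn the user that auto-save is no longer persisting, which for a 5-hour transcript is a realistic outcome rather than an edge case.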
Happy to review it @pietrop 👍 Can you add me as a reviewer?
Just realised that the long file could be loaded in the demo - my bad - I just downloaded and re-uploaded the file to my S3 bucket... 🤦♀ If you notice it's on a different path in the Storybook, that's why. The Storybook seems laggy and quite delayed in response, but maybe that's just the limitation of having a 5-hour-long transcript. If you pause and select text to jump to the right place, it seems to do it well, but not so much when you edit - it hangs (e.g. GIF above).
Thanks @pbirsinger, I think the main thing is also deciding what max length we aim to support, or what is reasonable for this component. For my interview-editing use cases I think it's 1 hour on average, but there might be occasional edge cases where it could go up to 2 hours; 5 hours is very unlikely. @emettely yeah, sorry, I had not updated the comments after the last commit. Some more tweaks in the latest commits.
In RTE, using the 5-hour transcription, I deleted the rest of the transcript after the 2-hour mark (i.e. the remaining 3 hours), then exported in react-transcript-editor as draftjs and digitalpaperedit in
added an option to pass in speaker and timecode labels as attributes to the transcript editor, with defaults
… but also save state to reflect in the editor
Another thing I am thinking about is the 🔄 button. I tried to add an animation so that if it takes longer to sync, it starts spinning, and then it stops when it's done. But it seems that when the alignment is started, it takes up a lot of memory and blocks the process, so you don't see the animation 🤷♂
A shame about the animation, I'm sure there's a way.
Not too familiar with animations in general, but I've found that quote here
Re the CSS animation: OK, fair, then I guess the question remains why it is freezing... And yeah,
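One plausible reason the spinner never shows is that the alignment runs synchronously on the main thread, so the browser gets no chance to paint the "syncing" state before the work starts. A common workaround is to set the busy flag, wait for the next paint, and only then run the heavy work. This is a sketch under that assumption: `runWithBusyIndicator` and `afterNextPaint` are hypothetical helpers, and `setBusy` stands in for something like `this.setState({ isSyncing })`. The page can still be unresponsive while the alignment itself runs; moving it into a Web Worker is the usual way around that.

```javascript
// Run `callback` after the browser has had a chance to paint.
// Two nested requestAnimationFrame calls are a common trick for this;
// outside a browser (e.g. in Node) fall back to setTimeout.
function afterNextPaint(callback) {
  if (typeof requestAnimationFrame === "function") {
    requestAnimationFrame(() => requestAnimationFrame(callback));
  } else {
    setTimeout(callback, 0);
  }
}

// Hypothetical helper: show a busy state, yield to the browser, run the
// blocking work, then clear the busy state whether the work succeeded or threw.
function runWithBusyIndicator(setBusy, heavyWork) {
  setBusy(true); // e.g. this.setState({ isSyncing: true })
  return new Promise((resolve, reject) => {
    afterNextPaint(() => {
      try {
        resolve(heavyWork());
      } catch (error) {
        reject(error);
      } finally {
        setBusy(false);
      }
    });
  });
}
```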
Looks good to me mostly - functionally there 👍 The code is generally simplified because the entity map is removed. More of a suggestion than an actual fix: I think revisiting `getWordsBeforeBlock` might be beneficial - it has comments to describe what it does, but it could probably be simplified or renamed to `getWordsBeforeParagraphBlock`.
```js
const DEMO_MEDIA_URL = demo.url;
const DEMO_TITLE = demo.title;
const DEMO_TYPE = demo.type;

if (this.state.useLocalStorage && isPresentInLocalStorage(DEMO_MEDIA_URL)) {
```
* when testing the demo, uncheck `Use local storage` in demo options

This is a minor point, but when selecting a sample video in the demo before I select the local storage checkbox, the video is automatically loaded and crashes. This might be something you want to uncheck in code, or you could add a load button to allow the user to select all the options before loading the demo.
Yeah, the flow would be: untick `Use local storage`, then click `Load demo`.
I have had people using the transcript editor page in conjunction with autoEdit to correct transcripts, so I didn't want to disable the default local storage auto-save, just in case they were still using it.
But yeah, it's not ideal UX; probably unticked as default would be better for development?
Yeah, seems sensible to untick by default.
```js
showTimecodes: typeof this.props.showTimecodes === 'boolean' ? this.props.showTimecodes : true,
showSpeakers: typeof this.props.showSpeakers === 'boolean' ? this.props.showSpeakers : true,
```
This seems like a hack - aren't they normally a boolean?
Yeah, so basically if you do

```js
showSpeakers: this.props.showSpeakers ? this.props.showSpeakers : true,
```

- if `this.props.showSpeakers` is `undefined`, it becomes `true`
- but if `this.props.showSpeakers` is `false`, it also becomes `true`

So I needed a way to check that it's not `undefined` without overriding the param, if that makes sense, so I thought type checking could be a fix?

```js
showSpeakers: typeof this.props.showSpeakers === 'boolean' ? this.props.showSpeakers : true,
```
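To make the difference concrete, here are the two patterns from this thread side by side, plus nullish coalescing (`??`) as a third option. The function names are made up for illustration. Using `??` assumes the project's Babel setup supports ES2020 syntax, which I haven't checked.

```javascript
// Three ways to default an optional boolean prop to `true`.

// Truthiness check (the first attempt): an explicit `false` is falsy, so it gets overridden.
function withTernary(value) {
  return value ? value : true;
}

// typeof check (what this PR does): anything that is not a boolean falls back to `true`.
function withTypeof(value) {
  return typeof value === "boolean" ? value : true;
}

// Nullish coalescing (ES2020): only `null` and `undefined` fall back to `true`.
function withNullish(value) {
  return value ?? true;
}
```

The `typeof` version is slightly stricter: a non-boolean such as the string `"false"` also falls back to `true`, whereas `??` would pass it through unchanged.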
I think maybe the right thing is make sure that in the parent props, it's always defined so you don't have to type check.
How do you do that if it’s an optional parameter?
Looks like your default option is `true`, so I would've just set that in the parent.
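On the question of how to do that for an optional parameter: defaults can live in exactly one place and only apply when the value is `undefined`, so an explicit `false` survives without any type check. This is a minimal sketch: `resolveDisplayOptions` is a hypothetical helper using plain default parameters. For a class component, React's `defaultProps` follows the same "only when `undefined`" rule. That is shown as a comment because I'm assuming `TranscriptEditor` is a class component here.

```javascript
// For a class component, the equivalent would be (assumption about TranscriptEditor):
//   TranscriptEditor.defaultProps = { showSpeakers: true, showTimecodes: true };

// Hypothetical helper: default parameters kick in only for `undefined`,
// so an explicitly passed `false` is preserved.
function resolveDisplayOptions({ showSpeakers = true, showTimecodes = true } = {}) {
  return { showSpeakers, showTimecodes };
}
```

Unlike `??`, default parameters do not replace `null`, so a parent passing `showSpeakers={null}` would get `null` through. That is another reason to keep the defaults defined once in a single place.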
Happy to beta-test this with our team whenever things are stable.
What are the remaining todos here? Really looking forward to these improvements.
@pbirsinger I think it needs some testing and QA. Let us know if you'd have time to pull the branch locally and go through some of the QA steps, as a way to ensure we are not introducing new bugs, since it's a pretty major change/refactor. You could also probably use the demo on the GitHub Pages of my fork, if that helps save time forking, cloning, etc.
Pulled in the branch to try it out and this definitely does wonders for the lag! Did an informal run-through and things mostly seem to be working. However, inside Note I changed the first word "There" to "Thereyyyy" and the change was only reflected in the block text but not in the `data` words. Is this intentional?
Thanks for looking into it @pbirsinger, and nice catch! Yeah, I think this is because … One of the problems is that running the alignment has a performance cost, and if you were to trigger it on every auto-save it could slow everything down for longer files, unless it is optimized in some other way. I haven't been using … e.g. in the `handleAutoSaveChanges` handler:

```diff
 handleAutoSaveChanges = newAutoSaveData => {
-  const { data, ext } = newAutoSaveData;
+  const { ext } = newAutoSaveData;
+  const { data } = this.transcriptEditorRef.current.getEditorContent(ext);
   this.setState({ autoSaveData: data, autoSaveExtension: ext });
   // Saving to local storage
   if (this.state.useLocalStorage) {
     localSave(this.state.mediaUrl, this.state.fileName, data);
   }
 };
```

I haven't tested the code snippet above, so it's more of a proof of concept. Let me know what you think.
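One way to keep the cost of realigning on auto-save down is to debounce it, so the expensive step runs once after the user pauses typing rather than on every change. A minimal sketch: `debounce` below is a generic helper written for illustration (lodash's `debounce` would be an alternative), and the 2-second wait is an arbitrary assumption.

```javascript
// Generic trailing-edge debounce: `fn` runs once, `waitMs` after the last call,
// with the arguments of that last call.
function debounce(fn, waitMs) {
  let timerId = null;
  return function debounced(...args) {
    clearTimeout(timerId);
    timerId = setTimeout(() => {
      timerId = null;
      fn.apply(this, args);
    }, waitMs);
  };
}

// Possible usage in the demo (hypothetical):
//   this.debouncedAutoSave = debounce(this.handleAutoSaveChanges, 2000);
```

The trade-off is that the last edit is only persisted once the timer fires, so closing the tab within the wait window could lose it. A longer wait means fewer alignments but a larger window for that.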
@pietrop an approach like that makes sense to me. Tested out those changes but got: … So I think that snippet is close to working but might need some more tweaks. Thanks!!
As a heads up, this is currently a major blocker for our team, along with other bugs that cause the editor to frequently crash. I think we're going to rebuild our own from scratch in the latest React, TypeScript, not laggy, etc., since unfortunately we can't really afford to keep waiting on this one.
@pbirsinger whatever works for you and your team. You are always welcome to fork this repo and make an alternative.
It would be good to log those bugs and raise issues, so that they can be addressed.
@pbirsinger if you're going to rebuild from scratch, it would be great if you could open source it, or at least keep in touch about the approach you will try. Maybe we could improve this one later based on what you learn, or provide feedback, or be a sounding board on issues you may encounter.
Sure guys, I'm going to experiment and will keep you posted.
they were showing up as NaN, e.g. if a transcript time started at zero (edge case)
👋 I made an alternative version using SlateJS. Also see
I am closing this PR as I no longer have a need for it, but feel free to use any of it if useful.
Is your Pull Request related to another issue in this repository?
Fixes bbc#150 and addresses comment bbc#150 (comment)
Describe what the PR does
TL;DR: Seeing if removing entities can remove the performance issue (spoiler alert: it seems like it does 🎉).
At first glance, removing entities means that we'd lose the correspondence between words and timecodes. However, thanks to `stt-align-node` (original algo by @chrisbaume and integration with `@bbc/react-transcript-editor` by @murezzda), it seems like we have an inexpensive way to restore alignment between plain text and timecoded words.

This PR removes word-level entities, but keeps the `data` attribute at block (paragraph) level, which contains the `words` list for each paragraph in the DraftJS JSON.

It introduces paragraph-level highlighting (vs the previous word-level highlighting) by adding the timecodes of words in previous blocks to a paragraph HTML data attribute, to keep CSS injection similar to how current and previous words were displayed before.
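To illustrate what "restoring alignment between plain text and timecoded words" means for one paragraph, here is a rough sketch. It is NOT how `stt-align-node` is implemented; it is a plain longest-common-subsequence (LCS) word match with linear interpolation for new words, and the `{ text, start, end }` word shape and `alignWords` name are assumptions. Words that survive the edit keep their original timings. Inserted or changed words get times spread evenly between their timed neighbours. Because `words` are kept per paragraph, the LCS table stays small (paragraph length squared) rather than growing with the whole transcript.

```javascript
// Rough illustration only: an LCS-based word aligner, NOT stt-align-node's code.
// Assumed shapes: timedWords = [{ text, start, end }] (original STT words for one
// paragraph); editedText = that paragraph's plain text after the user's edits.

// Compare words case-insensitively and ignore punctuation.
function normalise(word) {
  return word.toLowerCase().replace(/[^\p{L}\p{N}']/gu, "");
}

function alignWords(timedWords, editedText) {
  const edited = editedText.split(/\s+/).filter(Boolean);
  const a = timedWords.map((w) => normalise(w.text));
  const b = edited.map(normalise);

  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
  const lcs = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk the table: edited words that match an original word inherit its timings.
  const result = edited.map((text) => ({ text, start: null, end: null }));
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      result[j].start = timedWords[i].start;
      result[j].end = timedWords[i].end;
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      i++; // original word was deleted or changed
    } else {
      j++; // edited word is new or changed
    }
  }

  // Unmatched words: spread them evenly between the nearest timed neighbours.
  const paragraphStart = timedWords.length ? timedWords[0].start : 0;
  const paragraphEnd = timedWords.length ? timedWords[timedWords.length - 1].end : 0;
  let k = 0;
  while (k < result.length) {
    if (result[k].start !== null) {
      k++;
      continue;
    }
    let runEnd = k;
    while (runEnd < result.length && result[runEnd].start === null) runEnd++;
    const from = k > 0 ? result[k - 1].end : paragraphStart;
    const to = runEnd < result.length ? result[runEnd].start : paragraphEnd;
    const step = (to - from) / (runEnd - k);
    for (let m = k; m < runEnd; m++) {
      result[m].start = from + step * (m - k);
      result[m].end = from + step * (m - k + 1);
    }
    k = runEnd;
  }
  return result;
}
```

For example, editing "There" to "Thereyyyy" leaves "is a day." matched, and "Thereyyyy" gets the gap before the first matched word (here the original "There" timing).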
I tested the performance by selecting the current text and replacing it with 2h42min.txt from bbc#150 (comment).
But you can test with something over one hour, e.g. with Facebook CEO Mark Zuckerberg FULL testimony before U.S. senate (~5 hours), using `electron-video-downloader` to download the video. This is the JSON of the transcription in `digitalpaperedit` format: Facebook CEO Mark Zuckerberg FULL testimony before U.S. senate-pXq-5L2ghhg.mp4.dpe.json.txt (remove `.txt` from the file extension after downloading to get the `.json` - GitHub doesn't allow uploading JSON files 🤷♂), or this is the JSON of the transcript in `draftjs` format: Facebook CEO Mark Zuckerberg FULL testimony before U.S. senate-pXq-5L2ghhg.mp4.draftjs.json.txt
Either with the whole 5 hours or a trim of 2 hours.
There might be further optimizations that can be done, like not realigning as often, e.g. only realigning when exporting or saving content.
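The "only realign when exporting or saving" idea boils down to a dirty flag: edits just mark the content as stale, and the expensive alignment runs at most once per batch of edits, when something actually needs timecodes. A sketch under that assumption: `LazyAligner` is hypothetical, and `realign` stands in for whatever wraps `stt-align-node`.

```javascript
// Hypothetical sketch: mark content as stale on edit, realign only on demand.
class LazyAligner {
  constructor(realign) {
    this.realign = realign; // expensive function, e.g. a wrapper around stt-align-node
    this.dirty = false;
    this.cached = null;
  }

  // Call from the editor's change handler: cheap, no alignment happens here.
  markEdited() {
    this.dirty = true;
  }

  // Call from export/save handlers. Edits must go through markEdited(),
  // otherwise the previously aligned result is returned as-is.
  getAligned(content) {
    if (this.dirty || this.cached === null) {
      this.cached = this.realign(content);
      this.dirty = false;
    }
    return this.cached;
  }
}
```

The catch is that anything reading word timings in between (e.g. click-to-seek on a freshly edited word) sees stale data until the next save or export, which matches the "Thereyyyy" behaviour reported in the review.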
State whether the PR is ready for review or whether it needs extra work
Looking for a review.
I'd encourage you to clone and stress-test locally, but I've also deployed this version on the GitHub Pages of my fork to remove some friction.
Additional context
- removed `generateConfidence` in `Word`, as not useful/needed
- removed `Word` altogether
- added `generatePreviousTimes` to paragraphs, to do paragraph highlights
- add a sync 🔄 refresh button to resync the transcript manually?

Issues
- Cursor position during delete
- Cursor position during enter
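The paragraph-highlight approach (previous timecodes stored on each paragraph, plus injected CSS) might look roughly like this, assuming it mirrors the earlier word-level trick: each element lists the whole seconds of everything before it, and a `~=` attribute selector matches elements whose list contains the current second. This is a sketch, not the PR's `generatePreviousTimes`. The function names, the `div[data-prev-times]` selector and the colour are assumptions.

```javascript
// Hypothetical sketch of paragraph-level "previous times".
// blocks: [{ words: [{ start }] }], i.e. the paragraph-level `data.words`.

// For each paragraph, list the whole seconds at which words start in all EARLIER paragraphs.
function previousTimesPerParagraph(blocks) {
  const seen = new Set();
  return blocks.map((block) => {
    const prevTimes = Array.from(seen).join(" ");
    for (const word of block.words) {
      seen.add(Math.floor(word.start));
    }
    return prevTimes;
  });
}

// A paragraph whose list contains the current second starts after the playhead,
// so it can be styled as "not played yet" with the whitespace-list selector (~=).
function unplayedParagraphCss(currentTime, colour = "#767676") {
  const now = Math.floor(currentTime);
  return `div[data-prev-times~="${now}"] { color: ${colour}; }`;
}
```

Each paragraph would then render something like `data-prev-times={prevTimes[index]}`, with the CSS string injected into a `<style>` tag on time updates. Note that the attribute grows with the paragraph's position in the transcript, which is worth keeping an eye on for the 5-hour example.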