Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: avoid base64-encoding image data #76

Merged
merged 11 commits into from
Nov 7, 2023
Merged

Conversation

manzt
Copy link
Contributor

@manzt manzt commented Jul 17, 2023

Implements #75. Uses Jupyter Widgets support for sending binary data directly over websockets.

Copy link
Member

@almarklein almarklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Even faster graphics 😄

I made two minor comments.

Also one question: would this also work in other clients like Colab (#74) or MS Code?

jupyter_rfb/widget.py Show resolved Hide resolved
jupyter_rfb/widget.py Outdated Show resolved Hide resolved
@manzt
Copy link
Contributor Author

manzt commented Jul 17, 2023

Also one question: would this also work in other clients like Colab (#74) or MS Code?

Yes. VS Code and Colab correctly implement the Jupyter Widgets, it's just there are a lot of quirks you need to worry about as a widget author to make sure your front-end code is packaged correctly and discoverable in those environments.

Here's an example notebook using https://github.com/manzt/anywidget – a project I work on that handles those complexities for you – that sends binary data in Colab.

@almarklein
Copy link
Member

One linting error: ./jupyter_rfb/widget.py:17:1: F401 'base64.encodebytes' imported but unused 😉

And yarn seems to fail (on some builds).

@manzt
Copy link
Contributor Author

manzt commented Jul 17, 2023

And yarn seems to fail (on some builds).

yarn builds are failing because you don't have a lock file and one of the dependencies is now incompatible with node v14. I bumped the version in CI, but you might want to consider checking in a lockfile.

@djhoese
Copy link
Member

djhoese commented Jul 17, 2023

I didn't find too much on previous conversations on this topic so let me know if I'm missing something that I can read in a different thread: The related issues talks about JPEG only, but this PR sends "raw" JPEG and PNG, right? Unless I'm mis-remembering, I think base64 was originally used because that's what vispy used to use in the old webgl backend that this package was originally created to support. This PR is basically just saying "we can send binary data, so send binary data", right?

@almarklein
Copy link
Member

The related issues talks about JPEG only, but this PR sends "raw" JPEG and PNG, right?

Yes.

I think base64 was originally used because that's what vispy used to use in the old webgl backend that this package was originally created to support.

I used base64 because its a common way to be able to send binary data over a textual interface. And in the case of images, because it allows creating an url that embeds the image (so no decoding necessary in JS). I've seen/used the same technique in similar situations, and I guess I hadn't thought about using the ability to send binary data over the websocket ...

But now that I think of it ... are we certain that the websocket is really in binary mode, and that Jupyter is not base64 encoding the data anyway (and decoding in the browser). Can we observe an increase in performance with this change (e.g. in performance.ipynb)?

@manzt
Copy link
Contributor Author

manzt commented Jul 17, 2023

But now that I think of it ... are we certain that the websocket is really in binary mode, and that Jupyter is not base64 encoding the data anyway (and decoding in the browser).

Fairly certain it sends the binary data and there is no hidden base64 encoding. cc: @maartenbreddels

Also not familiar with being able to turn off/on binary support for a WebSocket. WebSocket.binaryType property controls the type of binary data being received over the connection, but the only options are blob and arrayBuffer.

https://github.com/jupyterlab/jupyterlab/blob/c68fa8e2df6d15d6329d6337ac30f520e371a620/packages/services/src/kernel/default.ts#L1276

js/lib/widget.js Outdated Show resolved Hide resolved
@almarklein
Copy link
Member

I picked this up, eventually making this work with both approaches.

I ran this many times on the examples/performance.ipynb. Now this may not be a complete benchmark, and I only tested on localhost and MacOS .... but ... what I see is that using a websocket is actually a bit slower. I was not expecting that, and its not a lot, but its significant.

My guess is that the object-url's cause unexpected overhead or something. I also saw that the time spent on base64 encoding is much smaller (like an order of magnitude) than the jpeg compression.

Leaving this open for now in case someone else wants to have a stab.

@djhoese
Copy link
Member

djhoese commented Oct 26, 2023

I also saw that the time spent on base64 encoding is much smaller (like an order of magnitude) than the jpeg compression.

This is a function of the size of the frame/image, right? The performance notebook is 480x640 images. I'd also assume this is heavily dependent on the load of the server. So I guess my questions are:

  1. How big of a frame do you have to make for base64 to be on-par with JPEG compression?
  2. What is the difference in amount of data sent over the network? Obviously locally isn't too big of a deal, but over a non-local network the UX could be different.

@almarklein
Copy link
Member

Good questions! Also worth looking at how things work out with png. I'll have another look soon.

@almarklein
Copy link
Member

what I see is that using a websocket is actually a bit slower. I was not expecting that, and its not a lot, but its significant.

Sorry, my above comment was based on very unrealistic tests.

Did a bit more testing. The size of the frames does not seem to matter much. What does matter a lot is when it sends random data instead. Then the websocket approach becomes twice as fast 😄. Although fully random data is not realistic, a nearly black image is neither, most real data will be somewhere in between.

I'm now confident that in general the websockt approach is faster, especially in cases where the server is remote. I've left both options open, so in case one is in a situation where a websocket is problematic, it can still be turned off. By default it's on though.

Ready to go from my end.

@almarklein almarklein merged commit 94bf3e5 into vispy:main Nov 7, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants