Large performance improvements and a couple of nice to have new features #96

Merged: 3 commits merged into chrisys:main on Jul 4, 2023

Conversation

CalamityJames
Contributor

@CalamityJames CalamityJames commented Jun 29, 2023

Obsoletes #95 (included in this PR)

Massive thanks to @cr3ative for his stellar work on optimising the scrolling performance; he is the real star of this PR!

Base OS and Python Upgrade

We've moved the base operating system to Alpine, cleaned up the build chain, and simplified the Dockerfile. The resulting image on Balena is about 220MB, about the same as before, but hopefully easier to approach.

This PR has now been tested in Balena and deploys as expected.

Python Profiling

The most computationally expensive thing to do on any computer is render a font. It's what Babbage fought with for years.

I profiled the main loop of this code by vendoring in Luma, removing the threadpool, and running cProfile over it. That revealed two things:

  • Running the seconds as a "hotspot" was burning up the CPU.
  • The rest of the CPU time was spent "scrolling" the "calling at stations" text, as it required a full re-render of the string each time a character was dropped off the front of the string.

What we changed in response:

  • We've changed the scrolling behaviour to scroll the bitmap at 1 pixel per frame, rather than 1 character per frame. Smooth!
    • There's also a snazzy little rising-up animation for fun.
  • The other frequent font calls (in the loop) have all been bitmap cached too.
  • The frame regulator is now configurable: it fights with the CPU on a Pi Zero and is better disabled there, but on a Pi3 you want a regulator to stop burning the CPU up unnecessarily.

So the main fixes were to render the seconds as an interval-updated zone with 0.1 second resolution, and to pre-render all other commonly used TTF operations in the main loop.
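
For anyone unfamiliar with the "interval-updated" idea, a plain-Python sketch follows. It is a stand-in for the concept, not Luma's API and not the code in this PR. The zone only re-runs its draw function once `interval` seconds have elapsed, and otherwise hands back the last result. `IntervalZone` and its `clock` parameter are hypothetical names.

```python
import time


class IntervalZone:
    """Re-run `draw_fn` only when `interval` seconds have passed since the last draw."""

    def __init__(self, draw_fn, interval, clock=time.monotonic):
        self.draw_fn = draw_fn
        self.interval = interval
        self.clock = clock          # injectable so the behaviour can be tested
        self._last_drawn = None
        self._cached = None
        self.draw_count = 0         # how many real redraws have happened

    def frame(self):
        """Return the zone's content for this frame, redrawing only if it is stale."""
        now = self.clock()
        if self._last_drawn is None or now - self._last_drawn >= self.interval:
            self._cached = self.draw_fn()
            self._last_drawn = now
            self.draw_count += 1
        return self._cached
```

With `interval=0.1` the seconds are redrawn at most ten times a second, however fast the main loop spins.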

Results

On a Pi Zero (the oldest possible device!), here's the performance on "main" running Raspbian:

[Image: performance on "main", Pi Zero running Raspbian]

And here's the performance on "performance", with targetFPS set to 0:

[Image: performance on "performance" with targetFPS set to 0]

Using BalenaOS, the Pi Zero can manage about 33fps when targeting 45fps, at about 70% CPU; this leaves some headroom for the supervisor services.
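
As a rough sketch of what a configurable regulator does (an illustration under assumptions, not the implementation in this PR): each frame gets a time budget of `1 / target_fps`, any time left over is slept away, and a target of 0 skips the sleep entirely. `run_frames` and its parameters are hypothetical names.

```python
import time


def run_frames(draw_frame, frame_count, target_fps=0,
               clock=time.monotonic, sleep=time.sleep):
    """Call draw_frame frame_count times, capped at target_fps (0 disables the cap).

    Returns the effective frames per second that were achieved.
    """
    frame_budget = 1.0 / target_fps if target_fps > 0 else 0.0
    started = clock()
    for _ in range(frame_count):
        frame_start = clock()
        draw_frame()
        if frame_budget:
            spare = frame_budget - (clock() - frame_start)
            if spare > 0:
                sleep(spare)  # hand the CPU back instead of rendering flat out
    elapsed = clock() - started
    return frame_count / elapsed if elapsed > 0 else float("inf")
```

If a frame takes longer than its budget, as on the Pi Zero targeting 45fps above, `spare` is never positive and the loop simply runs as fast as the hardware allows.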

[Image: performance on a Pi Zero running BalenaOS]

Changelog

We edited the changelog that shouldn't have been edited, sorry! Changes pasted in from changelog:

  • Upgrade: Switch Python version and base OS: Python 3.11 on Alpine
  • New feature: showDepartureNumbers option - Adds 1st / 2nd / 3rd prefix as per UK train departures
  • New feature: firstDepartureBold option - Toggles bold text on the first departure line, as this varies by region
  • New feature: targetFPS option - configurable FPS regulator (zero to disable)
  • Development UX: fpsTime option - Adjusts how frequently the Effective FPS is displayed
  • Development UX: headless option - Run using emulated serial port (Useful for optimisation checks)
  • Development UX: Skip NRE attribution sleep in emulation mode
  • Development UX: Simplify Dockerfile slightly in an attempt to be Balena-y
  • Performance: Seconds now render every 0.1 second, rather than a hotspot (reduce CPU)
  • Performance: All "in-loop" TTF font rendering is now cached (reduce CPU)
  • Fix: screen1Platform/screen2Platform being required incorrectly on the env
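
For the `showDepartureNumbers` prefix, a small sketch of how 1st / 2nd / 3rd labels could be generated. This is an assumption about the approach, not the code in this PR; `ordinal`, `label_departures` and the sample departures are made up for illustration.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def label_departures(departures, show_departure_numbers=True):
    """Prefix each departure line with its position when the option is enabled."""
    if not show_departure_numbers:
        return list(departures)
    return [f"{ordinal(i)} {line}" for i, line in enumerate(departures, start=1)]


# Made-up sample data, purely for illustration.
rows = label_departures(["10:04 London Paddington", "10:12 Reading"])
```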

CalamityJames and others added 2 commits June 29, 2023 10:04
…xt if sticking around) and added 1st/2nd/3rd with config var
# Performance Branch

The most computationally expensive thing to do on any computer is render a font. It's what Babbage fought with for years.

I profiled the main loop of this code by vendoring in Luma, removing the threadpool, and running cProfile over it. That revealed two things:

* Running the seconds as "hotspot" was burning up the CPU
* The rest of the CPU time was spent "scrolling" the "calling at stations" as it required a full re-render of the string each time a character was dropped off the front of the string.
* We've changed the scrolling behaviour to scroll the bitmap at 1 pixel per frame, rather than 1 character per frame. Smooth!
  * There's also a snazzy little rising-up animation for fun.
* The other frequent font calls (in the loop) have all been bitmap cached too.
* The frame regulator is now configurable; it fights with the CPU on a Pi Zero and is better disabled there; but on a Pi3 you want a regulator to stop burning the CPU up unnecessarily.

So, main fixes were to put the seconds as an interval-updated zone with 0.1 second resolution, and to pre-render all other commonly used TTF operations in the main loop.

## Results

On a Pi Zero (the oldest possible device!), here's the performance on "main":

![353895428_652705433582787_8305305711463697195_n](https://github.com/CalamityJames/train-departure-display/assets/1850718/823cfcc8-1f6b-4730-ae5d-f49e655af10f)

And here's the performance on "performance", with `targetFPS` set to `0`:

![image](https://github.com/CalamityJames/train-departure-display/assets/1850718/c1a260a1-cd26-4872-b204-4654293caa9a)

# Changelog

See updated CHANGELOG

---------

Co-authored-by: James <[email protected]>
@CalamityJames
Contributor Author

I don't have two screens so couldn't test the multi-screen performance. Sorry if that totally breaks it! Will be ordering a second screen but my previous ebay supplier is now away till August!

@cr3ative
Contributor

cr3ative commented Jun 29, 2023

Just a note here to say that I reformatted my Pi Zero and installed Python 3.11 (rather than 3.7 which the image comes with), which has many CPU optimisations, and the result is significant:

[Image: performance on a Pi Zero with Python 3.11]

95fps on a Pi Zero is a big leap!

It looks like this image could be used, but I haven't run this project through Balena yet, so I'm not going to fiddle with Balena-specific settings: https://hub.docker.com/layers/balenalib/raspberry-pi-debian-python/3.11-buster-run/images/sha256-d74b72c912b9f0d019308d0995e50c82b54c106466656d094ebdd30d831e72f7?context=explore

* zooom zoom
* bleep
* tweak; remove zlib, but libjpeg is required at runtime
* rm emu
* tbh
* rewrite for merge to 0.5.0
@cr3ative
Contributor

cr3ative commented Jul 3, 2023

I've proceeded with the move to Python 3.11 and put both the build and the run Docker containers on Alpine.

Also got this all working with Balena, which is frustrating to start with, but quite neato when you get used to it.

Updated the PR to note performance running via BalenaOS is acceptable, but understandably lower.

@chrisys
Owner

chrisys commented Jul 4, 2023

Wow guys! I'll take a look at this immediately!

@CalamityJames I've got a spare screen I can send you if you need it for testing purposes - email me your address [email protected]

@CalamityJames
Contributor Author

@chrisys thanks for the offer - I've dropped you an email :) I should have a Pi Zero 1 (and 2) with me in the next couple of days too so I can test on more realistic devices than my OP Pi3!

@chrisys
Owner

chrisys commented Jul 4, 2023

@CalamityJames @cr3ative I've updated my sign (running on balena) with this and it worked first time and is just beautiful, it's the point I always dreamed we could get to, the scrolling is just 😍

@cr3ative I'm shipping James a display to help with testing but I'm happy to do the same for you too if it helps! I have a couple of spare white ones as I was planning on working on #62 but realistically I'm not going to get to it any time soon.

@chrisys chrisys merged commit 5a19296 into chrisys:main Jul 4, 2023
@cr3ative
Contributor

cr3ative commented Jul 4, 2023

> @CalamityJames @cr3ative I've updated my sign (running on balena) with this and it worked first time and is just beautiful, it's the point I always dreamed we could get to, the scrolling is just 😍

That's really kind of you to say! I hope we got the Balena bits right - hopefully you could tweak them for us anyway. I'll take a display, especially if it saves me taking the dang headers off it! Will drop you an email.

@chrisys
Owner

chrisys commented Jul 4, 2023

Yep as far as the balena side is concerned it all looks great! The resultant container is a bit larger than it was previously (225MB vs 89MB) but I agree it makes the maintenance (and development) easier.

@GOTO-GOSUB

GOTO-GOSUB commented Jul 4, 2023

I would just like to say a huge thank you to everyone who contributed to this update. The Pi Zero display under my monitor has gone up from approx 0.9fps easily into the 40s. A truly huge improvement, and I'm not seeing the clock sticking as it used to either.

[Image: Trains 0 5 0]

One silly question though: my TZ is set to the default of "Europe/London", but the clock is an hour behind and changing it is not making any difference. Has anyone else encountered this, and if so, what have I forgotten to change?

The departure times are correct, it's the real time clock at the bottom that is an hour behind.

@CalamityJames
Contributor Author

> One silly question though, my TZ is set to the default of "Europe/London" but the clock is an hour behind and making a change to it is not making any difference. Has anyone else encountered this, and if so what have I forgotten to change?

Hah, just glanced behind me and noticed mine is also reporting as 16:27 currently! Will have a look and see if it's something we've broken!

@CalamityJames
Contributor Author

Fix identified I believe, just testing and will submit a new PR!

@cr3ative
Contributor

cr3ative commented Jul 4, 2023

See #97 for Time Zone fix.
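
Some background on the symptom, which does not describe what #97 changes, since that fix isn't shown in this thread. In early July the UK is on BST (UTC+1), so a clock showing UTC instead of "Europe/London" local time would read exactly an hour behind. The sketch below uses the 16:27 reading mentioned above purely as an illustrative instant. `clock_text` is a hypothetical helper, and `zoneinfo` needs the system tz database (or the `tzdata` package) to be available.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def clock_text(utc_now, tz_name="Europe/London"):
    """Format an aware UTC datetime as HH:MM:SS in the named time zone."""
    return utc_now.astimezone(ZoneInfo(tz_name)).strftime("%H:%M:%S")


# A fixed instant in early July, when the UK observes BST (UTC+1).
instant = datetime(2023, 7, 4, 16, 27, 0, tzinfo=timezone.utc)

as_utc = instant.strftime("%H:%M:%S")  # what a clock ignoring TZ would show
as_london = clock_text(instant)        # what a Europe/London clock should show
```

If the clock matches `as_utc` rather than `as_london`, the TZ setting is probably not reaching the code that draws the clock.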

@CalamityJames CalamityJames deleted the new-features branch July 27, 2023 00:51