ReplayGain ffmpeg with true peaks is SLOW #4935
-
I tested the performance of beets against other ReplayGain tools. ResultsMultiplier is the column to look at, and each row is comparable with all the others. A 1 in this column means real time (60 minutes of audio takes 60 minutes to analyze). A 6 means 6x (60 minutes of audio takes 10 minutes to analyze), and so on. Higher is better. The runtime results are only comparable within the same test suite (below). RG version indicates whether I'm doing "replaygain" or EBU R128 with true peak analysis.
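As a minimal sketch of how the multiplier is derived, the helper below divides audio length by analysis time. The function names are made up for illustration; the second call uses the Deerhoof album timing reported further down in this post (39:42 of audio analyzed in 4:37).

```python
def parse_duration(text: str) -> float:
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS(.fff)' into seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def speed_multiplier(audio_length: str, analysis_time: str) -> float:
    """Audio duration divided by wall-clock analysis time (higher is better)."""
    return parse_duration(audio_length) / parse_duration(analysis_time)


# 60 minutes of audio analyzed in 10 minutes is a 6x result.
print(round(speed_multiplier("60:00", "10:00"), 1))  # 6.0
# The beets run on the Deerhoof album: 39:42 analyzed in 4:37.
print(round(speed_multiplier("39:42", "4:37"), 1))   # 8.6
```

A value near 1 means analysis runs at playback speed; anything below 1 is slower than real time.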
For completeness (testing failed hypotheses), here are other irrelevant results:
Test suite
My sampling frame was my personal digital music collection, biased in favor of genres I like (new age, jazz) and formats I like (16-bit CD FLAC). I told foobar to randomize the tracks, then picked the top 14 tracks and copied their discs to my test folder. I only copied discs and not the larger compilations that they came from.
Runtime: 14:37:40.203
I also made a quick test, which is a subset of the above. I use it for the ARM true peak runs because my system is so slow that the full test would take hours.
fb2k windows (baseline)
For a baseline (high water mark for performance) I ran replaygain on my high-end gaming PC (Ryzen 5 7600X; 4.7 GHz; 6 cores, 12 threads, no overclock). The files were stored on an upper-mid SSD, a Western Digital WD BLACK SN750 NVMe M.2 2280 1TB. NB: no observed difference between running on a USB 3.0 HDD (>100 MB/s read speeds) and an SSD. No observed difference with Firefox and ~20 tabs open while the test was running. FB2k might be using ebur128 (but no true peak?). https://hydrogenaud.io/index.php/topic,120918.0.html
ffmpeg ARM linux (baseline)
This is the same system I run beets on.
The basic test I ginned up in 20 minutes doesn't do per-album tags. So this is again an upper bound on reasonable performance on this system. But it still shows the beets replaygain plugin is slow. Even not running in parallel*, performance is on the order of 60x.
* The command is basically a for-loop run in sequence (not parallel). Within the ffmpeg command it should (and does, based on CPU load) use all CPU cores.
NB: Not all tracks were scanned without errors: "Incorrect BOM value", "error reading comment frame, skipped". These were on MP3s and I believe from the logs that the tracks were still scanned for RG.
NB: The LSIO beets docker image uses Alpine, which uses BusyBox. Your find command must be tailored to it.
real 13m55.477s
beets ffmpeg linux
Perf was atrocious. The command executed is something like
Album2 - Deerhoof - Mountain Moves 39:42 completed in 4:37 (with beet -vvv so more precise as to when RG stopped and started).
ARM linux ffmpeg EBUR128 (same as beets)
Again no per-album tags. 9.6x
Looks like it's caused by EBUR128 vs. Replaygain
It's true peak mode
Running the raw FFMPEG command produced by beets showed the same 8x speed. So the problem appears to be that command and, likely, its EBUR128-ness. Which feels a little odd to me because EBUR128 is supposed to be faster than RG 1.0? https://www.wolframalpha.com/input?i=%283min7s%29%2F23.3+seconds http://underpop.online.fr/f/ffmpeg/help/ebur128.htm.gz
Where is true peak mode coming from?
I looked through the code, my config, config_default, etc. but I can't find where peak=true is being enabled. I wonder if some other value is being coerced into True with Python magic? And then a quirk of ffmpeg where "true" peak has nothing to do with a truthy or 1 value but ends up that way.
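To isolate the cost of true peak mode outside beets, a command along these lines can be built and timed by hand. This is a sketch, not the command beets emits: the helper name `build_ebur128_cmd` is made up for illustration. It relies on the ffmpeg `ebur128` filter's `peak` option, which takes mode names (`none`, `sample`, `true`), so `true` there is the name of a peak mode rather than a boolean.

```python
import shlex

# Peak modes accepted by ffmpeg's ebur128 filter.
PEAK_MODES = ("none", "sample", "true")


def build_ebur128_cmd(path: str, peak: str = "none", ffmpeg: str = "ffmpeg") -> list:
    """Build an ffmpeg invocation that analyzes one file and discards the output.

    Hypothetical helper for benchmarking; beets' own command may differ.
    """
    if peak not in PEAK_MODES:
        raise ValueError(f"peak must be one of {PEAK_MODES}, got {peak!r}")
    return [
        ffmpeg, "-nostats", "-hide_banner",
        "-i", path,
        "-filter_complex", f"ebur128=peak={peak}",
        "-f", "null", "-",
    ]


# Print a copy-pasteable command line for each peak mode.
for mode in PEAK_MODES:
    print(shlex.join(build_ebur128_cmd("track.flac", peak=mode)))
```

Timing the three variants on the same file should show how much of the slowdown comes from `peak=true` as opposed to the EBU R128 measurement itself.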
Future work
The difference between ffmpeg linux and the beets ffmpeg replaygain backend is very sharp. I wonder why?
Further reading
TLDR
Need more logging on beets replaygain to see where/if there is a serious performance issue. Need more logging around ebur128 to find how true is enabled in the ffmpeg backend.
-
Some random thoughts:
At least for the ffmpeg backend, per-album replaygain shouldn't add any noticeable processing time, since the album gain is computed from the per-task data using a simple calculation.
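To illustrate why the album step is cheap, here is a rough sketch that combines per-track loudness values into an album gain. It is a simplification and not the beets code: it takes a duration-weighted energy mean and ignores the gating that EBU R128 applies, and the -18 LUFS default target is an assumption (the ReplayGain 2.0 reference level). All names are hypothetical.

```python
import math


def album_loudness(tracks):
    """Approximate album loudness in LUFS.

    tracks: list of (loudness_lufs, duration_seconds) pairs.
    Uses a duration-weighted mean in the energy domain; R128 gating is ignored.
    """
    total_time = sum(duration for _, duration in tracks)
    energy = sum(duration * 10 ** (lufs / 10) for lufs, duration in tracks)
    return 10 * math.log10(energy / total_time)


def album_gain(tracks, target_lufs=-18.0):
    """Gain in dB that moves the album loudness to the target level."""
    return target_lufs - album_loudness(tracks)


tracks = [(-20.0, 100.0), (-20.0, 200.0)]
print(round(album_loudness(tracks), 2))
print(round(album_gain(tracks), 2))
```

Whatever the exact formula, the point stands: once each track has been analyzed, the album value is a handful of arithmetic operations, not another pass over the audio.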
Actually, I'm not sure there's really a problem here. If I understand correctly what you're benchmarking, you're really comparing results on different machines, so it's really hard to conclude anything from the results. Some results on my laptop (analyzing one album with 15 tracks):
I think the first thing to verify here is whether your observations are reproducible using Linux on another computer (such as your gaming PC). If not (which I suspect given the numbers from my own testing), this seems like an issue with ffmpeg on ARM (inefficient implementation, specifics of the machine, or maybe just the way it's built).
Doesn't seem to be the case in comparison to
Not on x86, but possibly on ARM? Or maybe just on this specific ARM chip: this could depend on cache sizes and available instruction set extensions. For example, I could imagine that oversampling for true peak analysis requires larger FFT window sizes, which might exceed some cache size on one machine but not the other.
No (it's not writing that much in fact, and also, disabling logging doesn't change timings).
Easy to verify --- only test using bare ffmpeg and see whether the issue persists (as I think you've already done with the "linux ffmpeg EBUR128 (same as beets)" case).
Again, benchmark without beets as for the previous point. Also, even though our database usage is inefficient, it's not that bad for inserting or updating a few items.
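A small wrapper like the one below is enough for that kind of beets-free benchmark. It is only a sketch: `time_command` is a made-up name, and the stand-in command (the Python interpreter doing nothing) is there so the example runs without ffmpeg installed. Substitute the bare ffmpeg command under test.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run a command, discard its output, and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=True,  # raise if the command fails, so bad runs are not timed
    )
    return time.perf_counter() - start


# Stand-in command; replace with the ffmpeg invocation being benchmarked.
elapsed = time_command([sys.executable, "-c", "pass"])
print(f"{elapsed:.3f} s")
```

Running the same command on both machines, with the same files, separates the ffmpeg cost from anything beets adds on top.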
Maybe: Just sequential reads after all, but I could imagine that this affects FLAC on HDD. Unlikely for MP3 or on SSD. Should however affect
Maybe?
But you don't observe such differences, do you?
Not sure I understand the point here. If you already observe slowness using the null output, it can only be worse for compressed output? Also, in beets, RG analysis is decoupled from anything the
-
If you want some more data points, there is also https://github.com/sdroege/ebur128, which can be used via complexlogic/rsgain#61 or likely any other utility that supports libebur128.
-
Here's what I could find about replaygain implementations.
Parallel: multi core reading (presumably a multicore implementation is faster than a non multicore, although it's theoretically possible this is false if a single core implementation is significantly more efficient)
R128: Newer normalization algorithm believed to be better/more accurate than replaygain 1.0. In practice it seems the difference is negligible; they only have different baseline loudnesses. But as far as I can tell, you want to use one standard for your entire library without mixing them.
Replaygain works in beets by having different backends extend the base class.
https://github.com/beetbox/beets/blob/e10b955a931e4c205b0cadf0860797c0aeee736c/beetsplug/replaygain.py
R128 support appears to be limited to a whitelist, which only Opus is included in. Does anyone know why this is? Hints here. Is it about writing different tags? Can you analyze in R128 and write to standard RG tags? Should you?
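On the question of analyzing in R128 and writing standard RG tags: if the two really differ only in baseline loudness, converting between them is a fixed offset. The sketch below assumes the `R128_TRACK_GAIN` convention used for Opus (a Q7.8 fixed-point integer, relative to -23 LUFS) and a ReplayGain 2.0 reference of -18 LUFS. Whether this is the reason for the whitelist is a guess, and the function names are hypothetical.

```python
R128_REFERENCE_LUFS = -23.0  # assumed reference for R128_* tags (Opus)
RG2_REFERENCE_LUFS = -18.0   # assumed reference for ReplayGain 2.0 tags
OFFSET_DB = RG2_REFERENCE_LUFS - R128_REFERENCE_LUFS  # 5 dB


def r128_tag_to_rg_db(q78: int) -> float:
    """Convert a Q7.8 R128 gain value to a ReplayGain-style gain in dB."""
    return q78 / 256 + OFFSET_DB


def rg_db_to_r128_tag(gain_db: float) -> int:
    """Convert a ReplayGain-style gain in dB to a Q7.8 R128 gain value."""
    return round((gain_db - OFFSET_DB) * 256)


print(r128_tag_to_rg_db(0))     # 5.0
print(rg_db_to_r128_tag(-3.5))  # -2176
```

Under these assumptions the same measurement can be written in either tag format, which is consistent with the observation that the practical difference between the two is negligible.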
There are other attempted implementations of Replaygain that never made it into beets - there are PRs and discussions about it. #3368 #1203 #3381
FFmpeg looks like the clear winner right now, but maybe other backends will be superior. Anecdotally it's a bit slow on my ARM device, maybe 10-30 seconds per track. It does max out my CPU cores. R128 seems like the way of the future and I'd hate to start with RG1 and have to re-analyze all my tracks.
Further reading
really good info here
https://wiki.hydrogenaud.io/index.php/ReplayGain
https://hydrogenaud.io/index.php/topic,114494.0.html
3965858