There is no guide on how to execute calc_rtf.py. For example, https://github.com/huggingface/open_asr_leaderboard/blob/main/transformers/calc_rtf.py references 4469669.mp3, but as far as I can see there is no such file in the repo. So the results are not reproducible.
The same goes for https://github.com/huggingface/open_asr_leaderboard/blob/main/nemo_asr/calc_rtf.py — what is /disk3/datasets/speech-datasets/earnings22/media/4469669.wav?
BTW, I don't recommend simply copying the same sample multiple times for an evaluation. It can produce performance numbers that look better than what you would see in production. Even though the data itself won't be cached, the same portions of the external language models get hit repeatedly, which inflates the results. Concretely: with a duplicated batch, the Whisper models never diverge across batch elements in the token sequences they produce, so embedding lookups (for example) behave better than they realistically should.
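To make the effect concrete, here is a toy sketch (the "decoder" below is a deterministic stand-in, not a real ASR model; the clip names are made up): a batch built by copying one clip yields a single unique token sequence across all rows, while a batch of distinct clips diverges, which is what actually stresses caches and embedding lookups in production.

```python
# Toy illustration of the duplicated-batch pitfall (not a real ASR model):
# a deterministic greedy "decoder" whose next token depends only on the
# input id and the decoding step.
def greedy_decode(sample_id, steps=5):
    seed = sum(ord(c) for c in sample_id)
    return [(seed * (t + 1)) % 97 for t in range(steps)]

dup_batch = ["4469669"] * 8                   # one clip copied 8 times
mix_batch = [f"clip_{i}" for i in range(8)]   # 8 distinct clips

dup_seqs = {tuple(greedy_decode(s)) for s in dup_batch}
mix_seqs = {tuple(greedy_decode(s)) for s in mix_batch}

print(len(dup_seqs))  # 1: every row marches through the identical sequence
print(len(mix_seqs))  # 8: rows diverge step by step, as in real traffic
```

With the duplicated batch, every per-step lookup across the batch hits the same entries, so the measurement flatters the model.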
I got my RTFx results in https://arxiv.org/abs/2311.04996 by caching the entire dataset in memory: https://github.com/nvidia-riva/riva-asrlib-decoder/blob/8282368816552a7ee22c9340dce7b9c3c8d1f193/src/riva/asrlib/decoder/test_graph_construction.py#L77-L89 This is also what we do in the MLPerf Inference benchmarks, which are the gold standard for benchmarking.
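A minimal sketch of that measurement discipline, under the assumption that RTFx = total audio duration / wall-clock decoding time (the `decode` callable, file list, and byte store here are placeholders, not the linked repo's actual API): read everything into RAM first, then start the clock, so disk I/O never pollutes the timed region.

```python
import time

def benchmark_rtfx(paths, read_bytes, decode, audio_seconds):
    # Stage 1 (untimed): cache the whole dataset in memory.
    cached = [read_bytes(p) for p in paths]
    # Stage 2 (timed): decoding only.
    t0 = time.perf_counter()
    for clip in cached:
        decode(clip)
    elapsed = time.perf_counter() - t0
    # RTFx: how many seconds of audio are processed per second of compute.
    return audio_seconds / elapsed

# Tiny self-contained demo with fake 1-second clips and a no-op decoder:
fake_store = {f"clip_{i}.wav": b"\x00" * 16000 for i in range(4)}
rtfx = benchmark_rtfx(
    paths=list(fake_store),
    read_bytes=fake_store.__getitem__,
    decode=lambda clip: len(clip),
    audio_seconds=4.0,   # 4 clips x 1 s each
)
print(rtfx > 0)
```

The two-stage split mirrors MLPerf's convention of keeping data staging outside the timed section.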