[Audio] MP3 resampling is incorrect when dataset's audio files have different sampling rates #3662

lhoestq · 2022-02-01T17:55:04Z

The Audio feature resampler for MP3 gets stuck with the first original frequency it encounters, which causes subsequent decoding to be incorrect.

Here is code to reproduce the issue:

Let's first consider two audio files with different sampling rates, 32000 and 16000:

```python
# first download an mp3 file with sampling_rate=32000 (`!` runs a shell command in a notebook)
!wget https://file-examples-com.github.io/uploads/2017/11/file_example_MP3_700KB.mp3

import torchaudio

audio_path = "file_example_MP3_700KB.mp3"
audio_path2 = audio_path.replace(".mp3", "_resampled.mp3")
resample = torchaudio.transforms.Resample(32000, 16000)  # create a new file with sampling_rate=16000
torchaudio.save(audio_path2, resample(torchaudio.load(audio_path)[0]), 16000)
```

Then we can see the issue when decoding:

```python
from datasets import Dataset, Audio

dataset = Dataset.from_dict({"audio": [audio_path, audio_path2]}).cast_column("audio", Audio(48000))
dataset[0]  # decoding the first audio file sets the resampler orig_freq to 32000
print(dataset.features["audio"]._resampler.orig_freq)
# 32000
print(dataset[0]["audio"]["array"].shape)  # here decoding is fine
# (1308096,)

dataset = Dataset.from_dict({"audio": [audio_path, audio_path2]}).cast_column("audio", Audio(48000))
dataset[1]  # decoding the second audio file sets the resampler orig_freq to 16000
print(dataset.features["audio"]._resampler.orig_freq)
# 16000
print(dataset[0]["audio"]["array"].shape)  # here decoding wrongly uses orig_freq=16000 instead of 32000
# (2616192,)
```

Once it has been set, the value of orig_freq doesn't change, no matter which file needs to be decoded.
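The wrong shape in the second run is exactly twice the correct one, which is what a stale `orig_freq` predicts. A quick check of that arithmetic, assuming the resampler's output length scales by `new_freq / orig_freq` (torchaudio rounds this up, but both divisions here are exact):

```python
# Lengths reported in the reproduction above
decoded_len_correct = 1308096  # dataset[0] decoded with orig_freq=32000
target_sr = 48000

# Number of samples in the original 32 kHz file implied by the correct decode
orig_len = decoded_len_correct * 32000 // target_sr

# The same samples resampled while wrongly assuming a 16 kHz source
decoded_len_stale = orig_len * target_sr // 16000

print(orig_len, decoded_len_stale)
```

In other words, the 32 kHz samples are treated as 16 kHz audio, so they are upsampled 3x instead of 1.5x.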

cc @patrickvonplaten @anton-l @cahya-wirawan @albertvillanova

The issue seems to be in `Audio.decode_mp3`:

datasets/src/datasets/features/audio.py, lines 176 to 180 in 4c417d5:

```python
if self.sampling_rate and self.sampling_rate != sampling_rate:
    if not hasattr(self, "_resampler"):
        self._resampler = T.Resample(sampling_rate, self.sampling_rate)
    array = self._resampler(array)
    sampling_rate = self.sampling_rate
```
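The `hasattr` check means the resampler is built once, for whichever `sampling_rate` is decoded first, and is then reused for every file. One way around this is to keep one resampler per original sampling rate. The sketch below is a hypothetical, dependency-free illustration of that idea and not necessarily what was merged to fix this issue: `StubResample` stands in for `torchaudio.transforms.Resample` and only models the output length (nearest-index picking, no filtering), and `BuggyDecoder`, `KeyedDecoder` and `decode_lengths` are made-up names.

```python
from typing import Dict, List, Tuple


class StubResample:
    """Hypothetical stand-in for torchaudio.transforms.Resample.

    Only the output length is meaningful: samples are picked by nearest-index
    lookup, not by the band-limited interpolation torchaudio performs.
    """

    def __init__(self, orig_freq: int, new_freq: int) -> None:
        self.orig_freq = orig_freq
        self.new_freq = new_freq

    def __call__(self, array: List[float]) -> List[float]:
        out_len = len(array) * self.new_freq // self.orig_freq
        return [array[i * self.orig_freq // self.new_freq] for i in range(out_len)]


class BuggyDecoder:
    """Mirrors the quoted snippet: the first resampler built is reused for every file."""

    def __init__(self, sampling_rate: int) -> None:
        self.sampling_rate = sampling_rate

    def resample(self, array: List[float], sampling_rate: int) -> Tuple[List[float], int]:
        if self.sampling_rate and self.sampling_rate != sampling_rate:
            if not hasattr(self, "_resampler"):
                self._resampler = StubResample(sampling_rate, self.sampling_rate)
            array = self._resampler(array)
            sampling_rate = self.sampling_rate
        return array, sampling_rate


class KeyedDecoder:
    """Hypothetical fix: one resampler per original sampling rate."""

    def __init__(self, sampling_rate: int) -> None:
        self.sampling_rate = sampling_rate
        self._resamplers: Dict[int, StubResample] = {}

    def resample(self, array: List[float], sampling_rate: int) -> Tuple[List[float], int]:
        if self.sampling_rate and self.sampling_rate != sampling_rate:
            if sampling_rate not in self._resamplers:
                self._resamplers[sampling_rate] = StubResample(sampling_rate, self.sampling_rate)
            array = self._resamplers[sampling_rate](array)
            sampling_rate = self.sampling_rate
        return array, sampling_rate


def decode_lengths(decoder) -> Tuple[int, int]:
    """Decode one second of 16 kHz audio, then one second of 32 kHz audio."""
    a, _ = decoder.resample([0.0] * 16000, 16000)
    b, _ = decoder.resample([0.0] * 32000, 32000)
    return len(a), len(b)


print(decode_lengths(BuggyDecoder(48000)))  # (48000, 96000): second clip upsampled 3x instead of 1.5x
print(decode_lengths(KeyedDecoder(48000)))  # (48000, 48000)
```

Rebuilding the resampler on every call would also be correct; caching by rate just avoids constructing a new `Resample` transform for rates that repeat.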


cahya-wirawan · 2022-02-01T20:53:08Z

Thanks @lhoestq for finding the cause of the incorrect resampling. This issue affects all languages that have audio files with different sampling rates, such as Turkish and Luganda.

patrickvonplaten · 2022-02-01T21:19:17Z

@cahya-wirawan - do you know how many languages have different sampling rates in Common Voice? I'm actually quite surprised to see this for multiple languages.

patrickvonplaten · 2022-02-01T21:41:17Z

@cahya-wirawan, I can reproduce the problem for Common Voice 7 in Turkish. Here is a script you can use:

```python
#!/usr/bin/env python3
import sys
from collections import Counter
from io import BytesIO

import torchaudio
from datasets import Audio, load_dataset

ds_name = sys.argv[1]
lang = sys.argv[2]

ds = load_dataset(ds_name, lang, split="train", use_auth_token=True)
# keep the raw bytes so that datasets does not decode/resample the audio itself
ds = ds.cast_column("audio", Audio(decode=False))

all_sampling_rates = []


def collect_sampling_rate(example):
    # decode the raw mp3 bytes and record the file's native sampling rate
    _, sr = torchaudio.load(BytesIO(example["audio"]["bytes"]), format="mp3")
    all_sampling_rates.append(sr)


ds.map(collect_sampling_rate)

print(Counter(all_sampling_rates))
```

It can be run with:

```bash
python run.py mozilla-foundation/common_voice_7_0 tr
```

For CV 6.1, all samples seem to have the same sampling rate.

patrickvonplaten · 2022-02-01T22:01:23Z

It actually shows that many more samples are in 32kHz format than in 48kHz, which is unexpected. Thanks a lot for flagging! I will contact Common Voice about this as well.

cahya-wirawan · 2022-02-01T22:04:47Z

I only checked CV 7.0 for Turkish, Luganda and Indonesian. They have audio files with different sampling rates, and all of them are affected by this issue. The percentage of incorrectly resampled files is as follows: Turkish 9.1%, Luganda 88.2% and Indonesian 64.1%.
I checked this using the original CV files. For each file, I took the original sampling rate and the length of its audio array, and compared them with the length of the audio array (whose sampling rate is always 48kHz) from the mozilla-foundation/common_voice_7_0 dataset. If the length of the audio array from the dataset is not equal to 48kHz / original sampling rate × length of the original audio array, the file is affected.
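The check described above can be sketched as a length comparison. This is a minimal sketch, assuming the decoded array was resampled to `target_sr`; `expected_length` and `is_affected` are hypothetical helper names, and the one-sample tolerance (`tol`) is an assumption meant to absorb rounding in the resampler:

```python
def expected_length(orig_len: int, orig_sr: int, target_sr: int = 48000) -> int:
    """Samples expected after resampling orig_len samples from orig_sr to target_sr (rounded up)."""
    return (target_sr * orig_len + orig_sr - 1) // orig_sr


def is_affected(orig_len: int, orig_sr: int, decoded_len: int,
                target_sr: int = 48000, tol: int = 1) -> bool:
    """Flag a clip whose decoded length is off by more than `tol` samples."""
    return abs(decoded_len - expected_length(orig_len, orig_sr, target_sr)) > tol


# 32 kHz clip from the reproduction at the top of the issue
print(is_affected(872064, 32000, 1308096))  # False: correct resampling
print(is_affected(872064, 32000, 2616192))  # True: stale orig_freq=16000
```

A 32 kHz clip with 872064 samples should decode to 1308096 samples at 48 kHz, so the 2616192-sample array from the reproduction is flagged.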

patrickvonplaten · 2022-02-01T22:07:49Z

Ok wow, thanks a lot for checking this - you've found a pretty big bug 😅 It seems like a lot more datasets are actually affected than I originally thought. We'll try to solve this as soon as possible and make an announcement tomorrow.

cahya-wirawan mentioned this issue Feb 1, 2022: [Audio] Path of Common Voice cannot be used for audio loading anymore #3663 (Closed)

lhoestq mentioned this issue Feb 2, 2022: Fix MP3 resampling when a dataset's audio files have different sampling rates #3665 (Merged)

lhoestq closed this as completed in #3665 Feb 2, 2022
