[Audio] Path of Common Voice cannot be used for audio loading anymore #3663
Comments
Having talked to @lhoestq, I see that this feature is no longer supported. I really don't think this was a good idea. It is a major breaking change and one for which we don't even have a working solution at the moment, which is bad for PyTorch as we don't want to force people to have
IMO, it's really important to think about a solution here, and I strongly favor making a distinction between loading a dataset in streaming mode and in non-streaming mode, so that in non-streaming mode the actual downloaded file is displayed. IMO it's really crucial for people to be able to analyse the original files when the dataset is not downloaded in streaming mode. In my opinion, there are the following reasons why it is paramount to have access to the original audio file (in non-streaming mode):
=> IMO, it's a very big priority to again have the correct absolute path in non-streaming mode. The other solution of providing a path-like object derived from the bytes stored in the |
Agree that we need to have access to the original sound files. A few days ago I was looking for these original files because I suspected there is a bug in the audio resampling (confirmed in #3662), and I want to do my own resampling to work around the bug, which is no longer possible because the original files are unavailable. |
Just to clarify, here you describe the approach that uses the
I'd assume this is because we use Your concern is reasonable, but there are situations where we can only serve bytes (see #3685 for instance). IMO it makes sense to fix the affected datasets for now, but I don't think we should care too much whether we rely on local paths or bytes after soundfile adds support for MP3 as long as our examples work (shouldn't be too hard to update the |
Related to this discussion: in #3664 (comment) I propose how we could change |
Yes!
Yes this might be, but I highly doubt that => All this to say that we should definitely care about whether we rely on local paths or bytes IMO. We don't want to lose all users who are forced to use |
Thanks a lot for the very detailed explanation. Now everything makes much more sense. |
From #3736 the Common Voice dataset now gives access to the local audio files as before |
I understand the argument that it is bad to have a breaking change. How to deal with the introduction of breaking changes is a topic of its own, and I'm not sure how you want to deal with that (or is the policy that this is never allowed, and there must be a Regardless of whether it is a breaking change, however, I don't see the other arguments.
I don't exactly understand this. Why not? Why does the HF dataset on-the-fly decoding mechanism not work? Why is it specific to PyTorch or TensorFlow anyway? Isn't this independent? But even if you just provide the raw bytes to TF, on TF you could just use something like
I don't really understand the arguments (apart from the fact that it may break existing code). You have the original audio files anyway, they are just embedded in the dataset? I don't really know of any library which cannot also load the audio from memory (i.e. from the dataset). Btw, on librosa being slow for decoding audio files, I saw that as well, so we have this comment in RETURNN:
Resampling is also a separate aspect, which is also less straightforward and with different compromises between speed and quality. So there the different tradeoffs and different implementations can make a difference. However, I don't see how this is related to the question whether there should be the raw bytes inside the dataset or as separate local files. |
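To illustrate the point about loading audio from memory rather than from a local file, here is a minimal sketch using only Python's standard library. It assumes WAV data and uses the `wave` module purely for illustration; Common Voice ships MP3, which would need a library such as soundfile or torchaudio instead. The helper names `make_wav_bytes` and `decode_wav_bytes` are hypothetical and not part of `datasets`.

```python
import io
import wave


def make_wav_bytes(num_frames=160, sample_rate=16000):
    """Build a tiny mono 16-bit WAV file (silence) entirely in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)      # mono
        writer.setsampwidth(2)      # 16-bit samples
        writer.setframerate(sample_rate)
        writer.writeframes(b"\x00\x00" * num_frames)
    return buffer.getvalue()


def decode_wav_bytes(raw):
    """Decode WAV bytes without touching the filesystem."""
    with wave.open(io.BytesIO(raw), "rb") as reader:
        num_frames = reader.getnframes()
        return reader.getframerate(), num_frames, reader.readframes(num_frames)


# Stand-in for the raw bytes a dataset row might carry instead of a path.
raw_bytes = make_wav_bytes()
sample_rate, num_frames, pcm = decode_wav_bytes(raw_bytes)
```

The same pattern (wrap the bytes in `io.BytesIO` and hand the file-like object to the decoder) applies to any decoder that accepts file-like objects.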
Thanks for your comments here @albertz - cool to get your input! Answering a bit here between the lines:
The problem with decoding on the fly is that we currently rely on So for TF and Flax it's important that users can load audio files or bytes the way they want to - this might become less important if we find (or make) a good library with few dependencies that is fast for all kinds of platforms / use cases. Now the question is whether it's better to store audio data as a path to a file or as raw bytes I guess.
But the argument that the audio should be loadable directly from memory is good - haven't thought about this too much.

    import os

    def save_as_bytes(batch):
        # read_in_bytes_from_file is a placeholder for any helper that returns the file's raw bytes
        batch["bytes"] = read_in_bytes_from_file(batch["file"])
        os.remove(batch["file"])
        return batch

    ds = ds.map(save_as_bytes)
    ds.save_to_disk(...)

Guess the question is more about what should be the default case? |
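The path-to-bytes conversion in the snippet above can be tried without the `datasets` library. The following is only a sketch under stated assumptions: a plain dict stands in for one dataset row, a temporary file with a fake payload stands in for a downloaded clip, and `read_in_bytes_from_file` is given a hypothetical implementation since the thread does not define it.

```python
import os
import tempfile


def read_in_bytes_from_file(path):
    """Hypothetical helper: return the raw bytes of the file at `path`."""
    with open(path, "rb") as f:
        return f.read()


def save_as_bytes(batch):
    """Embed the file content in the row, then delete the local file."""
    batch["bytes"] = read_in_bytes_from_file(batch["file"])
    os.remove(batch["file"])
    return batch


# Create a throwaway file that plays the role of a downloaded audio clip.
fd, tmp_path = tempfile.mkstemp(suffix=".mp3")
with os.fdopen(fd, "wb") as f:
    f.write(b"fake audio payload")

# A plain dict stands in for one dataset row.
example = save_as_bytes({"file": tmp_path})
```

After this runs, the row carries the bytes and the local file is gone, which is exactly the trade-off the thread debates: the data is self-contained, but the `file` entry no longer points to anything on disk.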
But how is this relevant for this issue here? I thought this issue here is about having the (correct) path in the dataset or having raw bytes in the dataset. How did TF users use it at all then? Or they just do not use on-the-fly decoding? I did not even notice this problem (maybe because I had But as I outlined before, they could just use
I was not really familiar with But ok, now we are just discussing how to handle the on-the-fly decoding. I still think this is a separate issue and having raw bytes in the dataset instead of local files should just be fine as well.
I think nobody who writes code is scared by seeing the raw bytes content of a binary file. :)
In #4184 (comment), you said/proposed that this
Yea this is up to you. I'm happy as long as we can get it the way we want easily and this is a well supported use case. :) |
Yes! Should be super easy now, see the discussion here: rwth-i6/i6_core#257 (comment) Thanks for the super useful input :-) |
Despite the comments that this has been fixed, I am finding the exact same problem is occurring again (with datasets version 2.3.2) |
It appears downgrading to torchaudio 0.11.0 fixed this problem. |
@DCNemesis, sorry, which problem exactly is occurring again? Also cc @lhoestq @polinaeterna here |
@patrickvonplaten @lhoestq @polinaeterna I was unable to load audio from Common Voice using 🤗 with the current version of torchaudio, but downgrading to torchaudio 0.11.0 fixed it. This is probably more of a torch problem than a Hugging Face problem. |
@DCNemesis that's interesting, could you please share the error message if you still can access it? |
@polinaeterna I believe it is the same exact error as above. It occurs on other .mp3 sources as well, but the problem is with torchaudio > 0.11.0. I've created a short colab notebook that reproduces the error, and the fix here: https://colab.research.google.com/drive/18wsuwdHwBPN3JkcnhEtk8MUYqF9swuWZ?usp=sharing |
Hi @DCNemesis, your issue was slightly different from the original one in this issue page. Yours seems related to a change in the backend used by Normally, it should be circumvented with the patch made by @polinaeterna in: |
I think the original issue reported here was already fixed by: Otherwise, feel free to reopen. |
Describe the bug
Steps to reproduce the bug
Expected results
The path should be the complete absolute path to the downloaded audio file, not some relative path.
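One way to check for the expected behaviour could look like the sketch below. It assumes the path is available as a plain string; the function name `is_usable_audio_path` and the example relative path are hypothetical and only serve the illustration.

```python
import os
import tempfile


def is_usable_audio_path(path):
    """Return True only if `path` is absolute and points to an existing file."""
    return os.path.isabs(path) and os.path.isfile(path)


# A relative path, like the one reported in this issue, fails the check.
relative_ok = is_usable_audio_path("clips/sample.mp3")

# An absolute path to a file that really exists passes it.
with tempfile.NamedTemporaryFile(suffix=".mp3") as tmp:
    absolute_ok = is_usable_audio_path(os.path.abspath(tmp.name))
```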
Actual results
Environment info
datasets version: 1.18.3.dev0