None of the tar commands succeed. The failures occur on lines 433-434 and on line 447, and I believe they fail for different reasons.
The first time (lines 433-434), the path to the tar file is incorrect. I believe the script is intended to be run from the algorithmic_efficiency root folder (the instructions say to call python3 datasets/dataset_setup.py …), and I used different tmp and data directories.
The second time (line 447), it fails because wget (line 445) hasn't completed yet: Popen returns immediately without waiting for the command to finish.
To fix this, I suggest:
passing cwd=tmp_librispeech_dir to the Popen constructor on line 434
appending .communicate() to the Popen call on line 445
changing line 447 to subprocess.Popen(f'tar xzvf {tar_filename}', shell=True, cwd=tmp_librispeech_dir).communicate()
(I tested those changes locally and they solved those problems)
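The suggestions above could be sketched roughly as follows. This is a minimal illustration rather than the actual code in dataset_setup.py: the helper name run_and_wait is hypothetical, and the commented calls only assume that tar_filename and tmp_librispeech_dir are defined as in the script.

```python
import subprocess


def run_and_wait(command, cwd):
    """Run a shell command inside `cwd` and block until it exits.

    Hypothetical helper, not part of dataset_setup.py.
    """
    process = subprocess.Popen(command, shell=True, cwd=cwd)
    # communicate() waits for the process to terminate, so a later
    # tar call cannot start before the download has finished.
    process.communicate()
    return process.returncode


# Assumed usage, with names taken from the issue text:
# run_and_wait(f'tar xzvf {tar_filename}', tmp_librispeech_dir)
```

Because cwd is passed explicitly, the command no longer depends on the directory from which python3 was started.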
After untarring the files, everything is in tmp_librispeech_dir/LibriSpeech, so line 450 should be changed to take data_dir=os.path.join(tmp_librispeech_dir, 'LibriSpeech').
There are also path-related problems in librispeech_tokenizer.py. There, the file spm_model.vocab gets copied to the algorithmic_efficiency directory (from which python3 was called). It then isn't found when librispeech_tokenizer.load_tokenizer() gets called from librispeech_preprocess.run().
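One possible way to avoid this kind of mismatch is to copy the vocab file to an explicit directory and hand the resulting absolute path to whatever loads it later. The sketch below only illustrates that idea: copy_vocab is a hypothetical helper, and I have not checked how librispeech_tokenizer.py actually resolves the path.

```python
import os
import shutil


def copy_vocab(src_path, data_dir):
    """Copy a vocab file into `data_dir` and return its absolute path.

    Hypothetical helper, not part of librispeech_tokenizer.py.
    """
    os.makedirs(data_dir, exist_ok=True)
    dest = os.path.join(os.path.abspath(data_dir), os.path.basename(src_path))
    shutil.copy(src_path, dest)
    # The caller can pass this path to the loader, so the lookup does not
    # depend on the directory python3 was started from.
    return dest
```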
From Mike's test: