Upload pretrained model to run inference #17

Open · greenbech opened this issue Apr 9, 2020 · 3 comments
greenbech commented Apr 9, 2020

It would be great if someone could upload a pretrained model so that we could try this project without needing to train the model ourselves. It is quite a big commitment to wait a week for training (as mentioned in #10) if you primarily just want to check out the performance on some .wav files.

And I would also like to say this repo is very well written and educational. Thanks!

jongwook (Owner) commented Apr 9, 2020

Hi, please try this one, trained for 500,000 iterations on the MAESTRO dataset.

I haven't touched the model in a while, but torch.load('model-500000.pt') should be able to load the PyTorch model.
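
If it helps, here is a minimal loading sketch (map_location='cpu' is only there so it also works on machines without a GPU; eval() is standard inference practice, not something specific to this repo):

import torch

# The file is a pickled nn.Module, so torch.load returns the model itself
model = torch.load('model-500000.pt', map_location='cpu')
model.eval()  # disable dropout etc. for inference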

greenbech (Author) commented Apr 10, 2020

The provided file works great, thanks a lot! I didn't need to use torch.load('model-500000.pt') since both evaluate.py and transcribe.py take the model file as an argument.

However, I first got this error message when trying to run the scripts:

Traceback (most recent call last):
  File "transcribe.py", line 101, in <module>
    transcribe_file(**vars(parser.parse_args()))
  File "transcribe.py", line 74, in transcribe_file
    predictions = transcribe(model, audio)
  File "transcribe.py", line 53, in transcribe
    onset_pred, offset_pred, _, frame_pred, velocity_pred = model(mel)
  File "/Users/greenbech/.pyenv/versions/3.7.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/greenbech/git/onsets-and-frames/onsets_and_frames/transcriber.py", line 87, in forward
    onset_pred = self.onset_stack(mel)
  File "/Users/greenbech/.pyenv/versions/3.7.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/greenbech/.pyenv/versions/3.7.5/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/Users/greenbech/.pyenv/versions/3.7.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/greenbech/git/onsets-and-frames/onsets_and_frames/lstm.py", line 29, in forward
    output[:, start:end, :], (h, c) = self.rnn(x[:, start:end, :], (h, c))
  File "/Users/greenbech/.pyenv/versions/3.7.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/greenbech/.pyenv/versions/3.7.5/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 558, in forward
    result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
  File "/Users/greenbech/.pyenv/versions/3.7.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 576, in __getattr__
    type(self).__name__, name))
AttributeError: 'LSTM' object has no attribute '_flat_weights'

Downgrading from torch==1.4.0 to torch==1.2.0 fixed it for me.
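
For anyone who wants to stay on a newer torch: the checkpoint is a fully pickled module, so it carries the internals of the torch version that saved it, and 1.4's LSTM forward expects a _flat_weights attribute that the older pickle doesn't have. A possible workaround (untested sketch, assuming you can load the file once under torch==1.2.0) is to re-save only the weights:

# Run once under torch==1.2.0, where the full pickle still loads
import torch

model = torch.load('model-500000.pt', map_location='cpu')

# A state_dict is a plain dict of tensors, so it is much less sensitive to
# module-internal renames (like _flat_weights) than a pickled nn.Module
torch.save(model.state_dict(), 'model-500000-state.pt')

# Later, on any torch version: rebuild the architecture with the same
# hyperparameters (the constructor call below is an assumption) and load:
# model = OnsetsAndFrames(...)
# model.load_state_dict(torch.load('model-500000-state.pt'))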

It is also quite cumbersome to resample the audio file to 16 kHz beforehand, so I added this locally to transcribe.py:

import librosa
import numpy as np

# SAMPLE_RATE (16000) is assumed to come from transcribe.py's
# existing onsets_and_frames imports


def float_samples_to_int16(y):
    """Convert a floating-point numpy array of audio samples to int16."""
    # From https://github.com/tensorflow/magenta/blob/671501934ff6783a7912cc3e0e628fd0ea2dc609/magenta/music/audio_io.py#L48
    if not issubclass(y.dtype.type, np.floating):
        raise ValueError('input samples not floating-point')
    return (y * np.iinfo(np.int16).max).astype(np.int16)


def load_and_process_audio(flac_path, sequence_length, device):
    random = np.random.RandomState(seed=42)

    # librosa.load resamples to SAMPLE_RATE and returns float32 in [-1, 1]
    audio, sr = librosa.load(flac_path, sr=SAMPLE_RATE)
    audio = float_samples_to_int16(audio)

    assert sr == SAMPLE_RATE
    assert audio.dtype == 'int16'
    ...

There might be more elegant ways of doing this, but I was not able to convert to int16 with librosa or resample with soundfile.read.
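
In case it helps, a quick illustrative check of the helper above (the file name is a placeholder):

audio = librosa.load('example.wav', sr=SAMPLE_RATE)[0]  # float32 in [-1, 1]
ints = float_samples_to_int16(audio)
print(ints.dtype)  # int16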

I also think the model you provided should be available in the README so others can try it out without digging through this issue. I was thinking either directly in ./data/pretrained, which is the easiest setup but increases the repo size unnecessarily, or via the drive URL you provided.

Would you mind a PR with this?

jongwook (Owner) commented

Yeah! I'll need to do some housekeeping to make the checkpoint work cross-version. A PR is welcome! Thanks :D
