TorchFCPE

Overview

TorchFCPE (Fast Context-based Pitch Estimation) is a PyTorch-based library for audio pitch extraction and MIDI conversion. This README is a quick guide to using the library for pitch inference and MIDI extraction.

Note: The MIDI extractor of FCPE derives notes by quantizing f0 with non-neural-network methods.

Note: I won't be updating FCPE (or the benchmark) anytime soon, but I will definitely release a version with cleaned-up code no later than next year.

Installation

Before using the library, install it along with its dependencies:

pip install torchfcpe

Usage

1. Audio Pitch Inference

from torchfcpe import spawn_bundled_infer_model
import torch
import librosa

# Configure device and target hop size
device = 'cpu'  # or 'cuda' if using a GPU
sr = 16000  # Sample rate
hop_size = 160  # Hop size for processing

# Load and preprocess audio
audio, sr = librosa.load('test.wav', sr=sr)
audio = librosa.to_mono(audio)
audio_length = len(audio)
f0_target_length = (audio_length // hop_size) + 1
audio = torch.from_numpy(audio).float().unsqueeze(0).unsqueeze(-1).to(device)

# Load the model
model = spawn_bundled_infer_model(device=device)

# Perform pitch inference
f0 = model.infer(
    audio,
    sr=sr,
    decoder_mode='local_argmax',  # Recommended mode
    threshold=0.006,  # Threshold for V/UV decision
    f0_min=80,  # Minimum pitch
    f0_max=880,  # Maximum pitch
    interp_uv=False,  # Interpolate unvoiced frames
    output_interp_target_length=f0_target_length,  # Interpolate to target length
)

print(f0)
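
The target length above, (audio_length // hop_size) + 1, is the number of f0 frames requested for the clip. The following is a minimal sketch of that arithmetic, not part of the library: frame_count and frame_times are hypothetical helpers, and the assumption that frame i starts at sample i * hop_size is mine rather than something the library documents here.

```python
def frame_count(num_samples: int, hop_size: int) -> int:
    """Number of f0 frames, matching (audio_length // hop_size) + 1."""
    return num_samples // hop_size + 1


def frame_times(num_frames: int, hop_size: int, sr: int) -> list:
    """Time in seconds of each frame, ASSUMING frame i starts at sample i * hop_size."""
    return [i * hop_size / sr for i in range(num_frames)]


# One second of audio at 16 kHz with a hop of 160 samples (10 ms per frame).
n = frame_count(16000, 160)
times = frame_times(n, 160, 16000)
print(n, times[1], times[-1])  # prints: 101 0.01 1.0
```

With sr = 16000 and hop_size = 160, each frame therefore covers 10 ms, so a one-second clip yields 101 frames.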

2. MIDI Extraction

# Extract MIDI from audio
midi = model.extact_midi(
    audio,
    sr=sr,
    decoder_mode='local_argmax',  # Recommended mode
    threshold=0.006,  # Threshold for V/UV decision
    f0_min=80,  # Minimum pitch
    f0_max=880,  # Maximum pitch
    output_path="test.mid",  # Save MIDI to file
)

print(midi)
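
As the Overview notes, the MIDI output is quantized from f0 rather than predicted by a network. This README does not describe the library's exact procedure, so the sketch below only illustrates the general idea using the standard Hz-to-MIDI mapping (A4 = 440 Hz = note 69). hz_to_midi and quantize_f0 are hypothetical helpers, and treating 0 Hz as an unvoiced frame is an assumption.

```python
import math


def hz_to_midi(f0_hz: float) -> float:
    """Continuous MIDI pitch for a frequency in Hz (A4 = 440 Hz = note 69)."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def quantize_f0(f0_frames):
    """Round each voiced frame to the nearest MIDI note number.

    ASSUMPTION: unvoiced frames are represented as 0 Hz and map to None.
    """
    return [round(hz_to_midi(f)) if f > 0 else None for f in f0_frames]


notes = quantize_f0([0.0, 220.0, 261.63, 440.0, 880.0])
print(notes)  # prints: [None, 57, 60, 69, 81]
```

A real extractor also has to group consecutive frames into notes with onsets and durations; that step is omitted here.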

Notes

Inference Parameters:
- audio: Input audio as a torch.Tensor.
- sr: Sample rate of the audio.
- decoder_mode (Optional): Mode for decoding; 'local_argmax' is recommended.
- threshold (Optional): Threshold for voiced/unvoiced decision; default is 0.006.
- f0_min (Optional): Minimum pitch value; default is 80 Hz.
- f0_max (Optional): Maximum pitch value; default is 880 Hz.
- interp_uv (Optional): Whether to interpolate unvoiced frames; default is False.
- output_interp_target_length (Optional): Length to which the output pitch should be interpolated.

MIDI Extraction Parameters:
- audio: Input audio as a torch.Tensor.
- sr: Sample rate of the audio.
- decoder_mode (Optional): Mode for decoding; 'local_argmax' is recommended.
- threshold (Optional): Threshold for voiced/unvoiced decision; default is 0.006.
- f0_min (Optional): Minimum pitch value; default is 80 Hz.
- f0_max (Optional): Maximum pitch value; default is 880 Hz.
- output_path (Optional): File path to save the MIDI file. If not provided, only returns the MIDI structure.
- tempo (Optional): BPM for the MIDI file. If None, BPM is automatically predicted.
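
To give a feel for what the interp_uv option refers to, here is a minimal pure-Python sketch of filling unvoiced frames by linear interpolation between voiced neighbours. It is an illustration under stated assumptions, not the library's implementation: interpolate_unvoiced is a hypothetical helper, unvoiced frames are assumed to be 0 Hz, and the edge handling (copying the nearest voiced value) is my own choice.

```python
def interpolate_unvoiced(f0):
    """Fill 0-valued (assumed unvoiced) frames by linear interpolation.

    Leading and trailing unvoiced frames take the nearest voiced value.
    """
    voiced = [i for i, f in enumerate(f0) if f > 0]
    if not voiced:
        return list(f0)  # no voiced frame to interpolate from
    out = list(f0)
    # Pad both edges with the nearest voiced value.
    for i in range(voiced[0]):
        out[i] = f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0)):
        out[i] = f0[voiced[-1]]
    # Linearly fill each gap between consecutive voiced frames.
    for left, right in zip(voiced, voiced[1:]):
        span = right - left
        for i in range(left + 1, right):
            w = (i - left) / span
            out[i] = (1 - w) * f0[left] + w * f0[right]
    return out


print(interpolate_unvoiced([0.0, 100.0, 0.0, 120.0, 0.0]))
# prints: [100.0, 100.0, 110.0, 120.0, 120.0]
```

With interp_uv=False (the default listed above), no such filling is requested and unvoiced frames are left as the model reports them.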

Additional Features

Model as a PyTorch Module: You can use the model as a standard PyTorch module. For example:

# Change device
model = model.to(device)

# Compile model
model = torch.compile(model)
