A simple neural forced aligner for phoneme-to-audio alignment that requires only numpy for inference

Patchethium/snfa

snfa

snfa (Simple Neural Forced Aligner) is a phoneme-to-audio forced aligner designed to be embedded in Python programs. Inference requires only numpy and Python 3.7 or later.

  • Tiny model size (2 MB)
  • numpy as the only inference dependency
  • Alignment quality comparable to MFA (Montreal Forced Aligner)

Note: Training still requires PyTorch and a few other libraries.

Inference

pip install snfa

Download the pretrained cv_jp.bin weights from the releases page.

cv_jp.bin is a weight file trained on the Japanese Common Voice Corpus 14.0 (6/28/2023). The model weights are released into the public domain.

import librosa
import snfa

aligner = snfa.Aligner("cv_jp.bin")
transcript = "k o N n i ch i w a".split(" ")

# any loader works (e.g. `scipy.io.wavfile`) as long as the audio is
# sampled at `aligner.sr` and normalized to [-1, 1]
x, _ = librosa.load("sample.wav", sr=aligner.sr)

segments, path, trellis, emission, labels = aligner(x, transcript)

print(segments)
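
The comment in the example notes that any loader works as long as the audio is normalized to [-1, 1]. A minimal sketch of a loader that avoids librosa is shown here, using only the standard-library `wave` module and numpy. The helper names `pcm16_to_float` and `load_wav_mono` are hypothetical and not part of snfa. The sketch assumes a 16-bit PCM WAV file and does not resample, so the file should already be recorded at `aligner.sr`.

```python
import wave

import numpy as np


def pcm16_to_float(samples: np.ndarray) -> np.ndarray:
    """Scale int16 PCM samples to float32 values in [-1, 1)."""
    return samples.astype(np.float32) / 32768.0


def load_wav_mono(path: str):
    """Read a 16-bit PCM WAV file; return (float32 mono samples, sample rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sr = wf.getframerate()
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    # WAV PCM data is little-endian; frames are interleaved by channel.
    data = np.frombuffer(raw, dtype="<i2").reshape(-1, n_channels)
    # Average the channels to get a mono signal.
    mono = pcm16_to_float(data).mean(axis=1)
    return mono, sr
```

The returned array can then be passed to the aligner in place of the librosa output, after checking that the returned sample rate equals `aligner.sr`.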

Training

I'll document this part if anyone needs it. If you do, please let me know by opening an issue.

Todos

  • Rust crate
  • Multi-language support
  • Storing the pau index in the binary model
  • Record the alignment score and warn the user when it is too low

Licence

snfa is released under the ISC Licence, as shown here.

The file snfa/stft.py contains code adapted from librosa, which is released under the ISC Licence with a different copyright notice. A copy of librosa's licence can be found in librosa's repo.

The file snfa/viterbi.py contains code adapted from torchaudio, which is released under the BSD 2-Clause "Simplified" License. A copy of torchaudio's licence can be found in torchaudio's repo.

Credit

The neural network used in snfa is essentially a PyTorch implementation of the CTC* structure described in "Evaluating Speech-Phoneme Alignment and Its Impact on Neural Text-To-Speech Synthesis".
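
For readers unfamiliar with how frame-wise network outputs become an alignment, here is a simplified sketch of Viterbi forced alignment in numpy. It is not the code in snfa/viterbi.py: it ignores CTC blank handling and assumes every token covers at least one frame. The function name `forced_align` and its arguments are hypothetical. `log_probs` stands for a (frames, classes) matrix of log-probabilities, and `tokens` for the class indices of the transcript in order.

```python
import numpy as np


def forced_align(log_probs: np.ndarray, tokens: list) -> list:
    """Return, for each frame, the position in `tokens` it is aligned to."""
    n_frames, n_tokens = log_probs.shape[0], len(tokens)
    if n_tokens == 0 or n_frames < n_tokens:
        raise ValueError("need at least one frame per token")

    # score[t, j]: best log-score of aligning frames 0..t to tokens 0..j,
    # with frame t assigned to token j.
    score = np.full((n_frames, n_tokens), -np.inf)
    # advanced[t, j]: True if the best path reached (t, j) from token j - 1.
    advanced = np.zeros((n_frames, n_tokens), dtype=bool)
    score[0, 0] = log_probs[0, tokens[0]]

    for t in range(1, n_frames):
        for j in range(min(t + 1, n_tokens)):
            stay = score[t - 1, j]
            advance = score[t - 1, j - 1] if j > 0 else -np.inf
            advanced[t, j] = advance > stay
            score[t, j] = max(stay, advance) + log_probs[t, tokens[j]]

    # Backtrack from the last frame, which must sit on the last token.
    j = n_tokens - 1
    path = [j]
    for t in range(n_frames - 1, 0, -1):
        if advanced[t, j]:
            j -= 1
        path.append(j)
    return path[::-1]
```

Consecutive frames that share the same token position form one segment, so multiplying the first and last frame index of each run by the frame hop duration gives start and end times.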
