Wavify

Wavify is a collection of small speech models and a runtime that is blazingly fast and runs anywhere.

Features

Tasks

  • Speech-to-text
  • Wake-word detection
  • Speech-to-intent

Bindings

  • Python
  • Kotlin
  • Swift
  • Rust
  • Flutter
  • C++

Additional language bindings can be developed externally, and we welcome contributions that list them here. Function signatures are available in lib/wavify_core.h.

Platforms

  • aarch64-apple-ios
  • aarch64-linux-android
  • aarch64-unknown-linux-gnu
  • aarch64-apple-darwin
  • x86_64-pc-windows-gnu
  • x86_64-unknown-linux-gnu

Benchmarks

Speech-to-text on assets/samples_jfk.wav, running on a Raspberry Pi 5. The real-time factor is processing time divided by audio duration; lower is faster.

Engine        Size                 Time    Real-time factor
Whisper.cpp   75MB (Whisper tiny)  4.91s   0.45
Wavify        45MB                 2.21s   0.20

Demo

demo.MP4

Usage

Speech-to-text models for supported languages are available here. The filename specifies the language in which the model operates, indicated by the ISO 639-1 code.

We provide the example wake-word model model-wakeword-alexa.bin. For custom models, please contact us at [email protected].

You'll also need an API key which is available for free. You can get it from your dashboard once signed in.

Python

pip install wavify

import os
from wavify.stt import SttEngine

engine = SttEngine("path/to/your/model", os.getenv("WAVIFY_API_KEY"))
result = engine.stt_from_file("/path/to/your/file")

import os
from wavify.wakeword import WakeWordEngine

engine = WakeWordEngine("path/to/your/model", os.getenv("WAVIFY_API_KEY"))
audio = ... # must be 2 seconds of audio, sampled at 16 kHz, 16-bit linear PCM, single channel
result = engine.detect(audio)
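
As a minimal sketch of preparing such a clip with Python's standard wave module (assuming the source file is already 16 kHz, 16-bit, mono; how detect expects the samples to be packaged is not documented here, so treat this as illustrative only):

import wave

with wave.open("/path/to/your/clip.wav", "rb") as f:
    assert f.getframerate() == 16000  # 16 kHz
    assert f.getnchannels() == 1      # single channel
    assert f.getsampwidth() == 2      # 16-bit samples
    audio = f.readframes(2 * f.getframerate())  # first 2 seconds of raw PCM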

Rust

cargo add wavify

use std::env;
use anyhow::Result;
use wavify::SttEngine;

fn main() -> Result<()> {
  let engine = SttEngine::new("/path/to/your/model", &env::var("WAVIFY_API_KEY")?)?;
  let result = engine.stt_from_file("/path/to/your/file")?;
  Ok(())
}

Android

Kotlin bindings and an example app showcasing the integration of Wavify are available in android/.

import dev.wavify.SttEngine

val modelPath = File(applicationContext.filesDir, "/your/model").absolutePath
val apiKey = BuildConfig.WAVIFY_API_KEY
val appName = applicationContext.packageName
val engine = SttEngine.create(modelPath, apiKey, appName) 

val audioFloats = floatArrayOf(0.0f, 0.1f, 0.2f) // Replace with actual audio data
val result = engine.stt(audioFloats)
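
The engine takes audio as a FloatArray. As a minimal sketch (a hypothetical helper, not part of the SDK), 16-bit PCM samples captured with AudioRecord could be converted like this, assuming the engine expects samples normalized to [-1, 1]:

fun pcm16ToFloats(pcm: ShortArray): FloatArray =
    FloatArray(pcm.size) { i -> pcm[i] / 32768.0f }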

iOS

Swift bindings and an example app showcasing the integration of Wavify are available in ios/.

guard let modelPath = Bundle.main.path(forResource: "/your/model", ofType: "bin") else {
  fatalError("Failed to find model file.")
}
guard let apiKey = Bundle.main.object(forInfoDictionaryKey: "WAVIFY_API_KEY") as? String else {
  fatalError("No api key found.")
}
let engine = SttEngine(modelPath: modelPath, apiKey: apiKey)!
let audioFloats: [Float] = [3.14, 2.71, 1.61]
let result = engine.recognizeSpeech(from: convertDataToFloatArray(data: audioFloats.withUnsafeBufferPointer { Data(buffer: $0) }))
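
convertDataToFloatArray is a helper defined in the example app. A minimal sketch of what such a helper could look like, assuming the Data buffer holds raw Float32 samples:

func convertDataToFloatArray(data: Data) -> [Float] {
    data.withUnsafeBytes { (raw: UnsafeRawBufferPointer) in
        Array(raw.bindMemory(to: Float.self))
    }
}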

Contributing

Contributions to wavify are welcome.

  • Please report bugs as GitHub issues.
  • Questions via GitHub issues are welcome!
  • To build from source, check the contributing page.

Contact

For specialized solutions, including custom models optimized for your specific use case, or to discuss how Wavify can be adapted to your requirements, contact our team directly at [email protected].
