Whisper-WebUI

A Gradio-based browser interface for Whisper. You can use it as an Easy Subtitle Generator!


Notebook

If you want to try this on Colab, you can do so here!

Feature

Pipeline Diagram

Transcription Pipeline

Installation and Running

  • Running with Pinokio

The app can be run with Pinokio.

  1. Install Pinokio Software.
  2. Open the software and search for Whisper-WebUI and install it.
  3. Start Whisper-WebUI and connect to http://localhost:7860.
  • Running with Docker

  1. Install and launch Docker Desktop.

  2. Clone the repository:

git clone https://github.com/jhj0517/Whisper-WebUI.git

  3. Build the image (the image is roughly 7 GB):

docker compose build

  4. Run the container:

docker compose up

  5. Open the WebUI in your browser at http://localhost:7860.

If needed, update the docker-compose.yaml to match your environment.
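
Once the container (or the Pinokio app) is running, it can be handy to confirm that something is actually answering on port 7860 before opening the browser. The following is a minimal sketch using only the Python standard library; the helper names `is_webui_up` and `wait_for_webui` are made up for this example and are not part of Whisper-WebUI.

```python
import time
import urllib.error
import urllib.request


def is_webui_up(url: str = "http://localhost:7860", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers `url` with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, DNS failures and HTTP error statuses.
        return False


def wait_for_webui(url: str = "http://localhost:7860",
                   attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if is_webui_up(url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_webui()` right after `docker compose up` gives the app up to a minute to start; adjust `attempts` and `delay` if the first start takes longer on your machine.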

  • Run Locally

Prerequisite

To run this WebUI, you need git, Python (3.10 <= python <= 3.12), and FFmpeg.
If you are not using an NVIDIA GPU, or your CUDA version is not 12.4, edit requirements.txt to match your environment.

Please follow the links below to install the necessary software:

After installing FFmpeg, make sure to add the FFmpeg/bin folder to your system PATH!
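
To check the prerequisites above before installing, a small script along these lines may help. It is only a sketch: `check_prerequisites` is a hypothetical helper written for this example, and it only verifies the Python version range and that `git` and `ffmpeg` can be found on PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # lowest supported version stated above
MAX_PYTHON = (3, 12)  # highest supported version stated above


def check_prerequisites(version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good.

    `version` and `which` can be overridden, which is mainly useful for testing.
    """
    major_minor = tuple(version or sys.version_info[:2])[:2]
    problems = []
    if not (MIN_PYTHON <= major_minor <= MAX_PYTHON):
        problems.append(
            f"Python {major_minor[0]}.{major_minor[1]} is outside the 3.10 to 3.12 range"
        )
    for tool in ("git", "ffmpeg"):
        if which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Problem:", problem)
```

If `ffmpeg` is reported as missing even though it is installed, the FFmpeg/bin folder is most likely not on PATH yet.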

Installation Using the Script Files

  1. Clone this repository:
git clone https://github.com/jhj0517/Whisper-WebUI.git
  2. Run install.bat or install.sh to install dependencies. (This creates a venv directory and installs the dependencies there.)
  3. Start the WebUI with start-webui.bat or start-webui.sh. (This runs python app.py after activating the venv.)
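
If you would rather not use the script files, the steps they are described as performing (create a venv, install dependencies into it, run `python app.py`) can be approximated by hand. The sketch below only builds the command lines; it assumes the dependencies are listed in requirements.txt and that the venv directory is named `venv`, and it is not a copy of what install.sh or start-webui.sh actually contain. `manual_setup_commands` is a hypothetical helper.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath


def manual_setup_commands(windows=None):
    """Return the three commands (as argv lists) for a manual install and start."""
    if windows is None:
        windows = os.name == "nt"
    # A venv keeps its interpreter in Scripts\ on Windows and in bin/ elsewhere.
    if windows:
        venv_python = str(PureWindowsPath("venv", "Scripts", "python.exe"))
    else:
        venv_python = str(PurePosixPath("venv", "bin", "python"))
    return [
        ["python", "-m", "venv", "venv"],                               # create the venv
        [venv_python, "-m", "pip", "install", "-r", "requirements.txt"],  # install dependencies
        [venv_python, "app.py"],                                        # start the WebUI
    ]
```

Each entry can be passed to `subprocess.run(cmd, check=True)` from the repository root. Calling the venv's own interpreter directly has the same effect as activating the venv first.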

You can also run the project with command-line arguments; see the wiki for a guide to the available arguments.

VRAM Usage

This project is integrated with faster-whisper by default for better VRAM usage and transcription speed.

According to faster-whisper, the efficiency of the optimized whisper model is as follows:

| Implementation | Precision | Beam size | Time | Max. GPU memory | Max. CPU memory |
|---|---|---|---|---|---|
| openai/whisper | fp16 | 5 | 4m30s | 11325MB | 9439MB |
| faster-whisper | fp16 | 5 | 54s | 4755MB | 3244MB |
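
To put the figures above in relative terms, the short calculation below derives the speedup and the memory savings from the two rows. It uses only the numbers quoted in the table; `parse_duration` is a small helper written for this example.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a duration such as '4m30s' or '54s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", text)
    if not text or not match:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + int(seconds or 0)


# Values copied from the benchmark rows above.
baseline = {"time_s": parse_duration("4m30s"), "gpu_mb": 11325, "cpu_mb": 9439}
faster = {"time_s": parse_duration("54s"), "gpu_mb": 4755, "cpu_mb": 3244}

speedup = baseline["time_s"] / faster["time_s"]
gpu_saving = 1 - faster["gpu_mb"] / baseline["gpu_mb"]
cpu_saving = 1 - faster["cpu_mb"] / baseline["cpu_mb"]

print(f"Speedup: {speedup:.1f}x")              # Speedup: 5.0x
print(f"GPU memory saved: {gpu_saving:.1%}")   # GPU memory saved: 58.0%
print(f"CPU memory saved: {cpu_saving:.1%}")   # CPU memory saved: 65.6%
```

In other words, on that benchmark faster-whisper is about five times faster while using a bit over 40% of the GPU memory of openai/whisper.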

If you want to use an implementation other than faster-whisper, pass the --whisper_type argument with the repository name.
See the wiki for more information about CLI arguments.

If you want to use a fine-tuned model, manually place the model files in the directory under models/Whisper/ that corresponds to the implementation.

Alternatively, if you enter a Hugging Face repo ID (e.g., deepdml/faster-whisper-large-v3-turbo-ct2) in the "Model" dropdown, the model will be downloaded automatically into that directory.
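
The two options above (a manually placed model directory versus a Hugging Face repo ID) can be illustrated with faster-whisper directly. This is a sketch under assumptions, not the project's actual loading code: `resolve_model_source` and `load_faster_whisper` are hypothetical helpers, and `models_dir` stands for whichever directory under models/Whisper/ applies to your implementation. `WhisperModel` and its `download_root` parameter come from the faster-whisper package, which must be installed for the second function to work.

```python
from pathlib import Path


def resolve_model_source(name: str, models_dir: Path) -> tuple[str, str]:
    """Prefer a local directory called `name`; otherwise treat `name` as a Hub repo ID."""
    candidate = Path(models_dir) / name
    if candidate.is_dir():
        return "local", str(candidate)
    return "hub", name


def load_faster_whisper(name: str, models_dir: Path):
    """Load a model with faster-whisper, downloading into `models_dir` if needed."""
    # Imported lazily so the helper above works without faster-whisper installed.
    from faster_whisper import WhisperModel

    _kind, source = resolve_model_source(name, models_dir)
    return WhisperModel(source, download_root=str(models_dir))
```

For example, `resolve_model_source("deepdml/faster-whisper-large-v3-turbo-ct2", models_dir)` returns a `"hub"` result unless a directory with that relative path already exists under `models_dir`.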


REST API

If you're interested in deploying this app as a REST API, please check out /backend.

TODO🗓

  • Add DeepL API translation
  • Add NLLB Model translation
  • Integrate with faster-whisper
  • Integrate with insanely-fast-whisper
  • Integrate with whisperX ( Only speaker diarization part )
  • Add background music separation pre-processing with UVR
  • Add fast api script
  • Add CLI usages
  • Support real-time transcription for microphone

Translation 🌐

Any PRs that add translations for additional languages to translation.yaml would be greatly appreciated!