	---
	sdk: gradio
	sdk_version: 5.34.0
	---
	# Whisper-WebUI
	A Gradio-based browser interface for [Whisper](https://github.com/openai/whisper)

	# Features
	- Choose which Whisper implementation to use:
	  - [openai/whisper](https://github.com/openai/whisper)
	  - [SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper) (used by default)
	  - [Vaibhavs10/insanely-fast-whisper](https://github.com/Vaibhavs10/insanely-fast-whisper)
	- Generate transcriptions from various sources, including files & microphone
	- Currently supported output formats: csv, srt & txt
	- Speech-to-text translation:
	  - From other languages to English (this is Whisper's end-to-end speech-to-text translation feature)
	  - Translate transcription files using Facebook NLLB models
	- Audio pre-processing with [Silero VAD](https://github.com/snakers4/silero-vad)
	- Post-processing with speaker diarization using the [pyannote](https://huggingface.co/pyannote/speaker-diarization-3.1) model:
	  - To download the pyannote model, you need a Hugging Face token and must manually accept the terms on the pages below:
	    1. https://huggingface.co/pyannote/speaker-diarization-3.1
	    2. https://huggingface.co/pyannote/segmentation-3.0
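
To illustrate the `srt` output format listed above, here is a minimal sketch of how timed segments can be rendered as SubRip cues. It is not the project's own exporter: the function names and the `(start, end, text)` tuple shape are assumptions made for this example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render (start, end, text) tuples as SRT cues numbered from 1.

    The tuple layout is hypothetical; adapt it to whatever your
    transcription backend returns.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


print(to_srt([(0.0, 2.5, "Hello"), (2.5, 4.0, "world")]))
```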

	# Installation and Running

	- ## Run Locally

	### Prerequisite
	To run this WebUI, you need `git`, `python` 3.8 to 3.10, and `FFmpeg`.<BR>If you are not using an NVIDIA GPU, or your `CUDA` version is not 12.4, edit `requirements.txt` to match your environment.

	Please follow the links below to install the necessary software:
	- git : [https://git-scm.com/downloads](https://git-scm.com/downloads)
	- python : [https://www.python.org/downloads/](https://www.python.org/downloads/)
	- FFmpeg : [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html)
	- CUDA : [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)

	After installing `FFmpeg`, make sure to add the `FFmpeg/bin` folder to your system `PATH`
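
The prerequisites above can be checked before installing. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and it assumes the tools are exposed on `PATH` as `git` and `ffmpeg`.

```python
import shutil
import sys

# Inclusive Python version range stated in the prerequisites.
SUPPORTED_PYTHON = ((3, 8), (3, 10))


def python_version_ok(version=None) -> bool:
    """Check that the (major, minor) version is within the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    low, high = SUPPORTED_PYTHON
    return low <= major_minor <= high


def missing_tools(tools=("git", "ffmpeg")) -> list:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


print("Python version supported:", python_version_ok())
print("Missing tools:", missing_tools() or "none")
```

If `ffmpeg` is reported as missing after installation, the `FFmpeg/bin` folder is probably not on `PATH` yet.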

	### Installation using the script files

	1. Download the repository and extract its contents
	2. Run `install.bat` or `install.sh` to install the dependencies (this creates a `venv` directory and installs the dependencies there)
	3. Start the WebUI with `start-webui.bat` or `start-webui.sh` (this activates the venv and then runs `python app.py`)
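
For readers who want to see what these steps amount to, here is a rough cross-platform sketch in Python. It is an approximation based on the description above, not the contents of the actual scripts: the function names are hypothetical, it assumes dependencies are installed from `requirements.txt` with `pip`, and it calls the venv's interpreter directly instead of activating the venv.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(venv_dir: Path, platform: str = sys.platform) -> Path:
    """Path of the Python interpreter inside a virtual environment."""
    if platform == "win32":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def install(repo_dir: Path) -> None:
    """Create <repo>/venv and install the requirements into it (assumed behavior)."""
    repo_dir = repo_dir.resolve()  # absolute path, since cwd changes below
    venv_dir = repo_dir / "venv"
    venv.EnvBuilder(with_pip=True).create(venv_dir)
    subprocess.run(
        [str(venv_python(venv_dir)), "-m", "pip", "install", "-r", "requirements.txt"],
        cwd=repo_dir,
        check=True,
    )


def start(repo_dir: Path) -> None:
    """Run app.py with the venv's interpreter."""
    repo_dir = repo_dir.resolve()
    subprocess.run(
        [str(venv_python(repo_dir / "venv")), "app.py"],
        cwd=repo_dir,
        check=True,
    )


# Example (run from the extracted repository):
# install(Path("."))
# start(Path("."))
```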

	- ## Running with Docker

	1. Install and launch [Docker-Desktop](https://www.docker.com/products/docker-desktop/)

	2. Get the repository

	3. If needed, update the `docker-compose.yaml` to match your environment

	4. Docker commands:

	Build the image (the image is about 7 GB)
	```sh
	docker compose build
	```

	Run the container
	```sh
	docker compose up
	```

	5. Connect to the WebUI with your browser at `http://localhost:7860`
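
The container can take a while to start serving. A small sketch like the one below polls the address until something answers. The helper names are hypothetical, and the default URL assumes the port mapping mentioned above is unchanged.

```python
import time
import urllib.error
import urllib.request

# Ignore any system proxy settings: the WebUI is expected on localhost.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def webui_is_up(url: str = "http://localhost:7860", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at the given URL."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timed out, or host unreachable


def wait_for_webui(url: str = "http://localhost:7860",
                   attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll the URL until it answers or the attempts run out."""
    for _ in range(attempts):
        if webui_is_up(url):
            return True
        time.sleep(delay)
    return False


# Example:
# if wait_for_webui():
#     print("WebUI is ready at http://localhost:7860")
```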

	# VRAM Usage
	- This project uses [faster-whisper](https://github.com/guillaumekln/faster-whisper) by default for lower VRAM usage and faster transcription.<BR>According to the faster-whisper benchmark, the optimized implementation compares with the original as follows:
	| Implementation | Precision | Beam size | Time  | Max. GPU memory | Max. CPU memory |
	|----------------|-----------|-----------|-------|-----------------|-----------------|
	| openai/whisper | fp16      | 5         | 4m30s | 11325MB         | 9439MB          |
	| faster-whisper | fp16      | 5         | 54s   | 4755MB          | 3244MB          |
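
The figures in this table work out to roughly a 5x speedup, with about 2.4x less GPU memory and 2.9x less CPU memory. The short calculation below reproduces those ratios from the table values alone. The `to_seconds` helper is written for this example.

```python
import re


def to_seconds(duration: str) -> int:
    """Parse a duration such as '4m30s' or '54s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+)s", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + int(seconds)


# Values copied from the benchmark table above.
BENCHMARK = {
    "openai/whisper": {"time": "4m30s", "gpu_mb": 11325, "cpu_mb": 9439},
    "faster-whisper": {"time": "54s", "gpu_mb": 4755, "cpu_mb": 3244},
}

base = BENCHMARK["openai/whisper"]
fast = BENCHMARK["faster-whisper"]

speedup = to_seconds(base["time"]) / to_seconds(fast["time"])
gpu_ratio = base["gpu_mb"] / fast["gpu_mb"]
cpu_ratio = base["cpu_mb"] / fast["cpu_mb"]

print(f"Speedup: {speedup:.1f}x")
print(f"GPU memory reduction: {gpu_ratio:.1f}x")
print(f"CPU memory reduction: {cpu_ratio:.1f}x")
```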

	- Whisper's original VRAM usage table for available models:
	| Size   | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
	|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
	| tiny   | 39 M       | `tiny.en`          | `tiny`             | ~1 GB         | ~32x           |
	| base   | 74 M       | `base.en`          | `base`             | ~1 GB         | ~16x           |
	| small  | 244 M      | `small.en`         | `small`            | ~2 GB         | ~6x            |
	| medium | 769 M      | `medium.en`        | `medium`           | ~5 GB         | ~2x            |
	| large  | 1550 M     | N/A                | `large`            | ~10 GB        | 1x             |

	Note: `.en` models are English-only. With the multilingual models, you can use the `Translate to English` option.
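
As a rough aid for choosing a model size, the sketch below picks the largest model from the Whisper table whose approximate VRAM requirement fits a given budget. The `pick_model` helper is hypothetical and not part of the WebUI. The table values are approximate and apply to the original Whisper implementation, so treat the result as a starting point only.

```python
# (size, approximate required VRAM in GB, has an English-only variant),
# ordered from smallest to largest, as in the table above.
MODELS = [
    ("tiny", 1, True),
    ("base", 1, True),
    ("small", 2, True),
    ("medium", 5, True),
    ("large", 10, False),
]


def pick_model(vram_gb: float, english_only: bool = False):
    """Return the largest model name that fits in vram_gb, or None."""
    best = None
    for size, required_gb, has_english_variant in MODELS:
        if required_gb > vram_gb:
            continue  # does not fit in the budget
        if english_only and not has_english_variant:
            continue  # e.g. there is no `large.en`
        best = f"{size}.en" if english_only else size
    return best


print(pick_model(6))                      # budget of about 6 GB
print(pick_model(24, english_only=True))  # English-only on a large GPU
```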