---
title: whisper.api
emoji: 😶‍🌫️
colorFrom: purple
colorTo: gray
sdk: docker
app_file: Dockerfile
app_port: 7860
---
## Whisper API - Speech to Text Transcription
This open-source project provides a self-hostable API for speech-to-text transcription using a fine-tuned Whisper ASR model. The API converts audio files to text through HTTP requests, which makes it a good fit for adding speech recognition to your applications.
Key features:
- Uses a fine-tuned Whisper model for accurate speech recognition
- Simple HTTP API for audio file transcription
- User-level access with API keys for managing usage
- Self-hostable code for your own speech transcription service
- Quantized model optimization for fast and efficient inference
- **Asynchronous Processing**: Non-blocking transcription for high availability
- **Concurrency Control**: Built-in request queuing to prevent server overload
- Open source implementation for customization and transparency
This repository contains code to deploy the API server, along with code for fine-tuning and quantizing models. Check out the documentation to get started!
## Installation
To install the necessary dependencies and set up the Whisper binary, follow these steps:
### 1. System Dependencies
Install `ffmpeg` for audio processing and build tools (`make`, `cmake`, `g++`) for compiling Whisper.
```bash
# Ubuntu/Debian
sudo apt install ffmpeg git make cmake g++
# macOS
brew install ffmpeg cmake
```
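Before moving on, it can help to confirm that the tools are actually on your `PATH`. The following is a minimal sketch, not part of this repository: the `missing_tools` helper and the `REQUIRED_TOOLS` list are hypothetical names, and the list simply mirrors the Ubuntu/Debian command above.

```python
import shutil

# Tools named in the Ubuntu/Debian install command above (assumed list).
REQUIRED_TOOLS = ["ffmpeg", "git", "make", "cmake", "g++"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

On macOS, `make` and `g++` usually come from the Xcode command line tools rather than Homebrew, so a missing entry there does not necessarily mean the `brew` command failed.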
### 2. Python Dependencies
Install the required Python packages.
```bash
pip install -r requirements.txt
```
### 3. Set Up Environment
Copy the example environment file and configure it:
```bash
cp .env.example .env
# Edit .env with your database credentials and settings
# Optional: Set MAX_CONCURRENT_TRANSCRIPTIONS (default: 2) in .env to control parallel jobs
```
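To illustrate what a limit like `MAX_CONCURRENT_TRANSCRIPTIONS` does, here is a minimal sketch of concurrency control with an `asyncio.Semaphore`. It is an assumption about the general technique, not the server's actual implementation: `fake_transcribe` and `run_jobs` are hypothetical names, and the sleep stands in for real transcription work.

```python
import asyncio
import os

# Read the limit from the environment; 2 matches the default noted above.
MAX_CONCURRENT = int(os.environ.get("MAX_CONCURRENT_TRANSCRIPTIONS", "2"))


async def fake_transcribe(job_id, semaphore, stats):
    """Hypothetical stand-in for one transcription job."""
    async with semaphore:  # jobs wait here while all slots are taken
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0.01)  # placeholder for the real work
        stats["running"] -= 1
    return job_id


async def run_jobs(n_jobs, limit=MAX_CONCURRENT):
    """Run n_jobs jobs, never more than `limit` at once; return (results, peak)."""
    semaphore = asyncio.Semaphore(limit)
    stats = {"running": 0, "peak": 0}
    results = await asyncio.gather(
        *(fake_transcribe(i, semaphore, stats) for i in range(n_jobs))
    )
    return list(results), stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_jobs(6))
    print(f"finished {len(results)} jobs, peak concurrency {peak}")
```

Requests beyond the limit are not rejected in this sketch; they wait for a free slot, which is the queuing behavior the feature list describes.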
### 4. Set Up Whisper
Run the setup script to clone, build, and configure the Whisper binary.
```bash
chmod +x setup_whisper.sh
./setup_whisper.sh
```
## Running the Project
### Run Locally (without Docker)
To run the project locally (e.g., inside a Conda environment or virtualenv):
```bash
# Ensure your environment is active (e.g., conda activate whisper-api)
uvicorn app.main:app --host 0.0.0.0 --port 7860 --reload
```
### Docker (Production)
To run the project using Docker:
```bash
# Build the image
docker build -t whisper-api .
# Run the container (ensure env vars are passed or secrets used)
# For local testing with .env file:
docker run --env-file .env -p 7860:7860 whisper-api
```
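Once the server is started either way, a script can wait until it accepts connections before sending requests. This is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, and the default port assumes the `7860` used in the commands above.

```python
import socket
import time


def wait_for_port(host="localhost", port=7860, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    print("server reachable:", wait_for_port(timeout=5.0))
```

This only checks that the port is open; it does not verify that the API itself is healthy.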
## Get Your Token
To get your token, use the following command:
```bash
curl -X 'POST' \
'http://localhost:7860/api/v1/users/get_token' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"email": "example@domain.com",
"password": "password"
}'
```
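The same request can be made from Python. The sketch below uses only the standard library and mirrors the `curl` call above; `build_token_request` and `get_token_response` are hypothetical helper names, and the shape of the JSON response is not documented here, so the code returns it as-is.

```python
import json
import urllib.request


def build_token_request(base_url, email, password):
    """Build the POST request for /api/v1/users/get_token without sending it."""
    url = base_url.rstrip("/") + "/api/v1/users/get_token"
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    headers = {"accept": "application/json", "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def get_token_response(base_url, email, password):
    """Send the request and return the decoded JSON body (shape is an assumption)."""
    request = build_token_request(base_url, email, password)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Pass the base URL your server is listening on as `base_url`, for example `get_token_response(base_url, "example@domain.com", "password")`.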
## Example to Transcribe a File
To upload a file and transcribe it, use the command below.
Note: The token shown is a dummy and will not work. Use the token provided by the admin.
The available models are:
- tiny.en
- tiny.en.q5
- base.en.q5
```bash
# Modify the token and audioFilePath
curl -X 'POST' \
'http://localhost:7860/api/v1/transcribe/?model=tiny.en.q5' \
-H 'accept: application/json' \
-H 'Authentication: e9b7658aa93342c492fa64153849c68b8md9uBmaqCwKq4VcgkuBD0G54FmsE8JT' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@audioFilePath.wav;type=audio/wav'
```
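For Python clients, the upload can be sketched with the standard library by building the multipart body by hand. This mirrors the `curl` call above (same path, `model` query parameter, `Authentication` header, and `file` form field); `build_transcribe_request` is a hypothetical helper, and the response format is not documented here.

```python
import urllib.parse
import urllib.request
import uuid


def build_transcribe_request(base_url, token, audio_bytes,
                             filename="audio.wav", model="tiny.en.q5"):
    """Build the multipart POST request for /api/v1/transcribe/ without sending it."""
    boundary = uuid.uuid4().hex  # random separator between form parts
    query = urllib.parse.urlencode({"model": model})
    url = f"{base_url.rstrip('/')}/api/v1/transcribe/?{query}"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {
        "accept": "application/json",
        "Authentication": token,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(
        url, data=head + audio_bytes + tail, headers=headers, method="POST"
    )
```

To send it, read the audio file as bytes, pass the result to `build_transcribe_request` together with your base URL and token, and hand the returned request to `urllib.request.urlopen`.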
## License
[MIT](https://choosealicense.com/licenses/mit/)
## Reference & Credits
- [https://github.com/openai/whisper](https://github.com/openai/whisper)
- [https://openai.com/blog/whisper/](https://openai.com/blog/whisper/)
- [https://github.com/ggerganov/whisper.cpp](https://github.com/ggerganov/whisper.cpp)
## Authors
- [Ved Gupta](https://www.github.com/innovatorved)
## 🚀 About Me
Just try to be a developer!
## Support
For support, email vedgupta@protonmail.com