---
datasets:
- AImpower/MandarinStutteredSpeech
language:
- zh
metrics:
- wer
base_model:
- openai/whisper-large-v2
tags:
- stuttering
- verbatim
- disfluency
---
# πŸ—£οΈ StutteredSpeechASR Research Demo
A Gradio-based research demonstration showcasing **StutteredSpeechASR**, a Whisper model fine-tuned specifically for stuttered speech recognition (Mandarin). Compare its performance against baseline Whisper models to see the improvement on stuttered speech patterns.
![Python](https://img.shields.io/badge/Python-3.9+-blue.svg)
![Gradio](https://img.shields.io/badge/Gradio-4.0+-orange.svg)
![Research](https://img.shields.io/badge/Research-Demo-green.svg)
## 🎯 Features
- **StutteredSpeechASR Research**: Showcases fine-tuned Whisper model specifically designed for stuttered speech
- **Comparative Analysis**: Side-by-side comparison with baseline Whisper models
- **Audio Input Flexibility**: Record via microphone or upload audio files
- **Specialized for Stuttered Speech**: Better handling of repetitions, prolongations, and blocks
- **Clean Interface**: Organized model cards with clear transcription results
- **Lightweight Deployment**: All inference runs through Hugging Face APIs; no local GPU required
## πŸ€– Models Included
| Model | Type | Description |
|-------|------|-------------|
| πŸ—£οΈ **StutteredSpeechASR** | Fine-tuned Research Model | Whisper fine-tuned specifically for stuttered speech (Mandarin) |
| πŸŽ™οΈ **Whisper Large V3** | Baseline Model | OpenAI's Whisper Large V3 model via HF Inference API |
| πŸ”Š **Whisper Large V3 Turbo** | Baseline Model | OpenAI's Whisper Large V3 Turbo (faster) via HF Inference API |
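The baseline models in the table above are reached through the public serverless Inference API. A minimal sketch of such a call is shown below; the `transcribe_baseline` helper is illustrative, not the demo's actual code, and assumes `HF_API_KEY` is set in the environment (see Environment Setup).

```python
# Hedged sketch: calling the serverless HF Inference API for a baseline
# Whisper model. `transcribe_baseline` is an illustrative helper, not the
# demo's actual code; it assumes HF_API_KEY is set in the environment.
import json
import os
import urllib.request

API_URL_TEMPLATE = "https://api-inference.huggingface.co/models/{model_id}"

def inference_url(model_id: str) -> str:
    """Build the serverless Inference API URL for a given model id."""
    return API_URL_TEMPLATE.format(model_id=model_id)

def transcribe_baseline(model_id: str, audio_bytes: bytes) -> str:
    """POST raw audio bytes and return the transcribed text field."""
    req = urllib.request.Request(
        inference_url(model_id),
        data=audio_bytes,
        headers={"Authorization": f"Bearer {os.environ['HF_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("text", "")
```

For example, `transcribe_baseline("openai/whisper-large-v3", wav_bytes)` would query the first baseline in the table.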
## πŸ“‹ Requirements
- Python 3.9+
- Hugging Face API key
- Docker (optional, for containerized deployment)
## πŸ”‘ Environment Setup
Create a `.env` file in the project root with your Hugging Face credentials:
```env
HF_ENDPOINT=https://your-endpoint-url.aws.endpoints.huggingface.cloud
HF_API_KEY=hf_your_api_key_here
```
| Variable | Description |
|----------|-------------|
| `HF_ENDPOINT` | Your dedicated Hugging Face Inference Endpoint URL for StutteredSpeechASR |
| `HF_API_KEY` | Your Hugging Face API token (get one at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)) |
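The app reads these variables at startup (typically via python-dotenv). As a dependency-free illustration, a `.env` file of the form above can be loaded into the process environment like this; `load_env` is a hypothetical helper, not the demo's actual code:

```python
# Minimal sketch of loading a .env file without extra dependencies.
# The demo itself may use python-dotenv; this parser is illustrative only.
import os

def load_env(path: str = ".env") -> None:
    """Parse KEY=VALUE lines and export them, without overriding
    variables already present in the environment."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

Existing environment variables take precedence over the file, which matches the usual python-dotenv default.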
## πŸš€ Quick Start
### Option 1: Run with Docker (Recommended)
1. **Create your `.env` file** with Hugging Face credentials (see above)
2. **Build and run with Docker Compose**
```bash
docker compose up --build
```
3. **Open your browser** and navigate to `http://localhost:7860`
### Option 2: Run Locally
1. **Clone the repository**
```bash
git clone <your-repo-url>
cd asr_demo
```
2. **Create a virtual environment** (recommended)
```bash
python -m venv venv
# Windows
venv\Scripts\activate
# Linux/macOS
source venv/bin/activate
```
3. **Install dependencies**
```bash
pip install -r requirements.txt
```
4. **Create your `.env` file** with Hugging Face credentials (see Environment Setup above)
5. **Run the application**
```bash
python app.py
```
6. **Open your browser** and navigate to `http://localhost:7860`
## πŸ§ͺ Research Notes
- **Target Language**: The StutteredSpeechASR model is specifically trained for Mandarin Chinese
- **Use Cases**: Research demonstration, stuttered speech analysis, comparative ASR evaluation
- **Best Results**: Use clear audio recordings for optimal model performance
- **Baseline Comparison**: The baseline Whisper models may struggle with stuttered speech patterns that StutteredSpeechASR handles well
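The WER metric declared in the model card can be computed as edit distance over units of the reference transcript. A minimal sketch follows, scoring at the character level as is common for Mandarin ASR; this helper is illustrative and is not the project's evaluation code:

```python
# Illustrative error rate via Levenshtein edit distance over characters
# (character-level scoring is common for Mandarin ASR). This is a sketch,
# not the project's actual evaluation code.
def error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = list(reference), list(hypothesis)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost, # substitution
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, a hypothesis with one substituted character out of four yields an error rate of 0.25. Lower is better; a verbatim model that preserves repetitions can score better against verbatim references than one that silently removes disfluencies.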
## πŸ“š References
- [Gradio Documentation](https://www.gradio.app/docs)
- [Hugging Face Inference API](https://huggingface.co/docs/api-inference)
- [Hugging Face Inference Endpoints](https://huggingface.co/docs/inference-endpoints)
- [AImpower StutteredSpeechASR](https://huggingface.co/AImpower/StutteredSpeechASR)
- [OpenAI Whisper](https://github.com/openai/whisper)