---
datasets:
- AImpower/MandarinStutteredSpeech
language:
- zh
metrics:
- wer
base_model:
- openai/whisper-large-v2
tags:
- stuttering
- verbatim
- disfluency
---
# 🗣️ StutteredSpeechASR Research Demo
A Gradio-based research demonstration of **StutteredSpeechASR**, a Whisper model fine-tuned for Mandarin stuttered speech recognition. Compare its transcriptions against baseline Whisper models to see where it improves on stuttered speech patterns.

## 🎯 Features
- **StutteredSpeechASR Research**: Showcases a Whisper model fine-tuned specifically for stuttered speech
- **Comparative Analysis**: Side-by-side comparison with baseline Whisper models
- **Audio Input Flexibility**: Record via microphone or upload audio files
- **Specialized for Stuttered Speech**: Better handling of repetitions, prolongations, and blocks
- **Clean Interface**: Organized model cards with clear transcription results
- **Lightweight Deployment**: All inference runs through Hugging Face APIs, so no local GPU is required
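
Because inference runs remotely, the client only has to send audio over HTTP. The following is a minimal sketch, assuming the endpoint follows the common Hugging Face pattern of accepting raw audio bytes in a POST body with a bearer token and replying with JSON such as `{"text": "..."}`; a custom endpoint handler may expect a different payload. The function names are hypothetical, and the demo itself may use a client library such as `huggingface_hub` rather than raw HTTP. Only the standard library is used.

```python
import json
import urllib.request


# Hypothetical helpers: the demo's real client code may differ.
def build_asr_request(url: str, api_key: str, audio: bytes,
                      content_type: str = "audio/wav") -> urllib.request.Request:
    """Build a POST request that sends raw audio bytes to an ASR endpoint."""
    return urllib.request.Request(
        url,
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # Hugging Face access token
            "Content-Type": content_type,          # MIME type of the audio payload
        },
    )


def parse_asr_response(body: bytes) -> str:
    """Extract the transcript, ASSUMING a JSON response shaped like {"text": "..."}."""
    payload = json.loads(body)
    return payload["text"]


def transcribe(url: str, api_key: str, audio: bytes, timeout: float = 60.0) -> str:
    """Send audio to the endpoint and return its transcript (performs a network call)."""
    request = build_asr_request(url, api_key, audio)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_asr_response(response.read())
```

Splitting request construction and response parsing out of `transcribe` keeps the network call isolated, so the request format can be checked without contacting a live endpoint.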
## 🤖 Models Included
| Model | Type | Description |
|-------|------|-------------|
| 🗣️ **StutteredSpeechASR** | Fine-tuned Research Model | Whisper fine-tuned specifically for stuttered speech (Mandarin) |
| 🎙️ **Whisper Large V3** | Baseline Model | OpenAI's Whisper Large V3 via the HF Inference API |
| 🚀 **Whisper Large V3 Turbo** | Baseline Model | OpenAI's Whisper Large V3 Turbo (faster) via the HF Inference API |
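
The side-by-side comparison amounts to a fan-out: the same clip goes to each model in the table and the transcripts are collected by model name. This is a minimal sketch under stated assumptions, not necessarily how `app.py` orchestrates the models: `compare_models` and the `Transcriber` type are hypothetical, each model is wrapped as a plain function from audio bytes to text, and threads are used because the work is dominated by waiting on remote API calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical type: any function mapping audio bytes to a transcript,
# e.g. a wrapper around an HTTP call to one of the models in the table.
Transcriber = Callable[[bytes], str]


def compare_models(audio: bytes, transcribers: Dict[str, Transcriber]) -> Dict[str, str]:
    """Run the same clip through every model and return {model name: transcript}.

    Each model runs in its own thread so one slow API call does not delay the
    others, and a failure in one model is reported without hiding the rest.
    """
    def run(name: str) -> str:
        try:
            return transcribers[name](audio)
        except Exception as exc:  # report per-model errors instead of failing the whole comparison
            return f"[error: {exc}]"

    with ThreadPoolExecutor(max_workers=max(1, len(transcribers))) as pool:
        # pool.map preserves input order, so results line up with the dict keys.
        results = pool.map(run, transcribers)
        return dict(zip(transcribers, results))
```

Because each transcriber is just a function, the comparison logic can be exercised with stand-in lambdas instead of live models.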
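## 📋 Requirements
- Python 3.9+
- Hugging Face API token
- A dedicated Hugging Face Inference Endpoint serving StutteredSpeechASR (its URL goes in `HF_ENDPOINT`)
- Docker (optional, for containerized deployment)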
## 🔑 Environment Setup
Create a `.env` file in the project root with your Hugging Face credentials:
```env
HF_ENDPOINT=https://your-endpoint-url.aws.endpoints.huggingface.cloud
HF_API_KEY=hf_your_api_key_here
```
| Variable | Description |
|----------|-------------|
| `HF_ENDPOINT` | Your dedicated Hugging Face Inference Endpoint URL for StutteredSpeechASR |
| `HF_API_KEY` | Your Hugging Face API token (get one at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)) |
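
How the app reads these values is not shown here; it may well use `python-dotenv`. As a standard-library stand-in, the sketch below parses the `.env` file and fails early if either variable is missing. `parse_env_file`, `load_settings`, and `REQUIRED_VARS` are hypothetical names. Values already set in the process environment take precedence, mirroring the default behavior of `python-dotenv`'s `load_dotenv`.

```python
import os
from pathlib import Path
from typing import Dict

# The two settings described in the table above.
REQUIRED_VARS = ("HF_ENDPOINT", "HF_API_KEY")


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the FIRST '=' only, so URLs containing '=' survive intact.
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes, e.g. HF_API_KEY="hf_..."
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(path: str = ".env") -> Dict[str, str]:
    """Merge .env values with the process environment (real env vars win) and validate."""
    env_path = Path(path)
    settings = parse_env_file(env_path.read_text(encoding="utf-8")) if env_path.exists() else {}
    settings.update({k: os.environ[k] for k in REQUIRED_VARS if k in os.environ})
    missing = [k for k in REQUIRED_VARS if not settings.get(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return settings
```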
## 🚀 Quick Start
### Option 1: Run with Docker (Recommended)
1. **Create your `.env` file** with your Hugging Face credentials (see Environment Setup above)
2. **Build and run with Docker Compose**
   ```bash
   docker compose up --build
   ```
3. **Open your browser** and navigate to `http://localhost:7860`
### Option 2: Run Locally
1. **Clone the repository**
   ```bash
   git clone <your-repo-url>
   cd asr_demo
   ```
2. **Create a virtual environment** (recommended)
   ```bash
   python -m venv venv
   # Windows
   venv\Scripts\activate
   # Linux/macOS
   source venv/bin/activate
   ```
3. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```
4. **Create your `.env` file** with your Hugging Face credentials (see Environment Setup above)
5. **Run the application**
   ```bash
   python app.py
   ```
6. **Open your browser** and navigate to `http://localhost:7860`
## 🧪 Research Notes
- **Target Language**: The StutteredSpeechASR model is trained specifically for Mandarin Chinese
- **Use Cases**: Research demonstration, stuttered speech analysis, and comparative ASR evaluation
- **Best Results**: Use clear audio recordings for the most reliable transcriptions
- **Baseline Comparison**: The baseline Whisper models may struggle with stuttered speech patterns (repetitions, prolongations, blocks) that StutteredSpeechASR handles well
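
For comparative evaluation, the front matter lists WER as the metric. Mandarin text is not space-delimited, so a common workaround is to score per character (CER). The sketch below computes an edit-distance error rate at either level; it is not necessarily how the StutteredSpeechASR authors computed their results, and `edit_distance` and `error_rate` are hypothetical names (libraries such as `jiwer` provide tested implementations). With verbatim references, a model that drops stuttered repetitions is penalized for the missing tokens as deletions.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from an empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref token missing from hyp
                curr[j - 1] + 1,         # insertion: extra hyp token
                prev[j - 1] + (r != h),  # substitution, or a match at no cost
            )
        prev = curr
    return prev[-1]


def error_rate(reference: str, hypothesis: str, char_level: bool = True) -> float:
    """Edit distance divided by reference length; char_level=True gives CER, False gives WER."""
    if char_level:
        ref: List[str] = list(reference.replace(" ", ""))
        hyp: List[str] = list(hypothesis.replace(" ", ""))
    else:
        ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, scoring the fluent hypothesis `我想去` against the verbatim reference `我我想去` yields one deletion over four reference characters, a CER of 0.25.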
## 📚 References
- [Gradio Documentation](https://www.gradio.app/docs)
- [Hugging Face Inference API](https://huggingface.co/docs/api-inference)
- [Hugging Face Inference Endpoints](https://huggingface.co/docs/inference-endpoints)
- [AImpower StutteredSpeechASR](https://huggingface.co/AImpower/StutteredSpeechASR)
- [OpenAI Whisper](https://github.com/openai/whisper)