---
datasets:
- AImpower/MandarinStutteredSpeech
language:
- zh
metrics:
- wer
base_model:
- openai/whisper-large-v2
tags:
- stuttering
- verbatim
- disfluency
---

# 🗣️ StutteredSpeechASR Research Demo

A Gradio-based research demonstration showcasing **StutteredSpeechASR**, a Whisper model fine-tuned specifically for Mandarin stuttered speech recognition. Compare its performance against baseline Whisper models to see the improvement on stuttered speech patterns.

![Python](https://img.shields.io/badge/Python-3.9+-blue.svg)
![Gradio](https://img.shields.io/badge/Gradio-4.0+-orange.svg)
![Research](https://img.shields.io/badge/Research-Demo-green.svg)

## 🎯 Features

- **StutteredSpeechASR Research**: Showcases a Whisper model fine-tuned specifically for stuttered speech
- **Comparative Analysis**: Side-by-side comparison with baseline Whisper models
- **Flexible Audio Input**: Record via microphone or upload audio files
- **Specialized for Stuttered Speech**: Better handling of repetitions, prolongations, and blocks
- **Clean Interface**: Organized model cards with clear transcription results
- **Lightweight Deployment**: All inference runs via Hugging Face APIs, so no GPU is required

## 🤖 Models Included

| Model | Type | Description |
|-------|------|-------------|
| 🗣️ **StutteredSpeechASR** | Fine-tuned research model | Whisper fine-tuned specifically for Mandarin stuttered speech |
| 🎙️ **Whisper Large V3** | Baseline model | OpenAI's Whisper Large V3 via the HF Inference API |
| 🔊 **Whisper Large V3 Turbo** | Baseline model | OpenAI's Whisper Large V3 Turbo (faster) via the HF Inference API |

## 📋 Requirements

- Python 3.9+
- Hugging Face API key
- Docker (optional, for containerized deployment)

## 🔑 Environment Setup

Create a `.env` file in the project root with your Hugging Face credentials:

```env
HF_ENDPOINT=https://your-endpoint-url.aws.endpoints.huggingface.cloud
HF_API_KEY=hf_your_api_key_here
```

| Variable | Description |
|----------|-------------|
| `HF_ENDPOINT` | Your dedicated Hugging Face Inference Endpoint URL for StutteredSpeechASR |
| `HF_API_KEY` | Your Hugging Face API token (create one at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)) |

## 🚀 Quick Start

### Option 1: Run with Docker (Recommended)

1. **Create your `.env` file** with your Hugging Face credentials (see above)
2. **Build and run with Docker Compose**
   ```bash
   docker compose up --build
   ```
3. **Open your browser** and navigate to `http://localhost:7860`

### Option 2: Run Locally

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd asr_demo
   ```
2. **Create a virtual environment** (recommended)
   ```bash
   python -m venv venv
   # Windows
   venv\Scripts\activate
   # Linux/macOS
   source venv/bin/activate
   ```
3. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```
4. **Create your `.env` file** with your Hugging Face credentials (see Environment Setup above)
5. **Run the application**
   ```bash
   python app.py
   ```
6. **Open your browser** and navigate to `http://localhost:7860`

## 🧪 Research Notes

- **Target Language**: The StutteredSpeechASR model is trained specifically for Mandarin Chinese
- **Use Cases**: Research demonstration, stuttered speech analysis, comparative ASR evaluation
- **Best Results**: Use clear audio recordings for optimal model performance
- **Baseline Comparison**: The baseline Whisper models may struggle with stuttered speech patterns that StutteredSpeechASR handles well

## 📚 References

- [Gradio Documentation](https://www.gradio.app/docs)
- [Hugging Face Inference API](https://huggingface.co/docs/api-inference)
- [Hugging Face Inference Endpoints](https://huggingface.co/docs/inference-endpoints)
- [AImpower StutteredSpeechASR](https://huggingface.co/AImpower/StutteredSpeechASR)
- [OpenAI Whisper](https://github.com/openai/whisper)
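## 💡 Example: Calling the Endpoint Directly

The demo app wraps this up in a Gradio UI, but the underlying request is a plain authorized POST of audio bytes to `HF_ENDPOINT`. The sketch below is a minimal, stdlib-only illustration of that pattern; the helper names (`_auth_headers`, `transcribe`), the `audio/wav` content type, and the `{"text": ...}` response shape are assumptions for illustration, not the app's actual code.

```python
# Minimal sketch (assumptions noted above): POST a WAV file to a dedicated
# Hugging Face Inference Endpoint and read back the transcription.
import json
import os
import urllib.request


def _auth_headers(api_key: str) -> dict:
    """Headers for a raw-audio request to the endpoint."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "audio/wav",  # assumed payload type for this sketch
    }


def transcribe(audio_path: str, endpoint: str, api_key: str) -> str:
    """Send one audio file to the endpoint; return the verbatim transcription."""
    with open(audio_path, "rb") as f:
        req = urllib.request.Request(
            endpoint, data=f.read(), headers=_auth_headers(api_key)
        )
    with urllib.request.urlopen(req) as resp:
        # Assumed response shape: {"text": "..."}
        return json.loads(resp.read())["text"]


if __name__ == "__main__":
    print(
        transcribe(
            "sample.wav", os.environ["HF_ENDPOINT"], os.environ["HF_API_KEY"]
        )
    )
```

In the actual app, the same credentials come from the `.env` file described in Environment Setup.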
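## 📏 Example: Scoring Transcripts

The model card lists WER as its metric, which is the natural way to score the side-by-side comparisons this demo produces: edit distance between hypothesis and reference tokens, normalized by reference length. A dependency-free sketch (not the evaluation code used for the model) is below; for Mandarin, passing lists of characters instead of whitespace-split words yields the character error rate that is more commonly reported for Chinese ASR.

```python
def error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Token-level edit distance divided by reference length.

    Pass whitespace-split words for WER, or list(text) for a
    character error rate (the usual choice for Mandarin).
    """
    m, n = len(reference), len(hypothesis)
    # d[i][j] = min edits turning the first i reference tokens
    # into the first j hypothesis tokens.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[m][n] / m


# One inserted word against a three-word reference -> WER = 1/3
print(error_rate("the cat sat".split(), "the cat sat on".split()))
```

Note that for stuttered speech the choice of reference matters: scoring against a verbatim reference (disfluencies included, as in the AImpower/MandarinStutteredSpeech dataset's verbatim labeling) measures something different from scoring against a fluent, cleaned-up reference.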