---
datasets:
- AImpower/MandarinStutteredSpeech
language:
- zh
metrics:
- wer
base_model:
- openai/whisper-large-v2
tags:
- stuttering
- verbatim
- disfluency
---
# 🗣️ StutteredSpeechASR Research Demo

A Gradio-based research demonstration showcasing **StutteredSpeechASR**, a Whisper model fine-tuned specifically for stuttered speech recognition (Mandarin). Compare its performance against baseline Whisper models to see the improvement on stuttered speech patterns.

![Python](https://img.shields.io/badge/Python-3.9+-blue.svg)
![Gradio](https://img.shields.io/badge/Gradio-4.0+-orange.svg)
![Research](https://img.shields.io/badge/Research-Demo-green.svg)

## 🎯 Features

- **StutteredSpeechASR Research**: Showcases fine-tuned Whisper model specifically designed for stuttered speech
- **Comparative Analysis**: Side-by-side comparison with baseline Whisper models
- **Audio Input Flexibility**: Record via microphone or upload audio files
- **Specialized for Stuttered Speech**: Better handling of repetitions, prolongations, and blocks
- **Clean Interface**: Organized model cards with clear transcription results
- **Lightweight Deployment**: All inference runs through Hugging Face APIs, so no local GPU is required

## 🤖 Models Included

| Model | Type | Description |
|-------|------|-------------|
| 🗣️ **StutteredSpeechASR** | Fine-tuned Research Model | Whisper fine-tuned specifically for stuttered speech (Mandarin) |
| 🎙️ **Whisper Large V3** | Baseline Model | OpenAI's Whisper Large V3 model via HF Inference API |
| 🔊 **Whisper Large V3 Turbo** | Baseline Model | OpenAI's Whisper Large V3 Turbo (faster) via HF Inference API |
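The baselines are reached through the serverless Inference API, while the fine-tuned model runs on a dedicated endpoint (see Environment Setup below). As a sketch of how the demo's labels could map to Hub repo IDs, assuming the repos named elsewhere in this card; `MODEL_IDS` and `model_id` are hypothetical names, not taken from the app code:

```python
# Map the demo's display names to Hugging Face repo IDs. The
# StutteredSpeechASR entry assumes the AImpower repo linked in this card.
MODEL_IDS = {
    "StutteredSpeechASR": "AImpower/StutteredSpeechASR",
    "Whisper Large V3": "openai/whisper-large-v3",
    "Whisper Large V3 Turbo": "openai/whisper-large-v3-turbo",
}

def model_id(display_name: str) -> str:
    """Resolve a model-card label to its Hub repo ID (hypothetical helper)."""
    try:
        return MODEL_IDS[display_name]
    except KeyError:
        raise ValueError(f"Unknown model: {display_name}") from None

# Sketch only (not run here): calling a baseline via the Inference API.
# from huggingface_hub import InferenceClient
# client = InferenceClient(token=os.environ["HF_API_KEY"])
# text = client.automatic_speech_recognition(
#     "sample.wav", model=model_id("Whisper Large V3")
# ).text
```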


## 📋 Requirements

- Python 3.9+
- Hugging Face API key
- Docker (optional, for containerized deployment)

## 🔑 Environment Setup

Create a `.env` file in the project root with your Hugging Face credentials:

```env
HF_ENDPOINT=https://your-endpoint-url.aws.endpoints.huggingface.cloud
HF_API_KEY=hf_your_api_key_here
```

| Variable | Description |
|----------|-------------|
| `HF_ENDPOINT` | Your dedicated Hugging Face Inference Endpoint URL for StutteredSpeechASR |
| `HF_API_KEY` | Your Hugging Face API token (get one at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)) |
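A client script might consume these variables along the following lines; `load_hf_config` is a hypothetical helper sketched here, not the demo's actual code:

```python
import os

def load_hf_config(env=os.environ):
    """Return the endpoint URL and auth headers for the dedicated
    StutteredSpeechASR endpoint, failing fast if a variable is missing
    (hypothetical helper; app.py may structure this differently)."""
    missing = [k for k in ("HF_ENDPOINT", "HF_API_KEY") if k not in env]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    headers = {"Authorization": f"Bearer {env['HF_API_KEY']}"}
    return env["HF_ENDPOINT"], headers
```

The returned headers could then be used with something like `requests.post(endpoint, headers=headers, data=audio_bytes)`, adding an appropriate `Content-Type` for the audio format.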

## 🚀 Quick Start

### Option 1: Run with Docker (Recommended)

1. **Create your `.env` file** with Hugging Face credentials (see above)

2. **Build and run with Docker Compose**
   ```bash
   docker compose up --build
   ```

3. **Open your browser** and navigate to `http://localhost:7860`

### Option 2: Run Locally

1. **Clone the repository**
   ```bash
   git clone <your-repo-url>
   cd asr_demo
   ```

2. **Create a virtual environment** (recommended)
   ```bash
   python -m venv venv
   
   # Windows
   venv\Scripts\activate
   
   # Linux/macOS
   source venv/bin/activate
   ```

3. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

4. **Create your `.env` file** with Hugging Face credentials (see Environment Setup above)

5. **Run the application**
   ```bash
   python app.py
   ```

6. **Open your browser** and navigate to `http://localhost:7860`



## 🧪 Research Notes

- **Target Language**: The StutteredSpeechASR model is specifically trained for Mandarin Chinese
- **Use Cases**: Research demonstration, stuttered speech analysis, comparative ASR evaluation
- **Best Results**: Use clear audio recordings for optimal model performance
- **Baseline Comparison**: The Whisper models may struggle with stuttered speech patterns that StutteredSpeechASR handles well
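The metric listed for this model is WER; for Mandarin, where text has no word boundaries, error rates are often computed per character (CER) instead. A minimal sketch of that edit-distance computation, independent of whatever evaluation code the research actually used:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance between the two strings,
    normalized by the reference length. A verbatim-style model should keep
    stuttered repetitions, so a verbatim reference penalizes models that
    silently "clean up" disfluencies."""
    r, h = list(reference), list(hypothesis)
    # prev[j] holds the edit distance between r[:i-1] and h[:j].
    prev = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution / match
        prev = cur
    return prev[len(h)] / max(len(r), 1)
```

For example, against a verbatim reference "你你好", a hypothesis "你好" that drops the repeated syllable scores a nonzero CER even though it reads as fluent text.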


## 📚 References

- [Gradio Documentation](https://www.gradio.app/docs)
- [Hugging Face Inference API](https://huggingface.co/docs/api-inference)
- [Hugging Face Inference Endpoints](https://huggingface.co/docs/inference-endpoints)
- [AImpower StutteredSpeechASR](https://huggingface.co/AImpower/StutteredSpeechASR)
- [OpenAI Whisper](https://github.com/openai/whisper)