---
datasets:
- AImpower/MandarinStutteredSpeech
language:
- zh
metrics:
- wer
base_model:
- openai/whisper-large-v2
tags:
- stuttering
- verbatim
- disfluency
---

# 🗣️ StutteredSpeechASR Research Demo

A Gradio-based research demonstration showcasing **StutteredSpeechASR**, a Whisper model fine-tuned for stuttered speech recognition in Mandarin. Compare its performance against baseline Whisper models to see the improvement on stuttered speech patterns.
| |  |
| |  |
| |  |
| |
|
## 🎯 Features

- **StutteredSpeechASR Research**: Showcases a fine-tuned Whisper model designed specifically for stuttered speech
- **Comparative Analysis**: Side-by-side comparison with baseline Whisper models
- **Audio Input Flexibility**: Record via microphone or upload audio files
- **Specialized for Stuttered Speech**: Better handling of repetitions, prolongations, and blocks
- **Clean Interface**: Organized model cards with clear transcription results
- **Lightweight Deployment**: All inference runs via Hugging Face APIs, so no GPU is required
## 🤖 Models Included

| Model | Type | Description |
|-------|------|-------------|
| 🗣️ **StutteredSpeechASR** | Fine-tuned research model | Whisper fine-tuned specifically for stuttered speech (Mandarin) |
| 🎙️ **Whisper Large V3** | Baseline model | OpenAI's Whisper Large V3 via the HF Inference API |
| 🚀 **Whisper Large V3 Turbo** | Baseline model | OpenAI's Whisper Large V3 Turbo (faster) via the HF Inference API |
## 📋 Requirements

- Python 3.9+
- A Hugging Face API key
- Docker (optional, for containerized deployment)
## 🔑 Environment Setup

Create a `.env` file in the project root with your Hugging Face credentials:

```env
HF_ENDPOINT=https://your-endpoint-url.aws.endpoints.huggingface.cloud
HF_API_KEY=hf_your_api_key_here
```
| Variable | Description |
|----------|-------------|
| `HF_ENDPOINT` | Your dedicated Hugging Face Inference Endpoint URL for StutteredSpeechASR |
| `HF_API_KEY` | Your Hugging Face API token (create one at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)) |
## 🚀 Quick Start

### Option 1: Run with Docker (Recommended)

1. **Create your `.env` file** with Hugging Face credentials (see above)

2. **Build and run with Docker Compose**
   ```bash
   docker compose up --build
   ```

3. **Open your browser** and navigate to `http://localhost:7860`
### Option 2: Run Locally

1. **Clone the repository**
   ```bash
   git clone <your-repo-url>
   cd asr_demo
   ```

2. **Create a virtual environment** (recommended)
   ```bash
   python -m venv venv

   # Windows
   venv\Scripts\activate

   # Linux/macOS
   source venv/bin/activate
   ```

3. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

4. **Create your `.env` file** with Hugging Face credentials (see Environment Setup above)

5. **Run the application**
   ```bash
   python app.py
   ```

6. **Open your browser** and navigate to `http://localhost:7860`
## 🧪 Research Notes

- **Target Language**: The StutteredSpeechASR model is trained specifically for Mandarin Chinese
- **Use Cases**: Research demonstration, stuttered speech analysis, comparative ASR evaluation
- **Best Results**: Use clear audio recordings for optimal model performance
- **Baseline Comparison**: The baseline Whisper models may struggle with stuttered speech patterns that StutteredSpeechASR handles well
## 📚 References

- [Gradio Documentation](https://www.gradio.app/docs)
- [Hugging Face Inference API](https://huggingface.co/docs/api-inference)
- [Hugging Face Inference Endpoints](https://huggingface.co/docs/inference-endpoints)
- [AImpower StutteredSpeechASR](https://huggingface.co/AImpower/StutteredSpeechASR)
- [OpenAI Whisper](https://github.com/openai/whisper)