---
datasets:
- AImpower/MandarinStutteredSpeech
language:
- zh
metrics:
- wer
base_model:
- openai/whisper-large-v2
tags:
- stuttering
- verbatim
- disfluency
---

# 🗣️ StutteredSpeechASR Research Demo

A Gradio-based research demonstration showcasing **StutteredSpeechASR**, a Whisper model fine-tuned specifically for Mandarin stuttered speech recognition. Compare its performance against baseline Whisper models to see the improvement on stuttered speech patterns.

![Python](https://img.shields.io/badge/Python-3.9+-blue.svg)
![Gradio](https://img.shields.io/badge/Gradio-4.0+-orange.svg)
![Research](https://img.shields.io/badge/Research-Demo-green.svg)

## 🎯 Features

- **StutteredSpeechASR Research**: Showcases a Whisper model fine-tuned specifically for stuttered speech
- **Comparative Analysis**: Side-by-side comparison with baseline Whisper models
- **Flexible Audio Input**: Record via microphone or upload audio files
- **Specialized for Stuttered Speech**: Better handling of repetitions, prolongations, and blocks
- **Clean Interface**: Organized model cards with clear transcription results
- **Lightweight Deployment**: All inference runs via Hugging Face APIs, so no GPU is required

## 🤖 Models Included

| Model | Type | Description |
|-------|------|-------------|
| 🗣️ **StutteredSpeechASR** | Fine-tuned research model | Whisper fine-tuned specifically for Mandarin stuttered speech |
| 🎙️ **Whisper Large V3** | Baseline model | OpenAI's Whisper Large V3 via the HF Inference API |
| 🔊 **Whisper Large V3 Turbo** | Baseline model | OpenAI's Whisper Large V3 Turbo (faster) via the HF Inference API |

## 📋 Requirements

- Python 3.9+
- Hugging Face API key
- Docker (optional, for containerized deployment)

## 🔑 Environment Setup

Create a `.env` file in the project root with your Hugging Face credentials:

```env
HF_ENDPOINT=https://your-endpoint-url.aws.endpoints.huggingface.cloud
HF_API_KEY=hf_your_api_key_here
```

| Variable | Description |
|----------|-------------|
| `HF_ENDPOINT` | Your dedicated Hugging Face Inference Endpoint URL for StutteredSpeechASR |
| `HF_API_KEY` | Your Hugging Face API token (create one at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)) |

## 🚀 Quick Start

### Option 1: Run with Docker (Recommended)

1. **Create your `.env` file** with your Hugging Face credentials (see above)
2. **Build and run with Docker Compose**
   ```bash
   docker compose up --build
   ```
3. **Open your browser** and navigate to `http://localhost:7860`

### Option 2: Run Locally

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd asr_demo
   ```
2. **Create a virtual environment** (recommended)
   ```bash
   python -m venv venv
   # Windows
   venv\Scripts\activate
   # Linux/macOS
   source venv/bin/activate
   ```
3. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```
4. **Create your `.env` file** with your Hugging Face credentials (see Environment Setup above)
5. **Run the application**
   ```bash
   python app.py
   ```
6. **Open your browser** and navigate to `http://localhost:7860`

## 🧪 Research Notes

- **Target Language**: The StutteredSpeechASR model is trained specifically for Mandarin Chinese
- **Use Cases**: Research demonstration, stuttered speech analysis, comparative ASR evaluation
- **Best Results**: Use clear audio recordings for optimal model performance
- **Baseline Comparison**: The baseline Whisper models may struggle with stuttered speech patterns that StutteredSpeechASR handles well

## 📚 References

- [Gradio Documentation](https://www.gradio.app/docs)
- [Hugging Face Inference API](https://huggingface.co/docs/api-inference)
- [Hugging Face Inference Endpoints](https://huggingface.co/docs/inference-endpoints)
- [AImpower StutteredSpeechASR](https://huggingface.co/AImpower/StutteredSpeechASR)
- [OpenAI Whisper](https://github.com/openai/whisper)
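## 💡 Example: Calling the Endpoint Directly

The demo app wraps this up in a Gradio UI, but the underlying request is a plain authorized POST of audio bytes to `HF_ENDPOINT`. The sketch below is a minimal, stdlib-only illustration of that pattern; the helper names (`_auth_headers`, `transcribe`), the `audio/wav` content type, and the `{"text": ...}` response shape are assumptions for illustration, not the app's actual code.

```python
# Minimal sketch (assumptions noted above): POST a WAV file to a dedicated
# Hugging Face Inference Endpoint and read back the transcription.
import json
import os
import urllib.request


def _auth_headers(api_key: str) -> dict:
    """Headers for a raw-audio request to the endpoint."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "audio/wav",  # assumed payload type for this sketch
    }


def transcribe(audio_path: str, endpoint: str, api_key: str) -> str:
    """Send one audio file to the endpoint; return the verbatim transcription."""
    with open(audio_path, "rb") as f:
        req = urllib.request.Request(
            endpoint, data=f.read(), headers=_auth_headers(api_key)
        )
    with urllib.request.urlopen(req) as resp:
        # Assumed response shape: {"text": "..."}
        return json.loads(resp.read())["text"]


if __name__ == "__main__":
    print(
        transcribe(
            "sample.wav", os.environ["HF_ENDPOINT"], os.environ["HF_API_KEY"]
        )
    )
```

In the actual app, the same credentials come from the `.env` file described in Environment Setup.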
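## 📏 Example: Scoring Transcripts

The model card lists WER as its metric, which is the natural way to score the side-by-side comparisons this demo produces: edit distance between hypothesis and reference tokens, normalized by reference length. A dependency-free sketch (not the evaluation code used for the model) is below; for Mandarin, passing lists of characters instead of whitespace-split words yields the character error rate that is more commonly reported for Chinese ASR.

```python
def error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Token-level edit distance divided by reference length.

    Pass whitespace-split words for WER, or list(text) for a
    character error rate (the usual choice for Mandarin).
    """
    m, n = len(reference), len(hypothesis)
    # d[i][j] = min edits turning the first i reference tokens
    # into the first j hypothesis tokens.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[m][n] / m


# One inserted word against a three-word reference -> WER = 1/3
print(error_rate("the cat sat".split(), "the cat sat on".split()))
```

Note that for stuttered speech the choice of reference matters: scoring against a verbatim reference (disfluencies included, as in the AImpower/MandarinStutteredSpeech dataset's verbatim labeling) measures something different from scoring against a fluent, cleaned-up reference.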