---
title: Modal Transcriber MCP
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: mit
tag: mcp-server-track
---

# 🎙️ Modal Transcriber MCP

A powerful audio transcription system integrating Gradio UI, FastMCP Tools, and Modal cloud computing with intelligent speaker identification.

## ✨ Key Features

- **🎵 Multi-platform Audio Download**: Supports Apple Podcasts, XiaoYuZhou, and other podcast platforms
- **High-performance Transcription**: Based on OpenAI Whisper, with multiple model options (turbo, large-v3, etc.)
- **Intelligent Speaker Identification**: Uses pyannote.audio for speaker separation and embedding clustering
- **⚡ Distributed Processing**: Splits large files into chunks and processes them concurrently, significantly improving processing speed
- **FastMCP Tools**: Complete MCP (Model Context Protocol) tool integration
- **☁️ Modal Deployment**: Supports both local and cloud deployment modes

## 🎯 Core Advantages

### Intelligent Audio Segmentation
- **Silence Detection Segmentation**: Automatically identifies silent segments in the audio and uses them as chunk boundaries
- **Fallback Mechanism**: Long audio automatically falls back to time-based segmentation, keeping processing efficient
- **Concurrent Processing**: Multiple chunks are processed simultaneously, dramatically improving transcription speed

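The segmentation strategy above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function name `plan_chunks`, the 60-second `max_chunk` limit, and the assumption that silences arrive as `(start, end)` pairs in seconds are all hypothetical. The sketch cuts at the midpoint of the latest silence that fits within the limit and falls back to a fixed-length cut when no silence is available.

```python
from typing import List, Tuple


def plan_chunks(
    duration: float,
    silences: List[Tuple[float, float]],
    max_chunk: float = 60.0,  # assumed limit, not taken from the project
) -> List[Tuple[float, float]]:
    """Return (start, end) chunk boundaries in seconds."""
    if duration <= 0:
        return []

    # Candidate cut points: the midpoint of each detected silence.
    midpoints = ((s + e) / 2 for s, e in silences)
    cuts = sorted(m for m in midpoints if 0 < m < duration)

    chunks: List[Tuple[float, float]] = []
    start = 0.0
    while duration - start > max_chunk:
        limit = start + max_chunk
        # Prefer the latest silence that keeps the chunk within the limit.
        candidates = [c for c in cuts if start < c <= limit]
        # Time-based fallback when no silence falls inside the window.
        end = candidates[-1] if candidates else limit
        chunks.append((start, end))
        start = end
    chunks.append((start, duration))
    return chunks
```

For a 150-second file with silences centered near 30 s, 56 s, and 130.5 s, this yields three chunks. The second one ends at 116 s through the time-based fallback, because no silence falls within 60 s of its start.
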
### Advanced Speaker Identification
- **Embedding Clustering**: Uses deep learning embeddings to identify the same speaker consistently
- **Cross-chunk Unification**: Resolves inconsistent speaker labels across chunks in distributed processing
- **Quality Filtering**: Automatically filters out low-quality segments to improve output accuracy

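To illustrate cross-chunk unification, the sketch below maps each chunk-local speaker label to a global one by comparing embeddings. It is only a simplified stand-in: greedy threshold clustering on cosine similarity is used here for brevity, and the function names, the `SPEAKER_NN` label format, and the 0.8 threshold are assumptions rather than the project's real clustering method.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def unify_speakers(
    local_embeddings: Dict[Tuple[int, str], Sequence[float]],
    threshold: float = 0.8,  # assumed similarity cutoff
) -> Dict[Tuple[int, str], str]:
    """Map (chunk_index, local_label) to a global speaker label."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    mapping: Dict[Tuple[int, str], str] = {}

    for key in sorted(local_embeddings):
        emb = list(local_embeddings[key])
        scores = [cosine(emb, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            # Same speaker: fold the embedding into the running mean.
            n = counts[best]
            centroids[best] = [(c * n + x) / (n + 1) for c, x in zip(centroids[best], emb)]
            counts[best] += 1
        else:
            # No close match: start a new global speaker.
            centroids.append(emb)
            counts.append(1)
            best = len(centroids) - 1
        mapping[key] = f"SPEAKER_{best:02d}"
    return mapping
```

With this approach, a speaker labeled `A` in chunk 0 and `B` in chunk 1 ends up with the same global label as long as the two embeddings are similar enough.
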
### Developer Friendly
- **MCP Protocol Support**: Complete tool invocation interface
- **REST API**: Standardized HTTP interface
- **Gradio UI**: Intuitive web interface
- **Test Coverage**: 29 unit and integration tests

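As a rough picture of what MCP tool invocation looks like on the wire, the sketch below builds a JSON-RPC 2.0 `tools/call` request, which is the method MCP defines for calling a tool. The tool name `transcribe_audio` and its arguments are hypothetical placeholders; the real names would come from the server's `tools/list` response.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments},
        }
    )


# Hypothetical tool name and arguments, for illustration only.
message = build_tool_call(
    1,
    "transcribe_audio",
    {"url": "https://example.com/episode.mp3", "model": "turbo"},
)
```

In practice an MCP client library handles this framing, so the snippet is mainly useful for reading logs or debugging a transport.
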
## Quick Start

### Local Setup

1. **Clone Repository**
   ```bash
   git clone https://huggingface.co/spaces/Agents-MCP-Hackathon/ModalTranscriberMCP
   cd ModalTranscriberMCP
   ```

2. **Install Dependencies**
   ```bash
   pip install -r requirements.txt
   ```

3. **Configure Hugging Face Token** (Optional, for speaker identification)
   ```bash
   # Create .env file
   echo "HF_TOKEN=your_huggingface_token_here" > .env
   ```

4. **Start Application**
   ```bash
   python app.py
   ```

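For reference, the token written in step 3 could be read back along these lines. This is a standard-library sketch under the assumption that the `.env` file holds plain `KEY=value` lines; the helper name `load_hf_token` is hypothetical, and the application may well use a dedicated dotenv library instead.

```python
import os
from pathlib import Path
from typing import Optional


def load_hf_token(env_path: str = ".env") -> Optional[str]:
    """Return HF_TOKEN from the environment, falling back to a .env file."""
    token = os.environ.get("HF_TOKEN")
    if token:
        return token

    path = Path(env_path)
    if not path.is_file():
        return None

    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "HF_TOKEN":
            return value.strip().strip("'\"") or None
    return None
```

Returning `None` when no token is found lets the caller skip speaker identification instead of failing, which matches the step being optional.
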
### Usage Instructions

1. **Upload an audio file** or **enter a podcast URL**
2. **Select transcription options**:
   - Model size: turbo (recommended) / large-v3
   - Output format: SRT / TXT
   - Enable speaker identification
3. **Start transcription**. The system processes the audio and generates the results automatically.

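To make the SRT output option concrete, here is a small sketch that renders transcript segments as SubRip text. The segment layout `(start, end, speaker, text)` and the `speaker: text` prefix are assumptions for illustration; only the `HH:MM:SS,mmm --> HH:MM:SS,mmm` timestamp line follows the SRT format itself.

```python
from typing import Iterable, Optional, Tuple

Segment = Tuple[float, float, Optional[str], str]  # start, end, speaker, text


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, (start, end, speaker, text) in enumerate(segments, 1):
        line = f"{speaker}: {text}" if speaker else text
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{line}\n"
        )
    return "\n".join(blocks)
```

A TXT export would simply join the `line` values without the index and timestamp rows.
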
## 🛠️ Technical Architecture

- **Frontend**: Gradio 4.44.0
- **Backend**: FastAPI + FastMCP
- **Transcription Engine**: OpenAI Whisper
- **Speaker Identification**: pyannote.audio
- **Cloud Computing**: Modal.com
- **Audio Processing**: FFmpeg

## Performance Metrics

- **Processing Speed**: Up to 30x real-time transcription speed
- **Concurrency**: Up to 10 chunks processed simultaneously
- **Accuracy**: Over 95% on Chinese audio
- **Supported Formats**: MP3, WAV, M4A, FLAC, etc.

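The concurrency and speed figures above can be related with a short sketch. It uses a local thread pool as a stand-in for the cloud workers, so it shows the fan-out pattern only and not Modal's own API; `process_chunks`, `estimated_seconds`, and the `worker` callable are hypothetical names, and the README does not say whether 30x is a per-chunk or an end-to-end figure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_chunks(
    chunks: Sequence[T],
    worker: Callable[[T], R],
    max_workers: int = 10,  # mirrors the stated limit of 10 concurrent chunks
) -> List[R]:
    """Run worker over chunks concurrently, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order even when chunks finish out of order.
        return list(pool.map(worker, chunks))


def estimated_seconds(audio_seconds: float, speedup: float = 30.0) -> float:
    """Wall-clock estimate for a given real-time speedup factor."""
    return audio_seconds / speedup
```

At 30x real time, a 60-minute episode (3,600 s) would take about 120 s, assuming the figure holds end to end.
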
## Contributing

Issues and Pull Requests are welcome!

## License

MIT License

## Related Links

- **Project Documentation**: See the `docs/` directory in the repository
- **Test Coverage**: 29 test cases ensuring functional stability
- **Modal Deployment**: Supports high-performance cloud processing

---
*Last updated: 2025-06-11*