---
title: SenseVoice Audio Transcription
emoji: 🎙️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.0.2
app_file: app.py
pinned: false
---

# Multilingual Audio Transcription with SenseVoice

This application transcribes audio using the SenseVoice Small model, with multilingual support for Chinese, English, Japanese, Korean, and Cantonese.

## Features

- **Multilingual Support**: Chinese (zh), English (en), Japanese (ja), Korean (ko), Cantonese (yue)
- **Multiple Audio Sources**:
  - Uploaded audio files
  - Direct URLs to audio files (no YouTube support due to cookie requirements)
- **Model Options**:
  - Local SenseVoice model
  - Hugging Face model: `FunAudioLLM/SenseVoiceSmall`
- **Advanced Features**:
  - Audio trimming with start/end time
  - Proxy support for downloads
  - Verbose logging output
  - Automatic inverse text normalization (ITN)

## Model Setup

### For Hugging Face Spaces Deployment

The app is configured to work with:

1. **Local Model**: `"SenseVoiceSmall"` - Model files in the same directory
2. **HF Model**: `"FunAudioLLM/SenseVoiceSmall"` - Auto-downloaded from Hugging Face

### For Local Development

- Update `MODEL_PATH_LIST` in `app.py` to use your custom models
- Both local paths and Hugging Face repository names are supported

## How to Use

1. **Upload Audio**: Click "Upload or Record Audio" to select your audio file
2. **Select Model**: Choose from the available models in the dropdown
3. **Configure Options**:
   - Set start/end time for audio trimming
   - Enable verbose output for debugging
4. **Transcribe**: Click "Transcribe" to start the process

## Git LFS Setup for Large Models

Since this project uses large model files, Git LFS is recommended:

```bash
# Initialize Git LFS
git lfs install

# Track large model files
git lfs track "*.bin"
git lfs track "*.safetensors"

# Add and commit
git add .gitattributes
git add .
git commit -m "Add SenseVoice model with LFS tracking"
```

## Deployment Notes

### Hugging Face Spaces

- Use `git push huggingface main` to deploy
- Models are automatically cached at runtime
- The first load may be slower because the model has to be downloaded

### Model Repository Structure

```
your-repo/
├── app.py
├── README.md
├── requirements.txt
└── SenseVoiceSmall/          # Model directory
    ├── config.json
    ├── model.bin
    └── other model files...
```

## Output

The application provides:

- **Transcription Text**: Full processed transcription with ITN
- **Metrics**: Processing time and file information
- **Download**: Text file with transcription results

## Supported Languages

- 🇨🇳 Chinese (Mandarin)
- 🇺🇸 English
- 🇯🇵 Japanese
- 🇰🇷 Korean
- 🇭🇰 Cantonese

## Feedback and Contributions

Feedback and contributions to improve this multilingual transcription tool are welcome.
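
## Example: Audio Trimming Sketch

The start/end trimming option listed under Features is not documented in detail here, and the code in `app.py` may work differently. As an illustration only, the following minimal sketch shows how such a window could be cut from an uncompressed PCM WAV file using Python's standard `wave` module. The function name `trim_wav` and its parameters are hypothetical and are not part of this app.

```python
import wave


def trim_wav(src_path, dst_path, start_s=0.0, end_s=None):
    """Copy the [start_s, end_s) window of a PCM WAV file to dst_path.

    Hypothetical helper for illustration; returns the trimmed duration
    in seconds. Times outside the file are clamped to its bounds.
    """
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        # Convert seconds to frame indices and clamp them to the file length.
        start = min(max(int(start_s * rate), 0), total)
        if end_s is None:
            end = total
        else:
            end = min(max(int(end_s * rate), start), total)
        src.setpos(start)
        frames = src.readframes(end - start)
        params = src.getparams()

    with wave.open(dst_path, "wb") as dst:
        # Reuse channel count, sample width, and rate from the source;
        # the frame count in the header is corrected when the file closes.
        dst.setparams(params)
        dst.writeframes(frames)

    return (end - start) / rate
```

For example, `trim_wav("input.wav", "clip.wav", 0.5, 1.5)` would write a one-second clip, assuming `input.wav` is at least 1.5 seconds long. Compressed formats such as MP3 would need a different decoder, which this sketch does not cover.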