---
title: SenseVoice Audio Transcription
emoji: 🎙️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.0.2
app_file: app.py
pinned: false
---

# Multilingual Audio Transcription with SenseVoice

This application transcribes audio with the SenseVoice Small model. It supports Chinese, English, Japanese, Korean, and Cantonese.

## Features

- **Multilingual Support**: Chinese (zh), English (en), Japanese (ja), Korean (ko), Cantonese (yue)
- **Multiple Audio Sources**:
  - Uploaded audio files
  - Direct URLs to audio files (no YouTube support due to cookie requirements)
- **Model Options**:
  - Local SenseVoice model
  - Hugging Face model: `FunAudioLLM/SenseVoiceSmall`
- **Advanced Features**:
  - Audio trimming with start/end time
  - Proxy support for downloads
  - Verbose logging output
  - Automatic inverse text normalization (ITN)
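As an illustration of the "direct URL" and "proxy support" features, here is a minimal sketch of downloading an audio file through an optional proxy using only the Python standard library. The function names `make_opener` and `download_audio` are hypothetical and are not taken from `app.py`; the app may implement downloads differently.

```python
import shutil
import urllib.request
from typing import Optional


def make_opener(proxy: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Build a URL opener; route http/https through `proxy` when one is given."""
    if proxy:
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return urllib.request.build_opener(handler)
    # Without an explicit proxy, the default opener is used.
    return urllib.request.build_opener()


def download_audio(url: str, dest_path: str,
                   proxy: Optional[str] = None, timeout: float = 30.0) -> str:
    """Hypothetical helper: stream the file at `url` to `dest_path`."""
    opener = make_opener(proxy)
    with opener.open(url, timeout=timeout) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks, not all at once
    return dest_path
```

A call such as `download_audio(url, "input.wav", proxy="http://127.0.0.1:8080")` would fetch the file through that proxy; the proxy address here is only a placeholder.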

## Model Setup

### For Hugging Face Spaces Deployment
The app is configured to work with:
1. **Local Model**: `"SenseVoiceSmall"` - model files stored in a `SenseVoiceSmall/` directory next to `app.py`
2. **HF Model**: `"FunAudioLLM/SenseVoiceSmall"` - downloaded automatically from Hugging Face

### For Local Development
- Update `MODEL_PATH_LIST` in `app.py` to point to your own models
- Entries can be local paths or Hugging Face repository names
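The exact shape of `MODEL_PATH_LIST` in `app.py` is not shown here. The sketch below assumes it is a plain list of strings and adds a hypothetical helper, `describe_model_path`, that tells a local directory apart from a Hugging Face repository name.

```python
import os

# Assumed shape: a list of strings. These are the two entries named in this README;
# replace or extend them with your own models.
MODEL_PATH_LIST = [
    "SenseVoiceSmall",              # local directory next to app.py
    "FunAudioLLM/SenseVoiceSmall",  # Hugging Face repository name
]


def describe_model_path(path: str, base_dir: str = ".") -> str:
    """Hypothetical helper: 'local' if `path` is an existing directory, else 'hub'."""
    return "local" if os.path.isdir(os.path.join(base_dir, path)) else "hub"
```

An entry that does not exist on disk is treated as a repository name, so a typo in a local path would show up as a failed download rather than a missing-directory error.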

## How to Use

1. **Upload Audio**: Click "Upload or Record Audio" to select your audio file
2. **Select Model**: Choose from available models in the dropdown
3. **Configure Options**:
   - Set start/end time for audio trimming
   - Enable verbose output for debugging
4. **Transcribe**: Click "Transcribe" to start the process
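
The start/end trimming in step 3 amounts to converting times in seconds to sample indices. The following is a minimal sketch of that idea, not the app's actual code; `trim_samples` is a hypothetical name, and the audio is assumed to be already decoded into a sequence of samples at a known sample rate.

```python
from typing import Optional, Sequence


def trim_samples(samples: Sequence[float], sample_rate: int,
                 start_s: float = 0.0,
                 end_s: Optional[float] = None) -> Sequence[float]:
    """Return the part of `samples` between start_s and end_s (in seconds)."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    total = len(samples)
    # Convert seconds to sample indices and clamp them to the clip length.
    start = max(0, int(round(start_s * sample_rate)))
    end = total if end_s is None else min(total, int(round(end_s * sample_rate)))
    if start >= end:
        raise ValueError("start time must be before end time and inside the clip")
    return samples[start:end]
```

For a 16 kHz clip, `trim_samples(samples, 16000, 1.0, 2.0)` keeps the second between the 1 s and 2 s marks; leaving `end_s` as `None` keeps everything from the start time onward.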

## Git LFS Setup for Large Models

Because this project includes large model files, tracking them with Git LFS is recommended:

```bash
# Initialize Git LFS
git lfs install

# Track large model files
git lfs track "*.bin"
git lfs track "*.safetensors"

# Add and commit
git add .gitattributes
git add .
git commit -m "Add SenseVoice model with LFS tracking"
```

## Deployment Notes

### Hugging Face Spaces
- Deploy with `git push huggingface main` (this assumes a Git remote named `huggingface` that points to your Space)
- Models are cached automatically at runtime
- The first load may be slower because the model has to be downloaded

### Model Repository Structure
```
your-repo/
├── app.py
├── README.md
├── requirements.txt
└── SenseVoiceSmall/  # Model directory
    ├── config.json
    ├── model.bin
    └── other model files...
```

## Output

The application provides:
- **Transcription Text**: Full processed transcription with ITN
- **Metrics**: Processing time and file information
- **Download**: Text file with transcription results
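
To make the metrics and download outputs concrete, here is a rough sketch of how they could be produced. The helper names and the metric keys are assumptions for illustration; the fields that `app.py` actually reports may differ.

```python
import os
from typing import Dict


def build_metrics(audio_path: str, started_at: float,
                  finished_at: float) -> Dict[str, object]:
    """Hypothetical helper: collect processing time and basic file information."""
    return {
        "file_name": os.path.basename(audio_path),
        "file_size_bytes": os.path.getsize(audio_path),
        "processing_seconds": round(finished_at - started_at, 3),
    }


def save_transcription(text: str, out_path: str) -> str:
    """Write the transcription as UTF-8 so Chinese, Japanese, and Korean text survives."""
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(text)
    return out_path
```

The timestamps can come from `time.perf_counter()` taken immediately before and after the transcription call.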

## Supported Languages

- 🇨🇳 Chinese (Mandarin)
- 🇺🇸 English
- 🇯🇵 Japanese
- 🇰🇷 Korean
- 🇭🇰 Cantonese

## Feedback and Contributions

Feedback and contributions to improve this multilingual transcription tool are welcome.