Instructions to use chunping-m/transcriber with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use chunping-m/transcriber with Transformers:
# Load model directly from transformers import AutoProcessor, AutoModelForMultimodalLM processor = AutoProcessor.from_pretrained("chunping-m/transcriber") model = AutoModelForMultimodalLM.from_pretrained("chunping-m/transcriber") - Notebooks
- Google Colab
- Kaggle
| license: mit | |
| pipeline_tag: audio-text-to-text | |
| library_name: transformers | |
| tags: | |
| - music | |
| - audio | |
| <a href="https://arxiv.org/abs/2602.00744">Tech Report</a> | |
| # ACE-Step Transcriber | |
| ## Description | |
| ACE-Step Transcriber is the annotation model used by **ACE-Step v1.5** for training data labeling. It is a powerful multilingual audio transcription model capable of transcribing both **speech** and **singing voice** with high accuracy. | |
| ### Key Features | |
| - ๐ **50+ Languages Support** - Covers major world languages and regional dialects | |
| - ๐ค **Speech Transcription** - Accurately transcribes spoken content | |
| - ๐ต **Singing Voice Transcription** - Specialized in lyrics transcription with musical structure annotations | |
| - ๐ท๏ธ **Structure Annotation** - Automatically identifies song sections (verse, chorus, bridge, etc.) | |
| ## Usage | |
| The usage is the same as [Qwen2.5 Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B). | |
| ### Prompt Format | |
| Use the following prompt to transcribe audio: | |
| ``` | |
| *Task* Transcribe this audio in detail | |
| <audio> | |
| ``` | |
| ### Output Format | |
| The model outputs structured content in the following format: | |
| ``` | |
| # Languages | |
| <language_code> | |
| # Lyrics | |
| [Section Tag - Optional Instrument] | |
| <transcribed content> | |
| ... | |
| ``` | |
| ### Example Output | |
| ``` | |
| # Languages | |
| en | |
| # Lyrics | |
| [Intro - Acoustic Guitar] | |
| [Verse 1] | |
| Walking down the empty street tonight | |
| Stars are shining oh so bright | |
| ... | |
| [Chorus] | |
| This is where we belong | |
| Singing our favorite song | |
| ... | |
| ``` | |
| ### Supported Section Tags | |
| - `[Intro]`, `[Outro]` | |
| - `[Verse 1]`, `[Verse 2]`, etc. | |
| - `[Chorus]`, `[Pre-Chorus]`, `[Post-Chorus]` | |
| - `[Bridge]` | |
| - `[Guitar Interlude]`, `[Instrumental]` | |
| - `[Spoken]` | |
| ### Supported Languages (50+) | |
| The model supports transcription in over 50 languages, including but not limited to: | |
| | Region | Languages | | |
| |--------|-----------| | |
| | **East Asia** | Chinese (zh), Japanese (ja), Korean (ko) | | |
| | **Southeast Asia** | Vietnamese (vi), Thai (th), Indonesian (id), Malay (ms), Filipino (tl) | | |
| | **South Asia** | Hindi (hi), Bengali (bn), Tamil (ta), Urdu (ur) | | |
| | **Europe** | English (en), German (de), French (fr), Spanish (es), Italian (it), Portuguese (pt), Russian (ru), Polish (pl), Dutch (nl), Greek (el), Turkish (tr) | | |
| | **Middle East** | Arabic (ar), Hebrew (he), Persian (fa) | | |
| | **Others** | And many more regional languages... | | |
| ## Use Cases | |
| - **Music Production** - Transcribe reference tracks for lyrics extraction | |
| - **Dataset Creation** - Generate high-quality labeled data for music AI models | |
| - **Accessibility** - Create subtitles and captions for audio content | |
| - **Music Analysis** - Extract structural information from songs |