AEmotionStudio commited on
Commit
b51f034
·
verified ·
1 Parent(s): 728c4a0

Mirror README.md from ACE-Step/acestep-transcriber

Browse files
checkpoints/acestep-transcriber/README.md ADDED
@@ -0,0 +1,100 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: mit
3
+ pipeline_tag: audio-text-to-text
4
+ library_name: transformers
5
+ tags:
6
+ - music
7
+ - audio
8
+ ---
9
+
10
+ <a href="https://arxiv.org/abs/2602.00744">Tech Report</a>
11
+
12
+ # ACE-Step Transcriber
13
+
14
+ ## Description
15
+
16
+ ACE-Step Transcriber is the annotation model used by **ACE-Step v1.5** for training data labeling. It is a powerful multilingual audio transcription model capable of transcribing both **speech** and **singing voice** with high accuracy.
17
+
18
+ ### Key Features
19
+
20
+ - 🌍 **50+ Languages Support** - Covers major world languages and regional dialects
21
+ - 🎤 **Speech Transcription** - Accurately transcribes spoken content
22
+ - 🎵 **Singing Voice Transcription** - Specialized in lyrics transcription with musical structure annotations
23
+ - 🏷️ **Structure Annotation** - Automatically identifies song sections (verse, chorus, bridge, etc.)
24
+
25
+ ## Usage
26
+
27
+ The usage is the same as [Qwen2.5 Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B).
28
+
29
+ ### Prompt Format
30
+
31
+ Use the following prompt to transcribe audio:
32
+
33
+ ```
34
+ *Task* Transcribe this audio in detail
35
+ <audio>
36
+ ```
37
+
38
+ ### Output Format
39
+
40
+ The model outputs structured content in the following format:
41
+
42
+ ```
43
+ # Languages
44
+ <language_code>
45
+
46
+ # Lyrics
47
+ [Section Tag - Optional Instrument]
48
+
49
+ <transcribed content>
50
+ ...
51
+ ```
52
+
53
+ ### Example Output
54
+
55
+ ```
56
+ # Languages
57
+ en
58
+
59
+ # Lyrics
60
+ [Intro - Acoustic Guitar]
61
+
62
+ [Verse 1]
63
+ Walking down the empty street tonight
64
+ Stars are shining oh so bright
65
+ ...
66
+
67
+ [Chorus]
68
+ This is where we belong
69
+ Singing our favorite song
70
+ ...
71
+ ```
72
+
73
+ ### Supported Section Tags
74
+
75
+ - `[Intro]`, `[Outro]`
76
+ - `[Verse 1]`, `[Verse 2]`, etc.
77
+ - `[Chorus]`, `[Pre-Chorus]`, `[Post-Chorus]`
78
+ - `[Bridge]`
79
+ - `[Guitar Interlude]`, `[Instrumental]`
80
+ - `[Spoken]`
81
+
82
+ ### Supported Languages (50+)
83
+
84
+ The model supports transcription in over 50 languages, including but not limited to:
85
+
86
+ | Region | Languages |
87
+ |--------|-----------|
88
+ | **East Asia** | Chinese (zh), Japanese (ja), Korean (ko) |
89
+ | **Southeast Asia** | Vietnamese (vi), Thai (th), Indonesian (id), Malay (ms), Filipino (tl) |
90
+ | **South Asia** | Hindi (hi), Bengali (bn), Tamil (ta), Urdu (ur) |
91
+ | **Europe** | English (en), German (de), French (fr), Spanish (es), Italian (it), Portuguese (pt), Russian (ru), Polish (pl), Dutch (nl), Greek (el), Turkish (tr) |
92
+ | **Middle East** | Arabic (ar), Hebrew (he), Persian (fa) |
93
+ | **Others** | And many more regional languages... |
94
+
95
+ ## Use Cases
96
+
97
+ - **Music Production** - Transcribe reference tracks for lyrics extraction
98
+ - **Dataset Creation** - Generate high-quality labeled data for music AI models
99
+ - **Accessibility** - Create subtitles and captions for audio content
100
+ - **Music Analysis** - Extract structural information from songs