AEmotionStudio commited on
Commit
274ea26
·
verified ·
1 Parent(s): 9dfe4be

Mirror README.md from ACE-Step/acestep-captioner

Browse files
checkpoints/acestep-captioner/README.md ADDED
@@ -0,0 +1,106 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: mit
3
+ library_name: transformers
4
+ tags:
5
+ - music
6
+ - audio
7
+ ---
8
+
9
+ <a href="https://arxiv.org/abs/2602.00744">Tech Report</a>
10
+
11
+ # ACE-Step Captioner
12
+
13
+ ## Description
14
+
15
+ ACE-Step Captioner is the annotation model used by **ACE-Step v1.5** for training data labeling. It is a professional-grade music captioning model that generates detailed, structured descriptions of audio content.
16
+
17
+ ### Performance
18
+
19
+ 🏆 **Accuracy surpasses Gemini Pro 2.5** in music description tasks
20
+
21
+ ### Key Features
22
+
23
+ - 🎼 **Musical Style Analysis** - Identifies genres, sub-genres, and stylistic influences
24
+ - 🎸 **Instrument Recognition** - Detects and describes 1000+ instrument types and combinations
25
+ - 🎭 **Structure & Progression** - Analyzes musical arrangement including intro, verse, chorus, bridge, climax, and outro
26
+ - 🔊 **Timbre Description** - Captures tonal qualities, textures, and sonic characteristics
27
+ - 📝 **Rich Vocabulary** - Supports 1000+ descriptive terms for comprehensive music annotation
28
+
29
+ ## Usage
30
+
31
+ The usage is the same as [Qwen2.5 Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B).
32
+
33
+ ### Prompt Format
34
+
35
+ Use the following prompt to caption audio:
36
+
37
+ ```
38
+ *Task* Describe this audio in detail
39
+ <audio>
40
+ ```
41
+
42
+ ### Output Format
43
+
44
+ The model generates natural language descriptions covering multiple aspects of the music.
45
+
46
+ ### Example Output
47
+
48
+ ```
49
+ A melancholic indie folk track featuring fingerpicked acoustic guitar
50
+ as the primary instrument. The song opens with a sparse, contemplative
51
+ intro before the vocals enter with a breathy, intimate delivery.
52
+ The arrangement gradually builds through the verse, adding subtle
53
+ string pads and a gentle kick drum. The chorus lifts with layered
54
+ harmonies and a warmer, fuller texture. The bridge introduces a
55
+ key change and emotional climax before returning to the stripped-down
56
+ acoustic arrangement for the outro.
57
+ ```
58
+
59
+ ## Descriptive Capabilities
60
+
61
+ ### Musical Styles (Examples)
62
+
63
+ | Category | Styles |
64
+ |----------|--------|
65
+ | **Electronic** | Ambient, Techno, House, Drum & Bass, Synthwave, IDM, Downtempo |
66
+ | **Rock** | Alternative, Indie, Post-Rock, Progressive, Psychedelic, Grunge |
67
+ | **Pop** | Synth-pop, Electropop, Dream Pop, Art Pop, Indie Pop |
68
+ | **Classical** | Orchestral, Chamber, Minimalist, Neo-Classical, Cinematic |
69
+ | **World** | Latin, African, Middle Eastern, Asian Traditional, Celtic |
70
+ | **Jazz** | Fusion, Smooth, Bebop, Modal, Free Jazz |
71
+ | **Hip-Hop** | Trap, Boom Bap, Lo-fi, Instrumental, Cloud Rap |
72
+
73
+ ### Instruments (1000+ Supported)
74
+
75
+ | Category | Examples |
76
+ |----------|----------|
77
+ | **Strings** | Acoustic Guitar, Electric Guitar, Violin, Cello, Bass, Harp, Mandolin |
78
+ | **Keys** | Piano, Synthesizer, Organ, Rhodes, Wurlitzer, Mellotron |
79
+ | **Percussion** | Drums, Electronic Drums, Congas, Bongos, Timpani, Vibraphone |
80
+ | **Wind** | Saxophone, Trumpet, Flute, Clarinet, Oboe, French Horn |
81
+ | **Electronic** | Synth Bass, Pad, Lead, Arpeggiator, Sampler, 808, 303 |
82
+
83
+ ### Structure Analysis
84
+
85
+ - **Intro / Outro** - Opening and closing sections
86
+ - **Verse / Pre-Chorus / Chorus** - Main song structure
87
+ - **Bridge / Break** - Transitional sections
88
+ - **Build-up / Drop / Climax** - Dynamic progression
89
+ - **Interlude / Solo** - Instrumental passages
90
+
91
+ ### Timbre Descriptions
92
+
93
+ | Dimension | Descriptors |
94
+ |-----------|-------------|
95
+ | **Texture** | Warm, Bright, Dark, Crisp, Muddy, Clean, Distorted, Saturated |
96
+ | **Space** | Reverberant, Dry, Spacious, Intimate, Cavernous, Tight |
97
+ | **Dynamics** | Punchy, Soft, Aggressive, Gentle, Compressed, Dynamic |
98
+ | **Character** | Ethereal, Gritty, Smooth, Raw, Polished, Organic, Synthetic |
99
+
100
+ ## Use Cases
101
+
102
+ - **Music AI Training** - Generate high-quality captions for music generation models
103
+ - **Music Information Retrieval** - Create searchable metadata for audio databases
104
+ - **Content Moderation** - Analyze and categorize music content
105
+ - **Music Education** - Provide detailed analysis for learning purposes
106
+ - **Audio Production** - Document and describe sound design elements