Instructions to use AEmotionStudio/acestep-models with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Diffusers
How to use AEmotionStudio/acestep-models with Diffusers:
pip install -U diffusers transformers accelerate
import torch from diffusers import DiffusionPipeline # switch to "mps" for apple devices pipe = DiffusionPipeline.from_pretrained("AEmotionStudio/acestep-models", dtype=torch.bfloat16, device_map="cuda") prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" image = pipe(prompt).images[0] - Notebooks
- Google Colab
- Kaggle
| license: mit | |
| library_name: transformers | |
| tags: | |
| - music | |
| - audio | |
| <a href="https://arxiv.org/abs/2602.00744">Tech Report</a> | |
| # ACE-Step Captioner | |
| ## Description | |
| ACE-Step Captioner is the annotation model used by **ACE-Step v1.5** for training data labeling. It is a professional-grade music captioning model that generates detailed, structured descriptions of audio content. | |
| ### Performance | |
| 🏆 **Accuracy surpasses Gemini Pro 2.5** in music description tasks | |
| ### Key Features | |
| - 🎼 **Musical Style Analysis** - Identifies genres, sub-genres, and stylistic influences | |
| - 🎸 **Instrument Recognition** - Detects and describes 1000+ instrument types and combinations | |
| - 🎭 **Structure & Progression** - Analyzes musical arrangement including intro, verse, chorus, bridge, climax, and outro | |
| - 🔊 **Timbre Description** - Captures tonal qualities, textures, and sonic characteristics | |
| - 📝 **Rich Vocabulary** - Supports 1000+ descriptive terms for comprehensive music annotation | |
| ## Usage | |
| The usage is the same as [Qwen2.5 Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B). | |
| ### Prompt Format | |
| Use the following prompt to caption audio: | |
| ``` | |
| *Task* Describe this audio in detail | |
| <audio> | |
| ``` | |
| ### Output Format | |
| The model generates natural language descriptions covering multiple aspects of the music. | |
| ### Example Output | |
| ``` | |
| A melancholic indie folk track featuring fingerpicked acoustic guitar | |
| as the primary instrument. The song opens with a sparse, contemplative | |
| intro before the vocals enter with a breathy, intimate delivery. | |
| The arrangement gradually builds through the verse, adding subtle | |
| string pads and a gentle kick drum. The chorus lifts with layered | |
| harmonies and a warmer, fuller texture. The bridge introduces a | |
| key change and emotional climax before returning to the stripped-down | |
| acoustic arrangement for the outro. | |
| ``` | |
| ## Descriptive Capabilities | |
| ### Musical Styles (Examples) | |
| | Category | Styles | | |
| |----------|--------| | |
| | **Electronic** | Ambient, Techno, House, Drum & Bass, Synthwave, IDM, Downtempo | | |
| | **Rock** | Alternative, Indie, Post-Rock, Progressive, Psychedelic, Grunge | | |
| | **Pop** | Synth-pop, Electropop, Dream Pop, Art Pop, Indie Pop | | |
| | **Classical** | Orchestral, Chamber, Minimalist, Neo-Classical, Cinematic | | |
| | **World** | Latin, African, Middle Eastern, Asian Traditional, Celtic | | |
| | **Jazz** | Fusion, Smooth, Bebop, Modal, Free Jazz | | |
| | **Hip-Hop** | Trap, Boom Bap, Lo-fi, Instrumental, Cloud Rap | | |
| ### Instruments (1000+ Supported) | |
| | Category | Examples | | |
| |----------|----------| | |
| | **Strings** | Acoustic Guitar, Electric Guitar, Violin, Cello, Bass, Harp, Mandolin | | |
| | **Keys** | Piano, Synthesizer, Organ, Rhodes, Wurlitzer, Mellotron | | |
| | **Percussion** | Drums, Electronic Drums, Congas, Bongos, Timpani, Vibraphone | | |
| | **Wind** | Saxophone, Trumpet, Flute, Clarinet, Oboe, French Horn | | |
| | **Electronic** | Synth Bass, Pad, Lead, Arpeggiator, Sampler, 808, 303 | | |
| ### Structure Analysis | |
| - **Intro / Outro** - Opening and closing sections | |
| - **Verse / Pre-Chorus / Chorus** - Main song structure | |
| - **Bridge / Break** - Transitional sections | |
| - **Build-up / Drop / Climax** - Dynamic progression | |
| - **Interlude / Solo** - Instrumental passages | |
| ### Timbre Descriptions | |
| | Dimension | Descriptors | | |
| |-----------|-------------| | |
| | **Texture** | Warm, Bright, Dark, Crisp, Muddy, Clean, Distorted, Saturated | | |
| | **Space** | Reverberant, Dry, Spacious, Intimate, Cavernous, Tight | | |
| | **Dynamics** | Punchy, Soft, Aggressive, Gentle, Compressed, Dynamic | | |
| | **Character** | Ethereal, Gritty, Smooth, Raw, Polished, Organic, Synthetic | | |
| ## Use Cases | |
| - **Music AI Training** - Generate high-quality captions for music generation models | |
| - **Music Information Retrieval** - Create searchable metadata for audio databases | |
| - **Content Moderation** - Analyze and categorize music content | |
| - **Music Education** - Provide detailed analysis for learning purposes | |
| - **Audio Production** - Document and describe sound design elements |