---
language:
- en
license: mit
pipeline_tag: text-to-audio
tags:
- ACE-Step
- LoRA
- DPO
- music-generation
- audio-generation
- text-to-audio
- text2audio
- PEFT
- acestep-v15-turbo
- acestep-5Hz-lm-4B
base_model:
- ACE-Step/Ace-Step1.5
library_name: peft
widget:
- text: "Showcase reel"
  output:
    url: showcase-training-chapter-v3.mp4
---

# AceStep_Refine_Redmond

I'm grateful for the GPU time from Redmond.AI that allowed me to make this model!

## Overview

AceStep_Refine_Redmond is a DPO-refined LoRA adapter for ACE-Step 1.5 Turbo, focused on improving musicality, arrangement coherence, and vocal character in practical generation workflows.

This release includes:

- `standard/` (PEFT adapter for regular ACE-Step loading)
- `comfyui/` (single-file ComfyUI-compatible LoRA export)

## Compatibility

- DiT used: `acestep-v15-turbo`
- Recommended LM for prompting/composition: `acestep-5Hz-lm-4B`
- `standard/` works in regular ACE-Step workflows.
- `comfyui/` is the converted single-file LoRA for ComfyUI.

## What Changed vs Base

In blind A/B testing against the base reference, this refinement achieved about **70% win rate**. The blind test votes were collected from different users.

Training summary (final DPO refinement stage):

- Base checkpoint: `acestep-v15-turbo`
- Adapter type: LoRA
- Rank / Alpha: `96 / 192`
- Learning rate: `8e-5`
- Training path: large-dataset LoRA fine-tune for `75` epochs, then DPO refinement on top of that adapter
- Epoch config: up to `81` in the DPO stage (resumed from the previous epoch-75 adapter)

## Known Limitations

- Behavior can still vary by prompt style; some sparse prompts may produce less stable vocal timbre.
- Very dense arrangements can introduce texture noise or high-frequency harshness in some generations.
- This adapter is tuned on a specific preference dataset and may not generalize equally across all genres.

## Responsible Use

- Do not use this model to imitate or impersonate real artists without permission.
- Respect copyright, voice rights, and local regulations when generating and publishing audio.
- Review outputs before public release, especially in commercial workflows.
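
## Note on LoRA Scaling

The `Rank / Alpha` values in the training summary determine how strongly the adapter's low-rank update is weighted at full strength. The sketch below works through that arithmetic. It assumes the conventional LoRA scaling of `alpha / rank`; this card does not state whether a rank-stabilized variant was used, so treat the result as illustrative. The helper name `lora_scaling` is hypothetical and is not part of ACE-Step or PEFT.

```python
def lora_scaling(rank: int, alpha: float) -> float:
    """Conventional LoRA scaling factor, alpha / rank.

    Assumption: standard LoRA scaling, not the rank-stabilized
    variant (alpha / sqrt(rank)).
    """
    if rank <= 0:
        raise ValueError("rank must be a positive integer")
    return alpha / rank


# Values taken from the training summary in this card.
RANK = 96
ALPHA = 192

scale = lora_scaling(RANK, ALPHA)
print(f"LoRA update scale: {scale}")  # prints "LoRA update scale: 2.0"
```

Under that assumption, any LoRA strength multiplier set in a host application is applied on top of this factor.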