metadata
language:
  - en
license: mit
pipeline_tag: text-to-audio
tags:
  - ACE-Step
  - LoRA
  - DPO
  - music-generation
  - audio-generation
  - text-to-audio
  - text2audio
  - PEFT
  - acestep-v15-turbo
  - acestep-5Hz-lm-4B
base_model:
  - ACE-Step/Ace-Step1.5
library_name: peft
widget:
  - text: Showcase reel
    output:
      url: showcase-training-chapter-v3.mp4

AceStep_Refine_Redmond

I'm grateful for the GPU time from Redmond.AI that allowed me to make this model!

Overview

AceStep_Refine_Redmond is a DPO-refined LoRA adapter for ACE-Step 1.5 Turbo, focused on improving musicality, arrangement coherence, and vocal character in practical generation workflows.

This release includes:

  • standard/ (PEFT adapter for regular ACE-Step loading)
  • comfyui/ (single-file ComfyUI-compatible LoRA export)

Compatibility

  • DiT used: acestep-v15-turbo
  • Recommended LM for prompting/composition: acestep-5Hz-lm-4B
  • standard/ works in regular ACE-Step workflows.
  • comfyui/ is the converted single-file LoRA for ComfyUI.
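Because the two exports live in separate folders, a workflow normally needs only one of them. The sketch below shows one way to fetch a single folder with `huggingface_hub.snapshot_download`. The helper names `allow_patterns_for` and `download_variant` are hypothetical, and the repository id is left as a parameter because this card does not state it.

```python
from typing import List

# Folder names come from this model card; the helper names are illustrative.
VARIANTS = ("standard", "comfyui")


def allow_patterns_for(variant: str) -> List[str]:
    """Return glob patterns that select one export folder of the release."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return [f"{variant}/*"]


def download_variant(repo_id: str, variant: str) -> str:
    """Download only the chosen folder and return the local snapshot path."""
    # Imported lazily so allow_patterns_for() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=allow_patterns_for(variant),
    )
```

Calling `download_variant(repo_id, "standard")` with this repository's id would fetch the PEFT adapter, and `"comfyui"` would fetch the single-file LoRA. How the downloaded files are then loaded depends on the ACE-Step or ComfyUI setup in use and is not covered here.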

What Changed vs Base

In blind A/B testing against the base reference, this refinement achieved a win rate of about 70%. Votes were collected from multiple users.
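How much weight a win rate of about 70% carries depends on the number of votes, which this card does not report. As a rough illustration, the sketch below computes a Wilson score interval for a binomial proportion. The vote counts in the usage lines are hypothetical and chosen only to show how the interval narrows with more votes.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win rate; z=1.96 gives roughly 95% coverage."""
    if total <= 0 or not 0 <= wins <= total:
        raise ValueError("need total > 0 and 0 <= wins <= total")
    p = wins / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical vote counts, NOT taken from the actual blind test.
for wins, total in [(70, 100), (700, 1000)]:
    low, high = wilson_interval(wins, total)
    print(f"{wins}/{total}: {low:.3f} to {high:.3f}")
```

With the same 70% point estimate, the interval for 1000 votes is much tighter than for 100, so the raw vote count matters when comparing against other reported win rates.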

Training summary (final DPO refinement stage):

  • Base checkpoint: acestep-v15-turbo
  • Adapter type: LoRA
  • Rank / Alpha: 96 / 192
  • Learning rate: 8e-5
  • Training path: large-dataset LoRA fine-tune for 75 epochs, then DPO refinement on top of that adapter
  • Epoch config: the DPO stage resumed from the epoch-75 adapter and ran up to epoch 81
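The rank and alpha values above determine how strongly the adapter's low-rank update is scaled. The sketch below works through that arithmetic, assuming the standard LoRA convention of scaling by alpha / rank (not the rank-stabilized alpha / sqrt(rank) variant) and assuming the epoch counter continued from 75 when the DPO stage resumed. The class name `LoraHyperparams` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraHyperparams:
    rank: int
    alpha: int

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update B @ A by alpha / rank.
        return self.alpha / self.rank


# Values from the training summary above.
released = LoraHyperparams(rank=96, alpha=192)
print(released.scaling)  # 2.0

# Assumption: epochs are counted continuously across the two stages.
sft_epochs = 75
dpo_final_epoch = 81
max_dpo_epochs = dpo_final_epoch - sft_epochs
print(max_dpo_epochs)  # 6
```

Under those assumptions the adapter update is applied at twice its raw magnitude, and the DPO stage contributes at most 6 epochs on top of the 75-epoch fine-tune.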

Known Limitations

  • Behavior can still vary by prompt style; some sparse prompts may produce less stable vocal timbre.
  • Very dense arrangements can introduce texture noise or high-frequency harshness in some generations.
  • This adapter is tuned on a specific preference dataset and may not generalize equally across all genres.

Responsible Use

  • Do not use this model to imitate or impersonate real artists without permission.
  • Respect copyright, voice rights, and local regulations when generating and publishing audio.
  • Review outputs before public release, especially in commercial workflows.