language:
  - en
license: mit
pipeline_tag: text-to-audio
tags:
  - ACE-Step
  - LoRA
  - DPO
  - music-generation
  - audio-generation
  - text-to-audio
  - text2audio
  - PEFT
  - acestep-v15-turbo
  - acestep-5Hz-lm-4B
base_model:
  - ACE-Step/Ace-Step1.5
library_name: peft
widget:
  - text: Showcase reel
    output:
      url: showcase-training-chapter-v3.mp4

AceStep_Refine_Redmond

I'm grateful for the GPU time from Redmond.AI that allowed me to make this model!

Overview

AceStep_Refine_Redmond is a DPO-refined LoRA adapter for ACE-Step 1.5 Turbo, focused on improving musicality, arrangement coherence, and vocal character in practical generation workflows.

This release includes:

standard/ (PEFT adapter for regular ACE-Step loading)
comfyui/ (single-file ComfyUI-compatible LoRA export)

Compatibility

DiT used: acestep-v15-turbo
Recommended LM for prompting/composition: acestep-5Hz-lm-4B
standard/ works in regular ACE-Step workflows.
comfyui/ is the converted single-file LoRA for ComfyUI.
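
As a rough sketch of how one of the two folders might be fetched on its own, the snippet below maps a target workflow to its folder and passes the matching pattern to `huggingface_hub.snapshot_download` through `allow_patterns`. The helper names and the workflow keys are illustrative assumptions, not part of ACE-Step or ComfyUI, and the repository ID must be supplied by the caller because this card does not state the hosting account.

```python
from pathlib import Path

# Folder names come from this release; the workflow keys are illustrative.
VARIANT_FOLDERS = {
    "ace-step": "standard",  # PEFT adapter for regular ACE-Step loading
    "comfyui": "comfyui",    # single-file LoRA export for ComfyUI
}


def allow_patterns_for(workflow: str) -> list[str]:
    """Return the glob patterns that select one variant's files."""
    if workflow not in VARIANT_FOLDERS:
        expected = sorted(VARIANT_FOLDERS)
        raise ValueError(f"unknown workflow {workflow!r}; expected one of {expected}")
    return [f"{VARIANT_FOLDERS[workflow]}/*"]


def download_variant(repo_id: str, workflow: str) -> Path:
    """Download only the chosen variant and return its local folder.

    repo_id is a placeholder argument: pass the "<owner>/<name>" ID of the
    repository that hosts this card.
    """
    # Imported lazily so allow_patterns_for works without huggingface_hub.
    from huggingface_hub import snapshot_download

    local_root = snapshot_download(
        repo_id=repo_id,
        allow_patterns=allow_patterns_for(workflow),
    )
    return Path(local_root) / VARIANT_FOLDERS[workflow]
```

Calling `download_variant(repo_id, "comfyui")` would then return the local path of the comfyui/ folder, whose single-file LoRA can be copied into a ComfyUI LoRA directory. How the standard/ adapter is attached inside ACE-Step is left to the ACE-Step tooling and is not shown here.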

What Changed vs Base

In blind A/B testing against the base reference, this refinement achieved a win rate of about 70%. Votes were collected from multiple users.
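
For readers who want to run a similar comparison, here is a minimal sketch of how a win rate and a rough uncertainty range can be computed from blind A/B votes. The vote tallies in the usage note are hypothetical, chosen only to land on 70%; the actual vote counts behind the figure above are not published in this card. The Wilson score interval is one common choice for this, not necessarily what was used here.

```python
import math


def win_rate(wins: int, losses: int) -> float:
    """Fraction of decisive blind votes that preferred the refined adapter."""
    total = wins + losses
    if total == 0:
        raise ValueError("need at least one vote")
    return wins / total


def wilson_interval(wins: int, losses: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the win rate (z=1.96 is roughly 95%)."""
    n = wins + losses
    if n == 0:
        raise ValueError("need at least one vote")
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width
```

With hypothetical tallies of 70 wins and 30 losses, `win_rate(70, 30)` gives 0.7 and `wilson_interval(70, 30)` spans roughly 0.60 to 0.78, which shows how much a 70% figure can move with a modest number of votes.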

Training summary (final DPO refinement stage):

Base checkpoint: acestep-v15-turbo
Adapter type: LoRA
Rank / Alpha: 96 / 192
Learning rate: 8e-5
Training path: large-dataset LoRA fine-tune for 75 epochs, then DPO refinement on top of that adapter
Epoch config: the DPO stage resumed from the epoch-75 adapter and was configured to run up to epoch 81
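
The numbers above can be related with a small sketch. It assumes the standard LoRA convention in which the low-rank update is scaled by alpha / rank (the PEFT default when rank-stabilized scaling is off), and it assumes the epoch counter in the DPO stage continued from the resumed adapter. The class and field names are illustrative, not taken from the training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefineSummary:
    """Hyperparameters listed in the training summary (names are illustrative)."""

    rank: int = 96
    alpha: int = 192
    learning_rate: float = 8e-5
    sft_epochs: int = 75      # large-dataset LoRA fine-tune
    dpo_max_epoch: int = 81   # upper bound of the DPO stage

    @property
    def lora_scaling(self) -> float:
        # Assumption: standard LoRA scaling, alpha / rank.
        return self.alpha / self.rank

    @property
    def max_dpo_epochs(self) -> int:
        # Assumption: the epoch counter is cumulative across both stages.
        return self.dpo_max_epoch - self.sft_epochs


summary = RefineSummary()
print(summary.lora_scaling)    # 2.0
print(summary.max_dpo_epochs)  # 6
```

Under those assumptions the adapter update is applied at twice its raw magnitude, and the DPO stage adds at most six epochs on top of the 75-epoch fine-tune.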

Known Limitations

Behavior can still vary by prompt style; some sparse prompts may produce less stable vocal timbre.
Very dense arrangements can introduce texture noise or high-frequency harshness in some generations.
This adapter is tuned on a specific preference dataset and may not generalize equally across all genres.

Responsible Use

Do not use this model to imitate or impersonate real artists without permission.
Respect copyright, voice rights, and local regulations when generating and publishing audio.
Review outputs before public release, especially in commercial workflows.