---
title: ACE-Step 1.5 LoRA Studio
emoji: music
colorFrom: blue
colorTo: teal
sdk: gradio
app_file: app.py
pinned: false
---

ACE-Step 1.5 LoRA Studio

  • Andrew Rapier

Train ACE-Step 1.5 LoRA adapters, deploy your own Hugging Face Space, and run production-style inference through a Dedicated Endpoint.

Create HF Space · Create HF Endpoint Repo · License: MIT

What you get

  • LoRA training UI and workflow: app.py, lora_ui.py
  • CLI LoRA trainer for local/HF datasets: lora_train.py
  • Qwen2-Audio captioning/annotation pipeline: qwen_caption_app.py, qwen_audio_captioning.py, scripts/annotations/
  • Audio Flamingo 3 + ChatGPT cleanup pipeline: af3_chatgpt_pipeline.py, scripts/pipeline/, services/pipeline_api.py
  • React orchestration UI for AF3+ChatGPT: react-ui/
  • Custom endpoint runtime: handler.py, acestep/
  • Bootstrap automation for cloning into your HF account: scripts/hf_clone.py
  • Endpoint test clients and HF job launcher: scripts/endpoint/, scripts/jobs/

Quick start (local)

python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python app.py

Open http://localhost:7860.

End-to-end setup (recommended)

Use this sequence when setting up from scratch.

  1. Install dependencies
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
  2. Create a local .env from .env.example and fill in the secrets
HF_TOKEN=hf_xxx
HF_AF3_ENDPOINT_URL=https://YOUR_AF3_ENDPOINT.endpoints.huggingface.cloud
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-5-mini
AF3_MODEL_ID=nvidia/audio-flamingo-3-hf
  3. Bootstrap your Hugging Face repos (Space + endpoint templates)
python scripts/hf_clone.py space --repo-id YOUR_USERNAME/YOUR_SPACE_NAME
python scripts/hf_clone.py af3-nvidia-endpoint --repo-id YOUR_USERNAME/YOUR_AF3_NVIDIA_ENDPOINT_REPO
  4. Deploy an endpoint from the cloned AF3 NVIDIA endpoint repo
  • Set the endpoint task to custom.
  • Confirm that a top-level handler.py exists in the endpoint repo.
  • Set endpoint env vars if needed (HF_TOKEN, AF3_NV_DEFAULT_MODE=think).
  5. Generate analysis sidecars from the audio
python scripts/pipeline/run_af3_chatgpt_pipeline.py \
  --dataset-dir ./train-dataset \
  --backend hf_endpoint \
  --endpoint-url "$HF_AF3_ENDPOINT_URL" \
  --openai-api-key "$OPENAI_API_KEY"
  6. Normalize existing JSONs into a LoRA-ready shape (optional but recommended)
python scripts/pipeline/refine_dataset_json_with_openai.py \
  --dataset-dir ./train-dataset \
  --enable-web-search

This script keeps the core fields needed by ACE-Step LoRA training and preserves the rich analysis context in source.rich_details.

  7. Train a LoRA
python app.py

Then, in the UI:

  • Load the model.
  • Scan or upload a dataset.
  • Start LoRA training.
  8. Test generation with your new adapter
  • Use the endpoint scripts in scripts/endpoint/.
  • Or test through the Gradio UI flow.
  • In Step 4 - Evaluate, you can upload your own LoRA adapter (.zip or adapter files) and load it without retraining in this Space.

AF3 GUI one-command startup

  1. Configure .env (never commit this file):
HF_TOKEN=hf_xxx
HF_AF3_ENDPOINT_URL=https://YOUR_AF3_ENDPOINT.endpoints.huggingface.cloud
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-5-mini
AF3_MODEL_ID=nvidia/audio-flamingo-3-hf
  2. Launch API + GUI together:
python af3_gui_app.py

PowerShell alternative:

.\scripts\dev\run_af3_gui.ps1

This command builds the React UI and serves it from the FastAPI backend. Open http://127.0.0.1:8008.
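Both launch paths read their configuration from .env. If you want the same variables in your own scripts without an extra dependency, a stdlib-only loader can be sketched as below; this is a simplification and the actual apps may rely on python-dotenv or their own parsing:

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Skips blank lines and comments; does not handle quoting or `export`
    prefixes -- a deliberate simplification of what python-dotenv does.
    """
    values = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    os.environ.update(values)
    return values
```

Because the values end up in os.environ, downstream code can keep using plain os.environ["HF_TOKEN"] lookups.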

Clone to your HF account

Use the two buttons near the top of this README to create the target repos in your HF account, then run the matching clone commands below.

Set your token once:

# Linux/macOS
export HF_TOKEN=hf_xxx

# Windows PowerShell
$env:HF_TOKEN="hf_xxx"

Clone your own Space:

python scripts/hf_clone.py space --repo-id YOUR_USERNAME/YOUR_SPACE_NAME

Clone your own Endpoint repo:

python scripts/hf_clone.py endpoint --repo-id YOUR_USERNAME/YOUR_ENDPOINT_REPO

Clone a Qwen2-Audio caption endpoint repo:

python scripts/hf_clone.py qwen-endpoint --repo-id YOUR_USERNAME/YOUR_QWEN_ENDPOINT_REPO

Clone an Audio Flamingo 3 caption endpoint repo:

python scripts/hf_clone.py af3-endpoint --repo-id YOUR_USERNAME/YOUR_AF3_ENDPOINT_REPO

When creating that endpoint, set task to custom so it loads the custom handler.py.

Clone an AF3 NVIDIA-stack endpoint repo (matches NVIDIA Space stack better):

python scripts/hf_clone.py af3-nvidia-endpoint --repo-id YOUR_USERNAME/YOUR_AF3_NVIDIA_ENDPOINT_REPO

Use this path when you want think/long quality behavior closer to NVIDIA's public demo.

Clone both in one run:

python scripts/hf_clone.py all \
  --space-repo-id YOUR_USERNAME/YOUR_SPACE_NAME \
  --endpoint-repo-id YOUR_USERNAME/YOUR_ENDPOINT_REPO

Project layout

.
|- app.py
|- lora_ui.py
|- lora_train.py
|- qwen_caption_app.py
|- qwen_audio_captioning.py
|- af3_chatgpt_pipeline.py
|- af3_gui_app.py
|- handler.py
|- acestep/
|- scripts/
|  |- hf_clone.py
|  |- dev/
|  |  |- run_af3_gui.py
|  |  `- run_af3_gui.ps1
|  |- annotations/
|  |  `- qwen_caption_dataset.py
|  |- pipeline/
|  |  `- run_af3_chatgpt_pipeline.py
|  |- endpoint/
|  |  |- generate_interactive.py
|  |  |- test.ps1
|  |  |- test.bat
|  |  |- test_rnb.bat
|  |  `- test_rnb_2min.bat
|  `- jobs/
|     |- submit_hf_lora_job.ps1
|     `- submit_hf_qwen_caption_job.ps1
|- services/
|  `- pipeline_api.py
|- react-ui/
|- utils/
|  `- env_config.py
|- docs/
|  |- deploy/
|  `- guides/
|- summaries/
|  `- findings.md
`- templates/hf-endpoint/

Dataset format

Supported audio:

  • .wav, .flac, .mp3, .ogg, .opus, .m4a, .aac

Optional sidecar metadata per track:

  • song_001.wav
  • song_001.json
{
  "caption": "melodic emotional rnb pop with warm pads",
  "lyrics": "[Verse]\\n...",
  "bpm": 92,
  "keyscale": "Am",
  "timesignature": "4/4",
  "vocal_language": "en",
  "duration": 120
}
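Pairing each audio file with its optional sidecar can be sketched as follows; this is an illustration of the convention above, not the actual scanner used by the training UI:

```python
import json
from pathlib import Path

# Supported audio extensions from the dataset format above.
AUDIO_EXTS = {".wav", ".flac", ".mp3", ".ogg", ".opus", ".m4a", ".aac"}


def scan_dataset(dataset_dir: str) -> list[dict]:
    """Pair each audio file with its optional .json sidecar.

    Returns one record per track: {"audio": path, "meta": sidecar dict or {}}.
    """
    records = []
    for audio in sorted(Path(dataset_dir).rglob("*")):
        if audio.suffix.lower() not in AUDIO_EXTS:
            continue
        sidecar = audio.with_suffix(".json")
        meta = json.loads(sidecar.read_text()) if sidecar.exists() else {}
        records.append({"audio": str(audio), "meta": meta})
    return records
```

Tracks without a sidecar still produce a record, matching the "optional metadata" contract: missing fields simply come back as an empty dict.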

Qwen2-Audio annotation pipeline (music captioning)

Run the dedicated annotation UI:

python qwen_caption_app.py

Batch caption from CLI:

python scripts/annotations/qwen_caption_dataset.py \
  --dataset-dir ./dataset_inbox \
  --backend local \
  --model-id Qwen/Qwen2-Audio-7B-Instruct \
  --output-dir ./qwen_annotations \
  --copy-audio

By default, this also writes .json sidecars next to the source audio, ready for direct ACE-Step LoRA training.

Then train LoRA on the exported dataset:

python lora_train.py --dataset-dir ./qwen_annotations/dataset --model-config acestep-v15-base

Audio Flamingo 3 + ChatGPT pipeline (analysis -> normalized sidecar JSON)

This stack runs:

  1. Audio Flamingo 3 for raw music analysis prose.
  2. ChatGPT for cleanup/normalization into LoRA-ready fields.
  3. Sidecar JSON export next to each audio file (or in a custom output folder).

CLI single track:

python scripts/pipeline/run_af3_chatgpt_pipeline.py \
  --audio "./train-dataset/Andrew Spacey - Wonder (Prod Beat It AT).mp3" \
  --backend hf_endpoint \
  --endpoint-url "$HF_AF3_ENDPOINT_URL" \
  --hf-token "$HF_TOKEN" \
  --openai-api-key "$OPENAI_API_KEY" \
  --artist-name "Andrew Spacey" \
  --track-name "Wonder"

CLI dataset batch:

python scripts/pipeline/run_af3_chatgpt_pipeline.py \
  --dataset-dir ./train-dataset \
  --backend hf_endpoint \
  --endpoint-url "$HF_AF3_ENDPOINT_URL" \
  --openai-api-key "$OPENAI_API_KEY"

Refine already-generated JSON files in place:

python scripts/pipeline/refine_dataset_json_with_openai.py \
  --dataset-dir ./train-dataset \
  --enable-web-search

Write refined files to a separate folder:

python scripts/pipeline/refine_dataset_json_with_openai.py \
  --dataset-dir ./train-dataset \
  --recursive \
  --enable-web-search \
  --output-dir ./train-dataset-refined
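The refinement shape, keeping the core training fields and stashing everything else under source.rich_details, can be sketched like this. The field names come from the dataset format section; the exact key set kept by the real script may differ:

```python
# Core fields used by ACE-Step LoRA training (per the dataset format section).
CORE_FIELDS = {"caption", "lyrics", "bpm", "keyscale",
               "timesignature", "vocal_language", "duration"}


def refine_record(raw: dict) -> dict:
    """Split a rich analysis dict into LoRA-ready core fields plus context.

    Non-core keys are preserved under source.rich_details instead of being
    discarded, so the rich analysis remains available later.
    """
    core = {k: v for k, v in raw.items() if k in CORE_FIELDS}
    extras = {k: v for k, v in raw.items() if k not in CORE_FIELDS}
    if extras:
        core["source"] = {"rich_details": extras}
    return core
```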

Single-command GUI (recommended):

python af3_gui_app.py

Manual API + React UI:

uvicorn services.pipeline_api:app --host 0.0.0.0 --port 8008 --reload
cd react-ui
npm install
npm run dev

Open http://localhost:5173 (manual) or http://127.0.0.1:8008 (single-command).

Endpoint testing

python scripts/endpoint/generate_interactive.py

Or run scripted tests:

  • scripts/endpoint/test.ps1
  • scripts/endpoint/test.bat

Findings and notes

Current baseline analysis and improvement ideas are tracked in:

  • summaries/findings.md

Docs

  • Space deployment: docs/deploy/SPACE.md
  • Qwen caption Space deployment: docs/deploy/QWEN_SPACE.md
  • Endpoint deployment: docs/deploy/ENDPOINT.md
  • AF3 endpoint deployment: docs/deploy/AF3_ENDPOINT.md
  • AF3 NVIDIA-stack endpoint deployment: docs/deploy/AF3_NVIDIA_ENDPOINT.md
  • Additional guides: docs/guides/qwen2-audio-train.md, docs/guides/af3-chatgpt-pipeline.md

Open-source readiness checklist

  • Secrets are env-driven (HF_TOKEN, HF_AF3_ENDPOINT_URL, OPENAI_API_KEY, .env).
  • Local artifacts are ignored via .gitignore.
  • MIT license included.
  • Reproducible clone/deploy paths documented.
  • .env is git-ignored; keep real credentials only in local .env.

GitHub publish flow

  1. Check status
git status
  2. Stage and commit
git add .
git commit -m "Consolidate AF3/Qwen pipelines, endpoint templates, and docs"
  3. Push to the GitHub remote
git push github main