---
title: ACE-Step 1.5 LoRA Studio
emoji: music
colorFrom: blue
colorTo: teal
sdk: gradio
app_file: app.py
pinned: false
---
# ACE-Step 1.5 LoRA Studio

by Andrew Rapier
Train ACE-Step 1.5 LoRA adapters, deploy your own Hugging Face Space, and run production-style inference through a Dedicated Endpoint.
## What you get

- LoRA training UI and workflow: `app.py`, `lora_ui.py`
- CLI LoRA trainer for local/HF datasets: `lora_train.py`
- Qwen2-Audio captioning/annotation pipeline: `qwen_caption_app.py`, `qwen_audio_captioning.py`, `scripts/annotations/`
- Audio Flamingo 3 + ChatGPT cleanup pipeline: `af3_chatgpt_pipeline.py`, `scripts/pipeline/`, `services/pipeline_api.py`
- React orchestration UI for AF3 + ChatGPT: `react-ui/`
- Custom endpoint runtime: `handler.py`, `acestep/`
- Bootstrap automation for cloning into your HF account: `scripts/hf_clone.py`
- Endpoint test clients and HF job launcher: `scripts/endpoint/`, `scripts/jobs/`
## Quick start (local)

```bash
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python app.py
```

Open http://localhost:7860.
## End-to-end setup (recommended)

Use this sequence when setting up from scratch.

1. Install dependencies:

   ```bash
   python -m pip install --upgrade pip
   python -m pip install -r requirements.txt
   ```

2. Create a local `.env` from `.env.example` and fill in the secrets:

   ```ini
   HF_TOKEN=hf_xxx
   HF_AF3_ENDPOINT_URL=https://YOUR_AF3_ENDPOINT.endpoints.huggingface.cloud
   OPENAI_API_KEY=sk-...
   OPENAI_MODEL=gpt-5-mini
   AF3_MODEL_ID=nvidia/audio-flamingo-3-hf
   ```

3. Bootstrap your Hugging Face repos (Space + endpoint templates):

   ```bash
   python scripts/hf_clone.py space --repo-id YOUR_USERNAME/YOUR_SPACE_NAME
   python scripts/hf_clone.py af3-nvidia-endpoint --repo-id YOUR_USERNAME/YOUR_AF3_NVIDIA_ENDPOINT_REPO
   ```

4. Deploy an endpoint from the cloned AF3 NVIDIA endpoint repo:
   - Set the endpoint task to `custom`.
   - Confirm a top-level `handler.py` exists in the endpoint repo.
   - Set endpoint env vars if needed (`HF_TOKEN`, `AF3_NV_DEFAULT_MODE=think`).

5. Generate analysis sidecars from audio:

   ```bash
   python scripts/pipeline/run_af3_chatgpt_pipeline.py \
     --dataset-dir ./train-dataset \
     --backend hf_endpoint \
     --endpoint-url "$HF_AF3_ENDPOINT_URL" \
     --openai-api-key "$OPENAI_API_KEY"
   ```

6. Normalize existing JSONs into LoRA-ready shape (optional but recommended):

   ```bash
   python scripts/pipeline/refine_dataset_json_with_openai.py \
     --dataset-dir ./train-dataset \
     --enable-web-search
   ```

   This script keeps the core fields needed by ACE-Step LoRA training and preserves the rich analysis context under `source.rich_details`.

7. Train the LoRA:

   ```bash
   python app.py
   ```

   Then, in the UI:
   - Load the model.
   - Scan or upload the dataset.
   - Start LoRA training.

8. Test generation with your new adapter:
   - Use the endpoint scripts in `scripts/endpoint/`, or test through the Gradio UI flow.
   - In Step 4 - Evaluate, you can upload your own LoRA adapter (a `.zip` or the adapter files) and load it without retraining in this Space.
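Before running the sequence above, it can help to fail fast on missing configuration. A minimal sketch, assuming the variable names from the `.env` example above; the helper itself is illustrative and not part of this repo:

```python
import os
import sys

# Variables the end-to-end flow expects; the last two fall back to defaults.
REQUIRED = ("HF_TOKEN", "HF_AF3_ENDPOINT_URL", "OPENAI_API_KEY")

def check_env() -> dict:
    """Return the resolved config, exiting with a clear message if keys are missing."""
    missing = [name for name in REQUIRED if not os.getenv(name)]
    if missing:
        sys.exit(f"Missing required env vars: {', '.join(missing)} (see .env.example)")
    return {
        **{name: os.environ[name] for name in REQUIRED},
        "OPENAI_MODEL": os.getenv("OPENAI_MODEL", "gpt-5-mini"),
        "AF3_MODEL_ID": os.getenv("AF3_MODEL_ID", "nvidia/audio-flamingo-3-hf"),
    }
```

Calling `check_env()` at the top of a driver script surfaces a missing token immediately instead of mid-pipeline.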
## AF3 GUI one-command startup

1. Configure `.env` (never commit this file):

   ```ini
   HF_TOKEN=hf_xxx
   HF_AF3_ENDPOINT_URL=https://YOUR_AF3_ENDPOINT.endpoints.huggingface.cloud
   OPENAI_API_KEY=sk-...
   OPENAI_MODEL=gpt-5-mini
   AF3_MODEL_ID=nvidia/audio-flamingo-3-hf
   ```

2. Launch the API and GUI together:

   ```bash
   python af3_gui_app.py
   ```

   PowerShell alternative:

   ```powershell
   .\scripts\dev\run_af3_gui.ps1
   ```

This command builds the React UI and serves it from the FastAPI backend. Open http://127.0.0.1:8008.
## Clone to your HF account

Use the two buttons near the top of this README to create target repos in your HF account, then run the commands below.

Set the token once:

```bash
# Linux/macOS
export HF_TOKEN=hf_xxx
```

```powershell
# Windows PowerShell
$env:HF_TOKEN="hf_xxx"
```

Clone your own Space:

```bash
python scripts/hf_clone.py space --repo-id YOUR_USERNAME/YOUR_SPACE_NAME
```

Clone your own endpoint repo:

```bash
python scripts/hf_clone.py endpoint --repo-id YOUR_USERNAME/YOUR_ENDPOINT_REPO
```

Clone a Qwen2-Audio caption endpoint repo:

```bash
python scripts/hf_clone.py qwen-endpoint --repo-id YOUR_USERNAME/YOUR_QWEN_ENDPOINT_REPO
```

Clone an Audio Flamingo 3 caption endpoint repo:

```bash
python scripts/hf_clone.py af3-endpoint --repo-id YOUR_USERNAME/YOUR_AF3_ENDPOINT_REPO
```

When creating that endpoint, set the task to `custom` so it loads the custom `handler.py`.

Clone an AF3 NVIDIA-stack endpoint repo (matches the NVIDIA Space stack more closely):

```bash
python scripts/hf_clone.py af3-nvidia-endpoint --repo-id YOUR_USERNAME/YOUR_AF3_NVIDIA_ENDPOINT_REPO
```

Use this path when you want `think`/long quality behavior closer to NVIDIA's public demo.

Clone the Space and endpoint repos in one run:

```bash
python scripts/hf_clone.py all \
  --space-repo-id YOUR_USERNAME/YOUR_SPACE_NAME \
  --endpoint-repo-id YOUR_USERNAME/YOUR_ENDPOINT_REPO
```
## Project layout

```text
.
|- app.py
|- lora_ui.py
|- lora_train.py
|- qwen_caption_app.py
|- qwen_audio_captioning.py
|- af3_chatgpt_pipeline.py
|- af3_gui_app.py
|- handler.py
|- acestep/
|- scripts/
|  |- hf_clone.py
|  |- dev/
|  |  |- run_af3_gui.py
|  |  `- run_af3_gui.ps1
|  |- annotations/
|  |  `- qwen_caption_dataset.py
|  |- pipeline/
|  |  `- run_af3_chatgpt_pipeline.py
|  |- endpoint/
|  |  |- generate_interactive.py
|  |  |- test.ps1
|  |  |- test.bat
|  |  |- test_rnb.bat
|  |  `- test_rnb_2min.bat
|  `- jobs/
|     |- submit_hf_lora_job.ps1
|     `- submit_hf_qwen_caption_job.ps1
|- services/
|  `- pipeline_api.py
|- react-ui/
|- utils/
|  `- env_config.py
|- docs/
|  |- deploy/
|  `- guides/
|- summaries/
|  `- findings.md
`- templates/hf-endpoint/
```
## Dataset format

Supported audio formats: `.wav`, `.flac`, `.mp3`, `.ogg`, `.opus`, `.m4a`, `.aac`

Optional sidecar metadata per track: place `song_001.json` next to `song_001.wav`:

```json
{
  "caption": "melodic emotional rnb pop with warm pads",
  "lyrics": "[Verse]\n...",
  "bpm": 92,
  "keyscale": "Am",
  "timesignature": "4/4",
  "vocal_language": "en",
  "duration": 120
}
```
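The layout above is easy to sanity-check before training. A minimal sketch that pairs each audio file with its optional sidecar; the extension list is copied from this section, and the function is illustrative rather than part of the repo:

```python
import json
from pathlib import Path

# Supported audio extensions from the "Dataset format" section.
AUDIO_EXTS = {".wav", ".flac", ".mp3", ".ogg", ".opus", ".m4a", ".aac"}

def scan_dataset(dataset_dir: str) -> list[dict]:
    """Return one record per audio file, merging sidecar metadata when present."""
    records = []
    for audio in sorted(Path(dataset_dir).rglob("*")):
        if audio.suffix.lower() not in AUDIO_EXTS:
            continue
        # Sidecar convention: song_001.wav -> song_001.json
        sidecar = audio.with_suffix(".json")
        meta = json.loads(sidecar.read_text(encoding="utf-8")) if sidecar.exists() else {}
        records.append({"audio_path": str(audio), **meta})
    return records
```

Running this over a dataset folder shows exactly which tracks will train with captions and which will fall back to empty metadata.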
## Qwen2-Audio annotation pipeline (music captioning)

Run the dedicated annotation UI:

```bash
python qwen_caption_app.py
```

Batch caption from the CLI:

```bash
python scripts/annotations/qwen_caption_dataset.py \
  --dataset-dir ./dataset_inbox \
  --backend local \
  --model-id Qwen/Qwen2-Audio-7B-Instruct \
  --output-dir ./qwen_annotations \
  --copy-audio
```

By default this also writes `.json` sidecars next to the source audio, ready for direct ACE-Step LoRA training.

Then train a LoRA on the exported dataset:

```bash
python lora_train.py --dataset-dir ./qwen_annotations/dataset --model-config acestep-v15-base
```
## Audio Flamingo 3 + ChatGPT pipeline (analysis -> normalized sidecar JSON)

This stack runs:

- Audio Flamingo 3 for raw music-analysis prose.
- ChatGPT for cleanup and normalization into LoRA-ready fields.
- Sidecar JSON export next to each audio file (or into a custom output folder).

Single track from the CLI:

```bash
python scripts/pipeline/run_af3_chatgpt_pipeline.py \
  --audio "./train-dataset/Andrew Spacey - Wonder (Prod Beat It AT).mp3" \
  --backend hf_endpoint \
  --endpoint-url "$HF_AF3_ENDPOINT_URL" \
  --hf-token "$HF_TOKEN" \
  --openai-api-key "$OPENAI_API_KEY" \
  --artist-name "Andrew Spacey" \
  --track-name "Wonder"
```

Whole dataset from the CLI:

```bash
python scripts/pipeline/run_af3_chatgpt_pipeline.py \
  --dataset-dir ./train-dataset \
  --backend hf_endpoint \
  --endpoint-url "$HF_AF3_ENDPOINT_URL" \
  --openai-api-key "$OPENAI_API_KEY"
```

Refine already-generated JSON files in place:

```bash
python scripts/pipeline/refine_dataset_json_with_openai.py \
  --dataset-dir ./train-dataset \
  --enable-web-search
```

Write refined files to a separate folder:

```bash
python scripts/pipeline/refine_dataset_json_with_openai.py \
  --dataset-dir ./train-dataset \
  --recursive \
  --enable-web-search \
  --output-dir ./train-dataset-refined
```

Single-command GUI (recommended):

```bash
python af3_gui_app.py
```

Manual API + React UI:

```bash
uvicorn services.pipeline_api:app --host 0.0.0.0 --port 8008 --reload
cd react-ui
npm install
npm run dev
```

Open http://localhost:5173 (manual) or http://127.0.0.1:8008 (single-command).
## Endpoint testing

Interactive client:

```bash
python scripts/endpoint/generate_interactive.py
```

Or run the scripted tests: `scripts/endpoint/test.ps1`, `scripts/endpoint/test.bat`
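If you want a scripted check outside the provided clients, a custom Dedicated Endpoint is just an authenticated HTTPS POST. A minimal sketch follows; the request body here is a hypothetical shape for illustration, since the actual schema is defined by the `handler.py` in your endpoint repo:

```python
import base64
import json
import os
import urllib.request

def build_payload(audio_bytes: bytes, prompt: str) -> bytes:
    """Encode audio as base64 inside a JSON body (hypothetical schema; match your handler.py)."""
    audio_b64 = base64.b64encode(audio_bytes).decode("ascii")
    return json.dumps({"inputs": {"audio": audio_b64, "prompt": prompt}}).encode("utf-8")

def call_endpoint(endpoint_url: str, audio_path: str, prompt: str) -> dict:
    """POST an audio file plus prompt to the endpoint and return the parsed JSON reply."""
    with open(audio_path, "rb") as f:
        body = build_payload(f.read(), prompt)
    req = urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Keeping the payload builder separate makes it easy to adjust once you confirm the field names your handler actually parses.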
## Findings and notes

The current baseline analysis and improvement ideas are tracked in `summaries/findings.md`.
## Docs

- Space deployment: `docs/deploy/SPACE.md`
- Qwen caption Space deployment: `docs/deploy/QWEN_SPACE.md`
- Endpoint deployment: `docs/deploy/ENDPOINT.md`
- AF3 endpoint deployment: `docs/deploy/AF3_ENDPOINT.md`
- AF3 NVIDIA-stack endpoint deployment: `docs/deploy/AF3_NVIDIA_ENDPOINT.md`
- Additional guides: `docs/guides/qwen2-audio-train.md`, `docs/guides/af3-chatgpt-pipeline.md`
## Open-source readiness checklist

- Secrets are env-driven (`HF_TOKEN`, `HF_AF3_ENDPOINT_URL`, `OPENAI_API_KEY`, `.env`).
- Local artifacts are ignored via `.gitignore`.
- MIT license included.
- Reproducible clone/deploy paths documented.

`.env` is git-ignored; keep real credentials only in your local `.env`.
## GitHub publish flow

1. Check status:

   ```bash
   git status
   ```

2. Stage and commit:

   ```bash
   git add .
   git commit -m "Consolidate AF3/Qwen pipelines, endpoint templates, and docs"
   ```

3. Push to the GitHub remote:

   ```bash
   git push github main
   ```