---
title: ACE-Step 1.5 LoRA Studio
emoji: 🎵
colorFrom: blue
colorTo: teal
sdk: gradio
app_file: app.py
pinned: false
---
|
|
|
|
|
# ACE-Step 1.5 LoRA Studio |
|
|
- Andrew Rapier |
|
|
|
|
|
Train ACE-Step 1.5 LoRA adapters, deploy your own Hugging Face Space, and run production-style inference through a Dedicated Endpoint. |
|
|
|
|
|
[Create your Space](https://huggingface.co/new-space)
[Create your model repo](https://huggingface.co/new-model)
[MIT License](LICENSE)
|
|
|
|
|
## What you get |
|
|
|
|
|
- LoRA training UI and workflow: `app.py`, `lora_ui.py`
- CLI LoRA trainer for local/HF datasets: `lora_train.py`
- Qwen2-Audio captioning/annotation pipeline: `qwen_caption_app.py`, `qwen_audio_captioning.py`, `scripts/annotations/`
- Audio Flamingo 3 + ChatGPT cleanup pipeline: `af3_chatgpt_pipeline.py`, `scripts/pipeline/`, `services/pipeline_api.py`
- React orchestration UI for AF3+ChatGPT: `react-ui/`
- Custom endpoint runtime: `handler.py`, `acestep/`
- Bootstrap automation for cloning into your HF account: `scripts/hf_clone.py`
- Endpoint test clients and HF job launcher: `scripts/endpoint/`, `scripts/jobs/`
|
|
|
|
|
## Quick start (local) |
|
|
|
|
|
```bash
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python app.py
```
|
|
|
|
|
Open `http://localhost:7860`. |
|
|
|
|
|
## End-to-end setup (recommended) |
|
|
|
|
|
Use this sequence when setting up from scratch. |
|
|
|
|
|
1. Install dependencies |
|
|
|
|
|
```bash
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
```
|
|
|
|
|
2. Create a local `.env` from `.env.example` and fill in secrets
|
|
|
|
|
```env
HF_TOKEN=hf_xxx
HF_AF3_ENDPOINT_URL=https://YOUR_AF3_ENDPOINT.endpoints.huggingface.cloud
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-5-mini
AF3_MODEL_ID=nvidia/audio-flamingo-3-hf
```
|
|
|
|
|
3. Bootstrap your Hugging Face repos (Space + endpoint templates) |
|
|
|
|
|
```bash
python scripts/hf_clone.py space --repo-id YOUR_USERNAME/YOUR_SPACE_NAME
python scripts/hf_clone.py af3-nvidia-endpoint --repo-id YOUR_USERNAME/YOUR_AF3_NVIDIA_ENDPOINT_REPO
```
|
|
|
|
|
4. Deploy endpoint from the cloned AF3 NVIDIA endpoint repo |
|
|
|
|
|
- Set the endpoint task to `custom`.
- Confirm a top-level `handler.py` exists in the endpoint repo.
- Set endpoint env vars if needed (`HF_TOKEN`, `AF3_NV_DEFAULT_MODE=think`).
|
|
|
|
|
5. Generate analysis sidecars from audio |
|
|
|
|
|
```bash
python scripts/pipeline/run_af3_chatgpt_pipeline.py \
  --dataset-dir ./train-dataset \
  --backend hf_endpoint \
  --endpoint-url "$HF_AF3_ENDPOINT_URL" \
  --openai-api-key "$OPENAI_API_KEY"
```
|
|
|
|
|
6. Normalize existing JSONs into LoRA-ready shape (optional but recommended) |
|
|
|
|
|
```bash
python scripts/pipeline/refine_dataset_json_with_openai.py \
  --dataset-dir ./train-dataset \
  --enable-web-search
```
|
|
|
|
|
This script keeps core fields needed by ACE-Step LoRA training and preserves rich analysis context in `source.rich_details`. |
|
|
|
|
|
7. Train LoRA |
|
|
|
|
|
```bash
python app.py
```
|
|
|
|
|
Then, in the UI:

- Load the model.
- Scan or upload the dataset.
- Start LoRA training.
|
|
|
|
|
8. Test generation with your new adapter |
|
|
|
|
|
- Use the endpoint scripts in `scripts/endpoint/`.
- Or test through the Gradio UI flow.
- In **Step 4 - Evaluate**, you can now upload your own LoRA adapter (`.zip` or adapter files) and load it without retraining in this Space.
|
|
|
|
|
## AF3 GUI one-command startup |
|
|
|
|
|
1. Configure `.env` (never commit this file): |
|
|
|
|
|
```env
HF_TOKEN=hf_xxx
HF_AF3_ENDPOINT_URL=https://bc3r76slij67lskb.us-east-1.aws.endpoints.huggingface.cloud
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-5-mini
AF3_MODEL_ID=nvidia/audio-flamingo-3-hf
```
|
|
|
|
|
2. Launch API + GUI together: |
|
|
|
|
|
```bash
python af3_gui_app.py
```
|
|
|
|
|
PowerShell alternative: |
|
|
|
|
|
```powershell
.\scripts\dev\run_af3_gui.ps1
```
|
|
|
|
|
This command builds the React UI and serves it from the FastAPI backend. |
|
|
Open `http://127.0.0.1:8008`. |
|
|
|
|
|
## Clone to your HF account |
|
|
|
|
|
Use the two buttons near the top of this README to create the target repos in your HF account, then run the commands below.
|
|
|
|
|
Set token once: |
|
|
|
|
|
```bash
# Linux/macOS
export HF_TOKEN=hf_xxx

# Windows PowerShell
$env:HF_TOKEN="hf_xxx"
```
|
|
|
|
|
Clone your own Space: |
|
|
|
|
|
```bash
python scripts/hf_clone.py space --repo-id YOUR_USERNAME/YOUR_SPACE_NAME
```
|
|
|
|
|
Clone your own Endpoint repo: |
|
|
|
|
|
```bash
python scripts/hf_clone.py endpoint --repo-id YOUR_USERNAME/YOUR_ENDPOINT_REPO
```
|
|
|
|
|
Clone a Qwen2-Audio caption endpoint repo: |
|
|
|
|
|
```bash
python scripts/hf_clone.py qwen-endpoint --repo-id YOUR_USERNAME/YOUR_QWEN_ENDPOINT_REPO
```
|
|
|
|
|
Clone an Audio Flamingo 3 caption endpoint repo: |
|
|
|
|
|
```bash
python scripts/hf_clone.py af3-endpoint --repo-id YOUR_USERNAME/YOUR_AF3_ENDPOINT_REPO
```
|
|
|
|
|
When creating that endpoint, set task to `custom` so it loads the custom `handler.py`. |
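The `custom` task loads a top-level `handler.py` that exposes an `EndpointHandler` class. The real `handler.py` in the cloned repo does the model loading and audio decoding; the skeleton below only sketches the contract the runtime calls, and the `"inputs"` field name is illustrative rather than this repo's actual request schema.

```python
# Minimal sketch of the custom Inference Endpoints handler contract.
# The "inputs" field name is illustrative, not this repo's schema.
from typing import Any, Dict


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` is the local snapshot of the endpoint repo; load weights here.
        self.model_dir = path

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # `data` is the parsed request body; return a JSON-serializable dict.
        inputs = data.get("inputs")
        return {"received": inputs, "model_dir": self.model_dir}
```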
|
|
|
|
|
Clone an AF3 NVIDIA-stack endpoint repo (matches NVIDIA Space stack better): |
|
|
|
|
|
```bash
python scripts/hf_clone.py af3-nvidia-endpoint --repo-id YOUR_USERNAME/YOUR_AF3_NVIDIA_ENDPOINT_REPO
```
|
|
|
|
|
Use this path when you want think/long quality behavior closer to NVIDIA's public demo. |
|
|
|
|
|
Clone both in one run: |
|
|
|
|
|
```bash
python scripts/hf_clone.py all \
  --space-repo-id YOUR_USERNAME/YOUR_SPACE_NAME \
  --endpoint-repo-id YOUR_USERNAME/YOUR_ENDPOINT_REPO
```
|
|
|
|
|
## Project layout |
|
|
|
|
|
```text
.
|- app.py
|- lora_ui.py
|- lora_train.py
|- qwen_caption_app.py
|- qwen_audio_captioning.py
|- af3_chatgpt_pipeline.py
|- af3_gui_app.py
|- handler.py
|- acestep/
|- scripts/
|  |- hf_clone.py
|  |- dev/
|  |  |- run_af3_gui.py
|  |  `- run_af3_gui.ps1
|  |- annotations/
|  |  `- qwen_caption_dataset.py
|  |- pipeline/
|  |  `- run_af3_chatgpt_pipeline.py
|  |- endpoint/
|  |  |- generate_interactive.py
|  |  |- test.ps1
|  |  |- test.bat
|  |  |- test_rnb.bat
|  |  `- test_rnb_2min.bat
|  `- jobs/
|     |- submit_hf_lora_job.ps1
|     `- submit_hf_qwen_caption_job.ps1
|- services/
|  `- pipeline_api.py
|- react-ui/
|- utils/
|  `- env_config.py
|- docs/
|  |- deploy/
|  `- guides/
|- summaries/
|  `- findings.md
`- templates/hf-endpoint/
```
|
|
|
|
|
## Dataset format |
|
|
|
|
|
Supported audio: |
|
|
|
|
|
- `.wav`, `.flac`, `.mp3`, `.ogg`, `.opus`, `.m4a`, `.aac` |
|
|
|
|
|
Optional sidecar metadata per track: |
|
|
|
|
|
- `song_001.wav`
- `song_001.json`
|
|
|
|
|
```json
{
  "caption": "melodic emotional rnb pop with warm pads",
  "lyrics": "[Verse]\n...",
  "bpm": 92,
  "keyscale": "Am",
  "timesignature": "4/4",
  "vocal_language": "en",
  "duration": 120
}
```
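Such a dataset can be walked by pairing each audio file with its optional sidecar. This is a minimal sketch, not `lora_train.py`'s actual loader; the function name and return shape are illustrative.

```python
import json
from pathlib import Path

# Matches the supported audio extensions listed above.
AUDIO_EXTS = {".wav", ".flac", ".mp3", ".ogg", ".opus", ".m4a", ".aac"}


def scan_dataset(root: str) -> list[tuple[Path, dict]]:
    """Pair each audio file with its sidecar metadata ({} when absent)."""
    tracks = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix.lower() in AUDIO_EXTS:
            sidecar = path.with_suffix(".json")
            meta = (
                json.loads(sidecar.read_text(encoding="utf-8"))
                if sidecar.exists()
                else {}
            )
            tracks.append((path, meta))
    return tracks
```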
|
|
|
|
|
## Qwen2-Audio annotation pipeline (music captioning) |
|
|
|
|
|
Run the dedicated annotation UI: |
|
|
|
|
|
```bash
python qwen_caption_app.py
```
|
|
|
|
|
Batch caption from CLI: |
|
|
|
|
|
```bash
python scripts/annotations/qwen_caption_dataset.py \
  --dataset-dir ./dataset_inbox \
  --backend local \
  --model-id Qwen/Qwen2-Audio-7B-Instruct \
  --output-dir ./qwen_annotations \
  --copy-audio
```
|
|
|
|
|
This also writes `.json` sidecars next to source audio by default for direct ACE-Step LoRA training. |
|
|
|
|
|
Then train LoRA on the exported dataset: |
|
|
|
|
|
```bash
python lora_train.py --dataset-dir ./qwen_annotations/dataset --model-config acestep-v15-base
```
|
|
|
|
|
## Audio Flamingo 3 + ChatGPT pipeline (analysis -> normalized sidecar JSON) |
|
|
|
|
|
This stack runs:

1. Audio Flamingo 3 for raw music-analysis prose.
2. ChatGPT for cleanup and normalization into LoRA-ready fields.
3. Sidecar JSON export next to each audio file (or into a custom output folder).
|
|
|
|
|
CLI single track: |
|
|
|
|
|
```bash
python scripts/pipeline/run_af3_chatgpt_pipeline.py \
  --audio "./train-dataset/Andrew Spacey - Wonder (Prod Beat It AT).mp3" \
  --backend hf_endpoint \
  --endpoint-url "$HF_AF3_ENDPOINT_URL" \
  --hf-token "$HF_TOKEN" \
  --openai-api-key "$OPENAI_API_KEY" \
  --artist-name "Andrew Spacey" \
  --track-name "Wonder"
```
|
|
|
|
|
CLI dataset batch: |
|
|
|
|
|
```bash
python scripts/pipeline/run_af3_chatgpt_pipeline.py \
  --dataset-dir ./train-dataset \
  --backend hf_endpoint \
  --endpoint-url "$HF_AF3_ENDPOINT_URL" \
  --openai-api-key "$OPENAI_API_KEY"
```
|
|
|
|
|
Refine already-generated JSON files in place: |
|
|
|
|
|
```bash
python scripts/pipeline/refine_dataset_json_with_openai.py \
  --dataset-dir ./train-dataset \
  --enable-web-search
```
|
|
|
|
|
Write refined files to a separate folder: |
|
|
|
|
|
```bash
python scripts/pipeline/refine_dataset_json_with_openai.py \
  --dataset-dir ./train-dataset \
  --recursive \
  --enable-web-search \
  --output-dir ./train-dataset-refined
```
|
|
|
|
|
Single-command GUI (recommended): |
|
|
|
|
|
```bash
python af3_gui_app.py
```
|
|
|
|
|
Manual API + React UI: |
|
|
|
|
|
```bash
uvicorn services.pipeline_api:app --host 0.0.0.0 --port 8008 --reload
```
|
|
|
|
|
```bash
cd react-ui
npm install
npm run dev
```
|
|
|
|
|
Open `http://localhost:5173` (manual) or `http://127.0.0.1:8008` (single-command). |
|
|
|
|
|
## Endpoint testing |
|
|
|
|
|
```bash
python scripts/endpoint/generate_interactive.py
```
|
|
|
|
|
Or run scripted tests: |
|
|
|
|
|
- `scripts/endpoint/test.ps1`
- `scripts/endpoint/test.bat`
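For a quick ad-hoc check outside those scripts, an authenticated request can be sketched with the standard library. The payload shape here is an assumption for illustration; match it to whatever your deployed `handler.py` expects.

```python
import json
import urllib.request


def build_request(endpoint_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated JSON POST for a custom Inference Endpoint."""
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage (requires a live endpoint and HF_AF3_ENDPOINT_URL/HF_TOKEN set):
#   import os
#   req = build_request(os.environ["HF_AF3_ENDPOINT_URL"],
#                       os.environ["HF_TOKEN"],
#                       {"inputs": "ping"})  # illustrative payload
#   print(urllib.request.urlopen(req).read().decode("utf-8"))
```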
|
|
|
|
|
## Findings and notes |
|
|
|
|
|
Current baseline analysis and improvement ideas are tracked in: |
|
|
|
|
|
- `summaries/findings.md` |
|
|
|
|
|
## Docs |
|
|
|
|
|
- Space deployment: `docs/deploy/SPACE.md`
- Qwen caption Space deployment: `docs/deploy/QWEN_SPACE.md`
- Endpoint deployment: `docs/deploy/ENDPOINT.md`
- AF3 endpoint deployment: `docs/deploy/AF3_ENDPOINT.md`
- AF3 NVIDIA-stack endpoint deployment: `docs/deploy/AF3_NVIDIA_ENDPOINT.md`
- Additional guides: `docs/guides/qwen2-audio-train.md`, `docs/guides/af3-chatgpt-pipeline.md`
|
|
|
|
|
## Open-source readiness checklist |
|
|
|
|
|
- Secrets are env-driven (`HF_TOKEN`, `HF_AF3_ENDPOINT_URL`, `OPENAI_API_KEY`, `.env`).
- Local artifacts are ignored via `.gitignore`.
- MIT license included.
- Reproducible clone/deploy paths documented.
- `.env` is git-ignored; keep real credentials only in your local `.env`.
|
|
|
|
|
## GitHub publish flow |
|
|
|
|
|
1. Check status |
|
|
|
|
|
```bash
git status
```
|
|
|
|
|
2. Stage and commit |
|
|
|
|
|
```bash
git add .
git commit -m "Consolidate AF3/Qwen pipelines, endpoint templates, and docs"
```
|
|
|
|
|
3. Push to GitHub remote |
|
|
|
|
|
```bash
git push github main
```
|
|
|