	---
	title: ACE-Step 1.5 LoRA Studio
	emoji: music
	colorFrom: blue
	colorTo: teal
	sdk: gradio
	app_file: app.py
	pinned: false
	---

	# ACE-Step 1.5 LoRA Studio
	- Andrew Rapier

	Train ACE-Step 1.5 LoRA adapters, deploy your own Hugging Face Space, and run production-style inference through a Dedicated Endpoint.

	[![Create HF Space](https://img.shields.io/badge/Create-HF%20Space-FFD21E?logo=huggingface&logoColor=black)](https://huggingface.co/new-space)
	[![Create HF Endpoint Repo](https://img.shields.io/badge/Create-HF%20Endpoint%20Repo-FFB000?logo=huggingface&logoColor=black)](https://huggingface.co/new-model)
	[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

	## What you get

	- LoRA training UI and workflow: `app.py`, `lora_ui.py`
	- CLI LoRA trainer for local/HF datasets: `lora_train.py`
	- Qwen2-Audio captioning/annotation pipeline: `qwen_caption_app.py`, `qwen_audio_captioning.py`, `scripts/annotations/`
	- Audio Flamingo 3 + ChatGPT cleanup pipeline: `af3_chatgpt_pipeline.py`, `scripts/pipeline/`, `services/pipeline_api.py`
	- React orchestration UI for AF3+ChatGPT: `react-ui/`
	- Custom endpoint runtime: `handler.py`, `acestep/`
	- Bootstrap automation for cloning into your HF account: `scripts/hf_clone.py`
	- Endpoint test clients and HF job launcher: `scripts/endpoint/`, `scripts/jobs/`

	## Quick start (local)

	```bash
	python -m pip install --upgrade pip
	python -m pip install -r requirements.txt
	python app.py
	```

	Open `http://localhost:7860`.

	## End-to-end setup (recommended)

	Use this sequence when setting up from scratch.

	1. Install dependencies

	```bash
	python -m pip install --upgrade pip
	python -m pip install -r requirements.txt
	```

	2. Create local `.env` from `.env.example` and fill secrets

	```env
	HF_TOKEN=hf_xxx
	HF_AF3_ENDPOINT_URL=https://YOUR_AF3_ENDPOINT.endpoints.huggingface.cloud
	OPENAI_API_KEY=sk-...
	OPENAI_MODEL=gpt-5-mini
	AF3_MODEL_ID=nvidia/audio-flamingo-3-hf
	```

	3. Bootstrap your Hugging Face repos (Space + endpoint templates)

	```bash
	python scripts/hf_clone.py space --repo-id YOUR_USERNAME/YOUR_SPACE_NAME
	python scripts/hf_clone.py af3-nvidia-endpoint --repo-id YOUR_USERNAME/YOUR_AF3_NVIDIA_ENDPOINT_REPO
	```

	4. Deploy endpoint from the cloned AF3 NVIDIA endpoint repo

	- Set endpoint task to `custom`.
	- Confirm top-level `handler.py` exists in the endpoint repo.
	- Set endpoint env vars if needed (`HF_TOKEN`, `AF3_NV_DEFAULT_MODE=think`).

	5. Generate analysis sidecars from audio

	```bash
	python scripts/pipeline/run_af3_chatgpt_pipeline.py \
	  --dataset-dir ./train-dataset \
	  --backend hf_endpoint \
	  --endpoint-url "$HF_AF3_ENDPOINT_URL" \
	  --openai-api-key "$OPENAI_API_KEY"
	```

	6. Normalize existing JSONs into LoRA-ready shape (optional but recommended)

	```bash
	python scripts/pipeline/refine_dataset_json_with_openai.py \
	  --dataset-dir ./train-dataset \
	  --enable-web-search
	```

	This script keeps core fields needed by ACE-Step LoRA training and preserves rich analysis context in `source.rich_details`.

	7. Train LoRA

	```bash
	python app.py
	```

	Then, in the UI:
	- Load model.
	- Scan/upload dataset.
	- Start LoRA training.

	8. Test generation with your new adapter

	- Use the endpoint scripts in `scripts/endpoint/`.
	- Or test through the Gradio UI flow.
	- In Step 4 - Evaluate, you can upload your own LoRA adapter (a `.zip` or the individual adapter files) and load it without retraining in this Space.
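
Step 6 keeps the core training fields and moves the remaining analysis under `source.rich_details`. The sketch below illustrates that shape. It assumes the core fields are the ones shown in the sidecar example under "Dataset format"; `CORE_FIELDS` and `normalize_sidecar` are hypothetical names and do not describe the internals of `refine_dataset_json_with_openai.py`.

```python
import json

# Assumption: core fields match the sidecar example under "Dataset format".
CORE_FIELDS = ("caption", "lyrics", "bpm", "keyscale",
               "timesignature", "vocal_language", "duration")


def normalize_sidecar(raw: dict) -> dict:
    """Split a raw analysis dict into core fields plus source.rich_details."""
    core = {key: raw[key] for key in CORE_FIELDS if key in raw}
    extras = {key: value for key, value in raw.items() if key not in CORE_FIELDS}
    if extras:
        # Everything that is not a core field is preserved, not dropped.
        core["source"] = {"rich_details": extras}
    return core


if __name__ == "__main__":
    raw = {"caption": "warm rnb pop", "bpm": 92, "mood": "nostalgic"}
    print(json.dumps(normalize_sidecar(raw), indent=2))
```

Keeping the extra context under a single nested key leaves the top level limited to the fields the trainer reads, while nothing from the analysis is lost.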

	## AF3 GUI one-command startup

	1. Configure `.env` (never commit this file):

	```env
	HF_TOKEN=hf_xxx
	HF_AF3_ENDPOINT_URL=https://YOUR_AF3_ENDPOINT.endpoints.huggingface.cloud
	OPENAI_API_KEY=sk-...
	OPENAI_MODEL=gpt-5-mini
	AF3_MODEL_ID=nvidia/audio-flamingo-3-hf
	```

	2. Launch API + GUI together:

	```bash
	python af3_gui_app.py
	```

	PowerShell alternative:

	```powershell
	.\scripts\dev\run_af3_gui.ps1
	```

	This command builds the React UI and serves it from the FastAPI backend.
	Open `http://127.0.0.1:8008`.

	## Clone to your HF account

	Use the two buttons near the top of this README to create the target repos in your HF account, then run the commands below.

	Set your token once:

	```bash
	# Linux/macOS
	export HF_TOKEN=hf_xxx

	# Windows PowerShell
	$env:HF_TOKEN="hf_xxx"
	```

	Clone your own Space:

	```bash
	python scripts/hf_clone.py space --repo-id YOUR_USERNAME/YOUR_SPACE_NAME
	```

	Clone your own Endpoint repo:

	```bash
	python scripts/hf_clone.py endpoint --repo-id YOUR_USERNAME/YOUR_ENDPOINT_REPO
	```

	Clone a Qwen2-Audio caption endpoint repo:

	```bash
	python scripts/hf_clone.py qwen-endpoint --repo-id YOUR_USERNAME/YOUR_QWEN_ENDPOINT_REPO
	```

	Clone an Audio Flamingo 3 caption endpoint repo:

	```bash
	python scripts/hf_clone.py af3-endpoint --repo-id YOUR_USERNAME/YOUR_AF3_ENDPOINT_REPO
	```

	When creating that endpoint, set task to `custom` so it loads the custom `handler.py`.
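
For orientation, Hugging Face custom endpoints look for a class named `EndpointHandler` in the top-level `handler.py`. The sketch below shows only that interface with a placeholder body. It is not the handler shipped in this repo, and the `inputs` payload key is an assumption about the request shape.

```python
from typing import Any, Dict


class EndpointHandler:
    """Minimal shape of a custom handler; a real one loads a model."""

    def __init__(self, path: str = "") -> None:
        # `path` is the directory of the deployed repository.
        self.model_dir = path

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Assumption: the request body carries its payload under "inputs".
        inputs = data.get("inputs")
        if inputs is None:
            return {"error": "missing 'inputs'"}
        # Placeholder: echo the request instead of running inference.
        return {"received": inputs, "model_dir": self.model_dir}
```

Calling `EndpointHandler("/repo")({"inputs": "hi"})` locally is a cheap way to exercise a handler's request parsing before deploying it.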

	Clone an AF3 NVIDIA-stack endpoint repo (matches NVIDIA Space stack better):

	```bash
	python scripts/hf_clone.py af3-nvidia-endpoint --repo-id YOUR_USERNAME/YOUR_AF3_NVIDIA_ENDPOINT_REPO
	```

	Use this path when you want the think/long modes to behave more like NVIDIA's public demo.

	Clone the Space and the endpoint repo in one run:

	```bash
	python scripts/hf_clone.py all \
	  --space-repo-id YOUR_USERNAME/YOUR_SPACE_NAME \
	  --endpoint-repo-id YOUR_USERNAME/YOUR_ENDPOINT_REPO
	```

	## Project layout

	```text
	.
	|- app.py
	|- lora_ui.py
	|- lora_train.py
	|- qwen_caption_app.py
	|- qwen_audio_captioning.py
	|- af3_chatgpt_pipeline.py
	|- af3_gui_app.py
	|- handler.py
	|- acestep/
	|- scripts/
	|  |- hf_clone.py
	|  |- dev/
	|  |  |- run_af3_gui.py
	|  |  `- run_af3_gui.ps1
	|  |- annotations/
	|  |  `- qwen_caption_dataset.py
	|  |- pipeline/
	|  |  `- run_af3_chatgpt_pipeline.py
	|  |- endpoint/
	|  |  |- generate_interactive.py
	|  |  |- test.ps1
	|  |  |- test.bat
	|  |  |- test_rnb.bat
	|  |  `- test_rnb_2min.bat
	|  `- jobs/
	|     |- submit_hf_lora_job.ps1
	|     `- submit_hf_qwen_caption_job.ps1
	|- services/
	|  `- pipeline_api.py
	|- react-ui/
	|- utils/
	|  `- env_config.py
	|- docs/
	|  |- deploy/
	|  `- guides/
	|- summaries/
	|  `- findings.md
	`- templates/hf-endpoint/
	```

	## Dataset format

	Supported audio:

	- `.wav`, `.flac`, `.mp3`, `.ogg`, `.opus`, `.m4a`, `.aac`

	Optional sidecar metadata per track:

	- `song_001.wav`
	- `song_001.json`

	```json
	{
	  "caption": "melodic emotional rnb pop with warm pads",
	  "lyrics": "[Verse]\n...",
	  "bpm": 92,
	  "keyscale": "Am",
	  "timesignature": "4/4",
	  "vocal_language": "en",
	  "duration": 120
	}
	```
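
To check a dataset folder before training, something like the following can list which tracks lack a sidecar. It is a sketch that assumes a sidecar shares its audio file's stem, as in `song_001.wav` and `song_001.json`; `find_tracks` is a hypothetical helper, not part of `lora_train.py`.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Extensions listed under "Supported audio" above.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg", ".opus", ".m4a", ".aac"}


def find_tracks(dataset_dir: str) -> List[Tuple[Path, Optional[Path]]]:
    """Return (audio, sidecar) pairs; sidecar is None when the JSON is missing."""
    pairs = []
    for audio in sorted(Path(dataset_dir).iterdir()):
        if not audio.is_file() or audio.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        sidecar = audio.with_suffix(".json")
        pairs.append((audio, sidecar if sidecar.is_file() else None))
    return pairs


if __name__ == "__main__":
    dataset = Path("./train-dataset")
    if dataset.is_dir():
        for audio, sidecar in find_tracks(str(dataset)):
            print(audio.name, "->", sidecar.name if sidecar else "no sidecar")
```

The scan is non-recursive on purpose to keep the example short; swap `iterdir()` for `rglob("*")` if your dataset uses subfolders.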

	## Qwen2-Audio annotation pipeline (music captioning)

	Run the dedicated annotation UI:

	```bash
	python qwen_caption_app.py
	```

	Batch caption from CLI:

	```bash
	python scripts/annotations/qwen_caption_dataset.py \
	  --dataset-dir ./dataset_inbox \
	  --backend local \
	  --model-id Qwen/Qwen2-Audio-7B-Instruct \
	  --output-dir ./qwen_annotations \
	  --copy-audio
	```

	This also writes `.json` sidecars next to source audio by default for direct ACE-Step LoRA training.

	Then train LoRA on the exported dataset:

	```bash
	python lora_train.py --dataset-dir ./qwen_annotations/dataset --model-config acestep-v15-base
	```

	## Audio Flamingo 3 + ChatGPT pipeline (analysis -> normalized sidecar JSON)

	This stack runs:

	1. Audio Flamingo 3 for raw music analysis prose.
	2. ChatGPT for cleanup/normalization into LoRA-ready fields.
	3. Sidecar JSON export next to each audio file (or in a custom output folder).
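
Step 3 can be pictured as below. This is a sketch under the assumption that the sidecar reuses the audio file's stem with a `.json` suffix; `sidecar_path` and `write_sidecar` are hypothetical helpers, not functions from `af3_chatgpt_pipeline.py`.

```python
import json
from pathlib import Path
from typing import Optional


def sidecar_path(audio_path: str, output_dir: Optional[str] = None) -> Path:
    """Place the JSON next to the audio, or inside output_dir when given."""
    audio = Path(audio_path)
    target_dir = Path(output_dir) if output_dir else audio.parent
    return target_dir / f"{audio.stem}.json"


def write_sidecar(audio_path: str, fields: dict,
                  output_dir: Optional[str] = None) -> Path:
    """Write normalized fields as UTF-8 JSON and return the path written."""
    path = sidecar_path(audio_path, output_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(fields, indent=2, ensure_ascii=False),
                    encoding="utf-8")
    return path
```

Writing with `ensure_ascii=False` keeps non-English lyrics readable in the file instead of escaping them.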

	CLI single track:

	```bash
	python scripts/pipeline/run_af3_chatgpt_pipeline.py \
	  --audio "./train-dataset/Andrew Spacey - Wonder (Prod Beat It AT).mp3" \
	  --backend hf_endpoint \
	  --endpoint-url "$HF_AF3_ENDPOINT_URL" \
	  --hf-token "$HF_TOKEN" \
	  --openai-api-key "$OPENAI_API_KEY" \
	  --artist-name "Andrew Spacey" \
	  --track-name "Wonder"
	```

	CLI dataset batch:

	```bash
	python scripts/pipeline/run_af3_chatgpt_pipeline.py \
	  --dataset-dir ./train-dataset \
	  --backend hf_endpoint \
	  --endpoint-url "$HF_AF3_ENDPOINT_URL" \
	  --openai-api-key "$OPENAI_API_KEY"
	```

	Refine already-generated JSON files in place:

	```bash
	python scripts/pipeline/refine_dataset_json_with_openai.py \
	  --dataset-dir ./train-dataset \
	  --enable-web-search
	```

	Write refined files to a separate folder:

	```bash
	python scripts/pipeline/refine_dataset_json_with_openai.py \
	  --dataset-dir ./train-dataset \
	  --recursive \
	  --enable-web-search \
	  --output-dir ./train-dataset-refined
	```

	Single-command GUI (recommended):

	```bash
	python af3_gui_app.py
	```

	Manual API + React UI:

	```bash
	uvicorn services.pipeline_api:app --host 0.0.0.0 --port 8008 --reload
	```

	```bash
	cd react-ui
	npm install
	npm run dev
	```

	Open `http://localhost:5173` (manual) or `http://127.0.0.1:8008` (single-command).

	## Endpoint testing

	```bash
	python scripts/endpoint/generate_interactive.py
	```

	Or run scripted tests:

	- `scripts/endpoint/test.ps1`
	- `scripts/endpoint/test.bat`
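
If you prefer to script a request yourself, the sketch below builds an authenticated POST with the standard library. The Bearer-token header is how Hugging Face endpoints authenticate. The payload keys (`inputs`, `prompt`) and the `HF_ENDPOINT_URL` variable name are placeholders, since the schema expected by this repo's `handler.py` is not documented here.

```python
import json
import os
import urllib.request


def build_request(endpoint_url: str, token: str,
                  payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for a Dedicated Endpoint."""
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    url = os.environ.get("HF_ENDPOINT_URL")  # hypothetical variable name
    token = os.environ.get("HF_TOKEN")
    if url and token:
        # Placeholder payload; adjust it to what your handler expects.
        request = build_request(url, token, {"inputs": {"prompt": "warm rnb pop"}})
        with urllib.request.urlopen(request, timeout=600) as response:
            print(response.status, len(response.read()), "bytes")
```

Separating request construction from sending makes the headers and body easy to inspect without touching a paid endpoint.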

	## Findings and notes

	Current baseline analysis and improvement ideas are tracked in:

	- `summaries/findings.md`

	## Docs

	- Space deployment: `docs/deploy/SPACE.md`
	- Qwen caption Space deployment: `docs/deploy/QWEN_SPACE.md`
	- Endpoint deployment: `docs/deploy/ENDPOINT.md`
	- AF3 endpoint deployment: `docs/deploy/AF3_ENDPOINT.md`
	- AF3 NVIDIA-stack endpoint deployment: `docs/deploy/AF3_NVIDIA_ENDPOINT.md`
	- Additional guides: `docs/guides/qwen2-audio-train.md`, `docs/guides/af3-chatgpt-pipeline.md`

	## Open-source readiness checklist

	- Secrets are read from environment variables or a local `.env` file (`HF_TOKEN`, `HF_AF3_ENDPOINT_URL`, `OPENAI_API_KEY`).
	- Local artifacts are ignored via `.gitignore`.
	- MIT license included.
	- Reproducible clone/deploy paths documented.
	- `.env` is git-ignored; keep real credentials only in local `.env`.
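
A quick way to confirm the environment is complete before launching anything is sketched below. It is a simplified `KEY=VALUE` parser for illustration and makes no claim about how `utils/env_config.py` works; `parse_env_file` and `missing_vars` are hypothetical names.

```python
import os
from pathlib import Path

# Variable names taken from the .env example in this README.
REQUIRED_VARS = ("HF_TOKEN", "HF_AF3_ENDPOINT_URL", "OPENAI_API_KEY")


def parse_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(file_values: dict, environ=os.environ) -> list:
    """Names that are set neither in the process environment nor in .env."""
    return [name for name in REQUIRED_VARS
            if not (environ.get(name) or file_values.get(name))]


if __name__ == "__main__":
    missing = missing_vars(parse_env_file())
    print("Missing:", ", ".join(missing) if missing else "none")
```

The check reports only variable names, never values, so its output is safe to paste into an issue or a log.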

	## GitHub publish flow

	1. Check status

	```bash
	git status
	```

	2. Stage and commit

	```bash
	git add .
	git commit -m "Consolidate AF3/Qwen pipelines, endpoint templates, and docs"
	```

	3. Push to GitHub remote

	```bash
	git push github main
	```