Andrew committed
Commit bd37cca · 1 Parent(s): a3ab20b

github push
.env.example ADDED
@@ -0,0 +1,5 @@
+ HF_TOKEN=hf_xxx_your_token_here
+ HF_ENDPOINT_URL=https://your-endpoint-url.endpoints.huggingface.cloud
+
+ # Optional defaults used by scripts/hf_clone.py
+ HF_USERNAME=your-hf-username
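These variables are read from the environment at runtime. A minimal sketch of consuming them from Python (the `load_hf_settings` helper is hypothetical, not part of this repo):

```python
import os

def load_hf_settings():
    """Read the variables documented in .env.example from the environment."""
    token = os.getenv("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; copy .env.example to .env and fill it in")
    return {
        "token": token,
        "endpoint_url": os.getenv("HF_ENDPOINT_URL", ""),
        "username": os.getenv("HF_USERNAME", ""),
    }
```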
.gitignore CHANGED
@@ -1,4 +1,66 @@
+ # Environment
  .env
- *.bat
- *.ps1
+ .env.*
+ !.env.example
+
+ # Python cache/build artifacts
+ __pycache__/
+ *.py[cod]
+ *.pyo
+ *.pyd
+ .pytest_cache/
+ .mypy_cache/
+ .ruff_cache/
+ .ipynb_checkpoints/
+ .coverage
+ htmlcov/
+ build/
+ dist/
+ *.egg-info/
+
+ # Virtual environments
+ .venv/
+ venv/
+ env/
+
+ # Tool and local runtime caches
+ .cache/
+ .huggingface/
+ .gradio/
+
+ # Logs/temp
+ *.log
+ *.tmp
+ *.temp
+ *.bak
+
+ # Model/data/runtime artifacts
+ checkpoints/
+ lora_output/
+ outputs/
+ artifacts/
+ models/
+ datasets/
+ /data/
  *.wav
+ *.flac
+ *.mp3
+ *.ogg
+ *.opus
+ *.m4a
+ *.aac
+ *.pt
+ *.bin
+ *.safetensors
+ *.ckpt
+ *.onnx
+
+ # OS/editor
+ .DS_Store
+ Thumbs.db
+ .idea/
+ .vscode/
+
+ # Optional local working copies
+ Lora-ace-step/
+ song_summaries_llm*.md
CONTRIBUTING.md ADDED
@@ -0,0 +1,25 @@
+ # Contributing
+
+ ## Development Setup
+
+ ```bash
+ python -m pip install --upgrade pip
+ python -m pip install -r requirements.txt
+ python app.py
+ ```
+
+ ## Before Opening A PR
+
+ 1. Keep secrets out of git (`HF_TOKEN`, endpoint URLs, `.env`).
+ 2. Do not commit local artifacts (`checkpoints/`, `lora_output/`, generated audio).
+ 3. Run quick CLI sanity checks:
+    - `python lora_train.py --help`
+    - `python scripts/hf_clone.py --help`
+    - `python scripts/endpoint/generate_interactive.py --help`
+ 4. Update docs (`README.md`, `docs/deploy/*`) if behavior or workflows changed.
+
+ ## Scope Guidelines
+
+ - UI + training workflow changes belong in `lora_ui.py` / `lora_train.py`.
+ - Inference endpoint changes belong in `handler.py`.
+ - Shared ACE-Step runtime logic belongs in `acestep/`.
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 ACE-Step LoRA Studio contributors
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -0,0 +1,147 @@
+ ---
+ title: ACE-Step 1.5 LoRA Studio
+ emoji: music
+ colorFrom: blue
+ colorTo: teal
+ sdk: gradio
+ app_file: app.py
+ pinned: false
+ ---
+
+ # ACE-Step 1.5 LoRA Studio
+
+ Train ACE-Step 1.5 LoRA adapters, deploy your own Hugging Face Space, and run production-style inference through a Dedicated Endpoint.
+
+ [![Create HF Space](https://img.shields.io/badge/Create-HF%20Space-FFD21E?logo=huggingface&logoColor=black)](https://huggingface.co/new-space)
+ [![Create HF Endpoint Repo](https://img.shields.io/badge/Create-HF%20Endpoint%20Repo-FFB000?logo=huggingface&logoColor=black)](https://huggingface.co/new-model)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
+
+ ## What you get
+
+ - LoRA training UI and workflow: `app.py`, `lora_ui.py`
+ - CLI LoRA trainer for local/HF datasets: `lora_train.py`
+ - Custom endpoint runtime: `handler.py`, `acestep/`
+ - Bootstrap automation for cloning into your HF account: `scripts/hf_clone.py`
+ - Endpoint test clients and HF job launcher: `scripts/endpoint/`, `scripts/jobs/`
+
+ ## Quick start (local)
+
+ ```bash
+ python -m pip install --upgrade pip
+ python -m pip install -r requirements.txt
+ python app.py
+ ```
+
+ Open `http://localhost:7860`.
+
+ ## Clone to your HF account
+
+ Use the two buttons near the top of this README to create target repos in your HF account, then run the commands below.
+
+ Set token once:
+
+ ```bash
+ # Linux/macOS
+ export HF_TOKEN=hf_xxx
+
+ # Windows PowerShell
+ $env:HF_TOKEN="hf_xxx"
+ ```
+
+ Clone your own Space:
+
+ ```bash
+ python scripts/hf_clone.py space --repo-id YOUR_USERNAME/YOUR_SPACE_NAME
+ ```
+
+ Clone your own Endpoint repo:
+
+ ```bash
+ python scripts/hf_clone.py endpoint --repo-id YOUR_USERNAME/YOUR_ENDPOINT_REPO
+ ```
+
+ Clone both in one run:
+
+ ```bash
+ python scripts/hf_clone.py all \
+     --space-repo-id YOUR_USERNAME/YOUR_SPACE_NAME \
+     --endpoint-repo-id YOUR_USERNAME/YOUR_ENDPOINT_REPO
+ ```
+
+ ## Project layout
+
+ ```text
+ .
+ |- app.py
+ |- lora_ui.py
+ |- lora_train.py
+ |- handler.py
+ |- acestep/
+ |- scripts/
+ |  |- hf_clone.py
+ |  |- endpoint/
+ |  |  |- generate_interactive.py
+ |  |  |- test.ps1
+ |  |  |- test.bat
+ |  |  |- test_rnb.bat
+ |  |  `- test_rnb_2min.bat
+ |  `- jobs/
+ |     `- submit_hf_lora_job.ps1
+ |- docs/
+ |  |- deploy/
+ |  `- guides/
+ |- summaries/
+ |  `- findings.md
+ `- templates/hf-endpoint/
+ ```
+
+ ## Dataset format
+
+ Supported audio:
+
+ - `.wav`, `.flac`, `.mp3`, `.ogg`, `.opus`, `.m4a`, `.aac`
+
+ Optional sidecar metadata per track:
+
+ - `song_001.wav`
+ - `song_001.json`
+
+ ```json
+ {
+   "caption": "melodic emotional rnb pop with warm pads",
+   "lyrics": "[Verse]\n...",
+   "bpm": 92,
+   "keyscale": "Am",
+   "timesignature": "4/4",
+   "vocal_language": "en",
+   "duration": 120
+ }
+ ```
+
+ ## Endpoint testing
+
+ ```bash
+ python scripts/endpoint/generate_interactive.py
+ ```
+
+ Or run scripted tests:
+
+ - `scripts/endpoint/test.ps1`
+ - `scripts/endpoint/test.bat`
+
+ ## Findings and notes
+
+ Current baseline analysis and improvement ideas are tracked in `summaries/findings.md`.
+
+ ## Docs
+
+ - Space deployment: `docs/deploy/SPACE.md`
+ - Endpoint deployment: `docs/deploy/ENDPOINT.md`
+ - Additional guides: `docs/guides/qwen2-audio-train.md`
+
+ ## Open-source readiness checklist
+
+ - Secrets are env-driven (`HF_TOKEN`, `HF_ENDPOINT_URL`, `.env`).
+ - Local artifacts are ignored via `.gitignore`.
+ - MIT license included.
+ - Reproducible clone/deploy paths documented.
acestep/handler.py CHANGED
@@ -24,6 +24,7 @@ from typing import Optional, Dict, Any, Tuple, List, Union
  import torch
  import torchaudio
  import soundfile as sf
+ import numpy as np
  import time
  from tqdm import tqdm
  from loguru import logger
@@ -1655,7 +1656,7 @@ class AceStepHandler:
 
          try:
              # Load audio file
-             audio, sr = torchaudio.load(audio_file)
+             audio, sr = self._load_audio_any_backend(audio_file)
 
              logger.debug(f"[process_reference_audio] Reference audio shape: {audio.shape}")
              logger.debug(f"[process_reference_audio] Reference audio sample rate: {sr}")
@@ -1710,7 +1711,7 @@ class AceStepHandler:
 
          try:
              # Load audio file
-             audio, sr = torchaudio.load(audio_file)
+             audio, sr = self._load_audio_any_backend(audio_file)
 
              # Normalize to stereo 48kHz
              audio = self._normalize_audio_to_stereo_48k(audio, sr)
@@ -1720,6 +1721,44 @@ class AceStepHandler:
          except Exception as e:
              logger.exception("[process_src_audio] Error processing source audio")
              return None
+
+     def _load_audio_any_backend(self, audio_file):
+         """Load audio with torchaudio first, then soundfile fallback."""
+         def _coerce_audio_tensor(audio_obj):
+             if isinstance(audio_obj, list):
+                 audio_obj = np.asarray(audio_obj, dtype=np.float32)
+             if isinstance(audio_obj, np.ndarray):
+                 audio_obj = torch.from_numpy(audio_obj)
+             if not torch.is_tensor(audio_obj):
+                 raise TypeError(f"Unsupported audio type: {type(audio_obj)}")
+
+             if not torch.is_floating_point(audio_obj):
+                 audio_obj = audio_obj.float()
+
+             # Normalize to [C, T]
+             if audio_obj.dim() == 1:
+                 audio_obj = audio_obj.unsqueeze(0)
+             elif audio_obj.dim() == 2:
+                 if audio_obj.shape[0] > audio_obj.shape[1] and audio_obj.shape[1] <= 8:
+                     audio_obj = audio_obj.transpose(0, 1)
+             elif audio_obj.dim() == 3:
+                 audio_obj = audio_obj[0]
+             else:
+                 raise ValueError(f"Unexpected audio dims: {tuple(audio_obj.shape)}")
+             return audio_obj.contiguous()
+
+         try:
+             audio, sr = torchaudio.load(audio_file)
+             return _coerce_audio_tensor(audio), sr
+         except Exception as torchaudio_exc:
+             try:
+                 audio_np, sr = sf.read(audio_file, dtype="float32", always_2d=True)
+                 return _coerce_audio_tensor(audio_np.T), sr
+             except Exception as sf_exc:
+                 raise RuntimeError(
+                     f"Audio decode failed for '{audio_file}' with torchaudio ({torchaudio_exc}) "
+                     f"and soundfile ({sf_exc})."
+                 ) from sf_exc
 
      def convert_src_audio_to_codes(self, audio_file) -> str:
          """
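The `[C, T]` normalization heuristic in `_load_audio_any_backend` (channels-first, transposing when soundfile's `[T, C]` layout is detected) can be illustrated standalone with plain NumPy. This sketch mirrors, but is not, the `_coerce_audio_tensor` helper above:

```python
import numpy as np

def to_channels_first(audio):
    """Return audio as a float32 [channels, samples] array."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        # Mono [T] -> [1, T]
        audio = audio[None, :]
    elif audio.ndim == 2:
        # soundfile's always_2d=True yields [T, C]; transpose when the
        # leading axis is long and the trailing axis looks like channels.
        if audio.shape[0] > audio.shape[1] and audio.shape[1] <= 8:
            audio = audio.T
    else:
        raise ValueError(f"Unexpected audio dims: {audio.shape}")
    return audio
```

The `<= 8` channel cap is what keeps a short multichannel clip from being misread as time-major.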
acestep/llm_inference.py CHANGED
@@ -457,7 +457,7 @@ class LLMHandler:
 
          # If lm_model_path is None, use default
          if lm_model_path is None:
-             lm_model_path = "acestep-5Hz-lm-1.7B"
+             lm_model_path = "acestep-5Hz-lm-4B"
              logger.info(f"[initialize] lm_model_path is None, using default: {lm_model_path}")
 
          full_lm_model_path = os.path.join(checkpoint_dir, lm_model_path)
app.py CHANGED
@@ -1,5 +1,12 @@
  import os
 
+ # On Hugging Face Spaces Zero, `spaces` must be imported before CUDA-related modules.
+ if os.getenv("SPACE_ID"):
+     try:
+         import spaces  # noqa: F401
+     except Exception:
+         pass
+
  from lora_ui import build_ui
 
  app = build_ui()
docs/ACE-Step-1.5-LoRA-HF-Consolidated.md ADDED
@@ -0,0 +1,131 @@
+ # ACE-Step 1.5 LoRA Pipeline (Simple + HF Spaces)
+
+ Last updated: 2026-02-12
+
+ ## 1. What is already implemented in this repo
+ - Drag/drop dataset loading and folder scan.
+ - Optional per-track sidecar JSON (`song.wav` + `song.json`).
+ - New **Auto-Label All** option in `lora_ui.py`:
+   - Uses ACE audio understanding (`audio -> semantic codes -> caption/lyrics/metadata`).
+   - Writes/updates sidecar JSON for each track.
+ - LoRA training with ACE flow-matching defaults and adapter checkpoints.
+ - Training log now shows device plus elapsed time and ETA.
+
+ ## 2. Direct answers to your core questions
+
+ ### Is LoRA using HF GPU?
+ Yes. If the Space hardware is a GPU tier and the model device is `auto`/`cuda`, training runs on that Space GPU.
+
+ ### Do we get time estimates?
+ Yes. The training status now shows elapsed time and ETA in the log.
+
+ ### How are metadata and lyrics paired per song?
+ By basename in the same folder:
+ - `track01.wav`
+ - `track01.json`
+
+ ### Do you need all metadata?
+ No. In this pipeline, metadata is optional.
+ - Required minimum: audio files.
+ - Strongly recommended: `caption` and/or `lyrics` for better conditioning quality.
+ - Optional but helpful: `bpm`, `keyscale`, `timesignature`, `vocal_language`, `duration`.
+
+ ### Where are trained adapters saved?
+ - Local run: `lora_output/...` by default.
+ - HF Space run: `/data/lora_output/...` by default (as configured in UI code).
+ - The final adapter checkpoint is saved under a `final` subfolder.
+
+ ### Cloud GPU + local files?
+ - Training on Spaces uses the cloud GPU and writes artifacts to the Space filesystem.
+ - To keep results outside the Space, download them or upload them to a Hub model repo.
+
+ ### Can HF Endpoint GPU train this?
+ Not the right product. Inference Endpoints are for model serving/inference; use Spaces (interactive) or Jobs (batch) for training.
+
+ ## 3. Minimal dataset format
+
+ Drop files into one folder:
+
+ ```text
+ dataset_inbox/
+   song_a.wav
+   song_b.flac
+   song_c.mp3
+ ```
+
+ Optional sidecar for tighter control:
+
+ ```text
+ dataset_inbox/
+   song_a.wav
+   song_a.json
+ ```
+
+ Example `song_a.json`:
+
+ ```json
+ {
+   "caption": "emotional indie pop with airy female vocal and warm pads",
+   "lyrics": "[Verse]\n...",
+   "bpm": 96,
+   "keyscale": "Am",
+   "timesignature": "4/4",
+   "vocal_language": "en",
+   "duration": 120
+ }
+ ```
+
+ ## 4. Super simple training flow (UI)
+ 1. Start UI:
+    - Local: `python app.py`
+    - Space: app starts automatically from `app.py`.
+ 2. Step 1 tab: initialize `acestep-v15-base` (best LoRA plasticity).
+ 3. Step 2 tab: scan folder or drag/drop files.
+ 4. Optional: initialize auto-label LM and click **Auto-Label All**.
+ 5. Step 3 tab: keep defaults for first run, click **Start Training**.
+ 6. Click **Refresh Log** to monitor status/loss/ETA.
+ 7. Step 4 tab: load adapter from output folder and A/B test against base.
+
+ ## 5. HF Spaces setup (step by step)
+ 1. Create a new Hugging Face **Space** with SDK = `Gradio`.
+ 2. Push this repo to that Space repo.
+ 3. Ensure Space metadata/front matter includes:
+    - `sdk: gradio`
+    - `app_file: app.py`
+ 4. In Space `Settings -> Hardware`, select a GPU tier.
+ 5. In Space `Settings -> Variables and secrets`, add any needed tokens as secrets (never hardcode).
+ 6. Open the Space and run the 4-step UI flow.
+
+ ## 6. GPU association and cost control
+
+ ### Pick hardware for your stage
+ - Fast/cheap iteration: start with T4 or A10G.
+ - Heavier runs or bigger LM usage: A100/L40S/H100 class.
+
+ ### Keep spend under control
+ 1. Use the smaller auto-label LM (`0.6B`) unless you need higher-quality labels.
+ 2. Train with `acestep-v15-base` only for final-quality runs; iterate on turbo variants if needed.
+ 3. Pause or downgrade hardware immediately when idle.
+ 4. Export/upload adapters right after training so you can shut hardware down.
+
+ ### Current billing behavior to remember
+ HF Spaces docs indicate upgraded hardware is billed by the minute while the Space is running; pause or stop upgraded hardware when not in use.
+
+ ## 7. Suggested first-run defaults
+ - Model: `acestep-v15-base`
+ - LoRA rank/alpha/dropout: `64 / 64 / 0.1`
+ - Optimizer: `adamw_8bit`
+ - LR: `1e-4`
+ - Warmup: `0.03`
+ - Scheduler: `constant_with_warmup`
+ - Shift: `3.0`
+ - Max grad norm: `1.0`
+
+ ## 8. Source links (official)
+ - ACE-Step Gradio guide: https://huggingface.co/spaces/ACE-Step/Ace-Step-v1.5/blob/main/docs/en/GRADIO_GUIDE.md
+ - ACE-Step README: https://huggingface.co/spaces/ACE-Step/Ace-Step-v1.5/blob/main/README.md
+ - ACE-Step LoRA model card note (DiT-only LoRA): https://huggingface.co/ACE-Step/Ace-Step-v1.5-lo-ra-new-year
+ - HF Spaces overview: https://huggingface.co/docs/hub/en/spaces-overview
+ - HF Spaces GPU/hardware docs: https://huggingface.co/docs/hub/en/spaces-gpus
+ - HF Spaces config reference: https://huggingface.co/docs/hub/en/spaces-config-reference
+ - HF Inference Endpoints overview: https://huggingface.co/docs/inference-endpoints/en/index
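The elapsed-time/ETA readout mentioned above reduces to simple rate arithmetic: remaining steps times the average seconds per step so far. A minimal sketch (this `estimate_eta` helper is hypothetical, not the UI code):

```python
def estimate_eta(elapsed_sec, steps_done, total_steps):
    """Linear ETA: remaining steps times the average seconds per step so far."""
    if steps_done <= 0:
        return None  # no rate information yet
    sec_per_step = elapsed_sec / steps_done
    return (total_steps - steps_done) * sec_per_step
```

For example, 50 of 100 steps done in 100 s gives 2 s/step, so roughly 100 s remain.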
docs/deploy/ENDPOINT.md ADDED
@@ -0,0 +1,80 @@
+ # Deploy Inference To Your Own HF Dedicated Endpoint
+
+ This guide deploys the custom `handler.py` inference runtime to a Hugging Face Dedicated Inference Endpoint.
+
+ ## Prerequisites
+
+ - Hugging Face account
+ - `HF_TOKEN` with repo write access
+ - Dedicated Endpoint access on your HF plan
+
+ ## 1) Create/Update Your Endpoint Repo
+
+ ```bash
+ python scripts/hf_clone.py endpoint --repo-id YOUR_USERNAME/YOUR_ENDPOINT_REPO
+ ```
+
+ This uploads:
+
+ - `handler.py`
+ - `acestep/`
+ - `requirements.txt`
+ - `packages.txt`
+ - endpoint-specific README template
+
+ ## 2) Create Endpoint In HF UI
+
+ 1. Go to **Inference Endpoints** -> **New endpoint**.
+ 2. Select your custom model repo: `YOUR_USERNAME/YOUR_ENDPOINT_REPO`.
+ 3. Choose GPU hardware.
+ 4. Deploy.
+
+ ## 3) Recommended Endpoint Environment Variables
+
+ - `ACE_CONFIG_PATH` (default: `acestep-v15-sft`)
+ - `ACE_LM_MODEL_PATH` (default: `acestep-5Hz-lm-4B`)
+ - `ACE_LM_BACKEND` (default: `pt`)
+ - `ACE_DOWNLOAD_SOURCE` (`huggingface` or `modelscope`)
+ - `ACE_ENABLE_FALLBACK` (`false` recommended for strict failure visibility)
+
+ ## 4) Test The Endpoint
+
+ Set credentials:
+
+ ```bash
+ # Linux/macOS
+ export HF_TOKEN=hf_xxx
+ export HF_ENDPOINT_URL=https://your-endpoint-url.endpoints.huggingface.cloud
+
+ # Windows PowerShell
+ $env:HF_TOKEN="hf_xxx"
+ $env:HF_ENDPOINT_URL="https://your-endpoint-url.endpoints.huggingface.cloud"
+ ```
+
+ Test with:
+
+ - `python scripts/endpoint/generate_interactive.py`
+ - `scripts/endpoint/test.ps1`
+
+ ## Request Contract
+
+ ```json
+ {
+   "inputs": {
+     "prompt": "upbeat pop rap with emotional guitar",
+     "lyrics": "[Verse] city lights and midnight rain",
+     "duration_sec": 12,
+     "sample_rate": 44100,
+     "seed": 42,
+     "guidance_scale": 7.0,
+     "steps": 50,
+     "use_lm": true
+   }
+ }
+ ```
+
+ ## Cost Control
+
+ - Use scale-to-zero for idle periods.
+ - Pause endpoint for immediate spend stop.
+ - Expect cold starts when scaled to zero.
@@ -0,0 +1,40 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Deploy LoRA Studio To Your Own HF Space
2
+
3
+ This guide deploys the full LoRA Studio UI to your own Hugging Face Space.
4
+
5
+ ## Prerequisites
6
+
7
+ - Hugging Face account
8
+ - `HF_TOKEN` with repo write access
9
+ - Python environment with `requirements.txt` installed
10
+
11
+ ## Fast Path (Recommended)
12
+
13
+ ```bash
14
+ python scripts/hf_clone.py space --repo-id YOUR_USERNAME/YOUR_SPACE_NAME
15
+ ```
16
+
17
+ Optional private Space:
18
+
19
+ ```bash
20
+ python scripts/hf_clone.py space --repo-id YOUR_USERNAME/YOUR_SPACE_NAME --private
21
+ ```
22
+
23
+ ## Manual Path
24
+
25
+ 1. Create a new Space on Hugging Face:
26
+ - SDK: `Gradio`
27
+ 2. Push this repo content (excluding local artifacts) to that Space repo.
28
+ 3. Ensure README front matter has:
29
+ - `sdk: gradio`
30
+ - `app_file: app.py`
31
+ 4. In Space settings:
32
+ - select GPU hardware (A10G/A100/etc.) if needed
33
+ - add secrets (`HF_TOKEN`) if your flow requires private Hub access
34
+
35
+ ## Runtime Notes
36
+
37
+ - Space output defaults to `/data/lora_output` on Hugging Face Spaces.
38
+ - Enable persistent storage if you need checkpoint retention across restarts.
39
+ - For long-running non-interactive training, HF Jobs may be more cost-efficient than keeping a Space running.
40
+
docs/guides/README.md ADDED
@@ -0,0 +1,5 @@
+ # Guides
+
+ Additional step-by-step guides that are useful but not required for the core LoRA Studio flow.
+
+ - `qwen2-audio-train.md`
docs/guides/qwen2-audio-train.md ADDED
File without changes
handler.py CHANGED
@@ -3,6 +3,7 @@ import base64
  import io
  import os
  import traceback
+ from pathlib import Path
  from typing import Any, Dict, Optional, Tuple
 
  import numpy as np
@@ -27,7 +28,7 @@ class EndpointHandler:
      "sample_rate": 44100,
      "seed": 42,
      "guidance_scale": 7.0,
-     "steps": 8,
+     "steps": 50,
      "use_lm": true,
      "simple_prompt": false,
      "instrumental": false,
@@ -50,8 +51,10 @@ class EndpointHandler:
          self.project_root = os.path.dirname(os.path.abspath(__file__))
 
          self.model_repo = os.getenv("ACE_MODEL_REPO", "ACE-Step/Ace-Step1.5")
-         self.config_path = os.getenv("ACE_CONFIG_PATH", "acestep-v15-turbo")
-         self.lm_model_path = os.getenv("ACE_LM_MODEL_PATH", "acestep-5Hz-lm-1.7B")
+         # Default to the larger quality-oriented setup.
+         # Override via ACE_CONFIG_PATH / ACE_LM_MODEL_PATH when needed.
+         self.config_path = os.getenv("ACE_CONFIG_PATH", "acestep-v15-sft")
+         self.lm_model_path = os.getenv("ACE_LM_MODEL_PATH", "acestep-5Hz-lm-4B")
          self.lm_backend = os.getenv("ACE_LM_BACKEND", "pt")
          self.download_source = os.getenv("ACE_DOWNLOAD_SOURCE", "huggingface")
 
@@ -233,6 +236,31 @@ class EndpointHandler:
 
          try:
              checkpoint_dir = os.path.join(self.project_root, "checkpoints")
+             full_lm_model_path = os.path.join(checkpoint_dir, self.lm_model_path)
+             if not os.path.exists(full_lm_model_path):
+                 try:
+                     from acestep.model_downloader import ensure_lm_model, ensure_main_model
+                 except Exception as e:
+                     self.llm_error = f"LM download helper import failed: {type(e).__name__}: {e}"
+                     return False
+
+                 # 1.7B ships with main; 0.6B/4B are standalone submodels.
+                 if self.lm_model_path == "acestep-5Hz-lm-1.7B":
+                     dl_ok, dl_msg = ensure_main_model(
+                         checkpoints_dir=Path(checkpoint_dir),
+                         prefer_source=self.download_source,
+                     )
+                 else:
+                     dl_ok, dl_msg = ensure_lm_model(
+                         model_name=self.lm_model_path,
+                         checkpoints_dir=Path(checkpoint_dir),
+                         prefer_source=self.download_source,
+                     )
+                 self.init_details["llm_download"] = dl_msg
+                 if not dl_ok:
+                     self.llm_error = f"LM download failed: {dl_msg}"
+                     return False
+
              status, ok = self.llm_handler.initialize(
                  checkpoint_dir=checkpoint_dir,
                  lm_model_path=self.lm_model_path,
@@ -352,8 +380,14 @@ class EndpointHandler:
 
          seed = self._to_int(raw_inputs.get("seed", 42), 42)
          guidance_scale = self._to_float(raw_inputs.get("guidance_scale", 7.0), 7.0)
-         steps = self._to_int(raw_inputs.get("steps", raw_inputs.get("inference_steps", 8)), 8)
+         steps = self._to_int(raw_inputs.get("steps", raw_inputs.get("inference_steps", 50)), 50)
          steps = max(1, min(steps, 200))
+         bpm_raw = raw_inputs.get("bpm")
+         bpm = None
+         if bpm_raw is not None and str(bpm_raw).strip() != "":
+             bpm = self._to_int(bpm_raw, 0)
+             if bpm <= 0:
+                 bpm = None
          use_lm = self._to_bool(raw_inputs.get("use_lm", raw_inputs.get("thinking", True)), True)
          allow_fallback = self._to_bool(raw_inputs.get("allow_fallback"), self.enable_fallback)
 
@@ -365,6 +399,7 @@ class EndpointHandler:
              "seed": seed,
              "guidance_scale": guidance_scale,
              "steps": steps,
+             "bpm": bpm,
              "use_lm": use_lm,
              "instrumental": instrumental,
              "simple_prompt": simple_prompt,
@@ -383,7 +418,7 @@ class EndpointHandler:
              "simple_expansion_error": None,
          }
 
-         bpm = None
+         bpm = req.get("bpm")
          keyscale = ""
          timesignature = ""
          vocal_language = "unknown"
@@ -399,7 +434,9 @@ class EndpointHandler:
          if getattr(sample, "success", False):
              caption = getattr(sample, "caption", "") or caption
              lyrics = getattr(sample, "lyrics", "") or lyrics
-             bpm = getattr(sample, "bpm", None)
+             sample_bpm = getattr(sample, "bpm", None)
+             if bpm is None:
+                 bpm = sample_bpm
              keyscale = getattr(sample, "keyscale", "") or ""
              timesignature = getattr(sample, "timesignature", "") or ""
              vocal_language = getattr(sample, "language", "") or "unknown"
@@ -526,6 +563,7 @@ class EndpointHandler:
              "seed": req["seed"],
              "guidance_scale": req["guidance_scale"],
              "steps": req["steps"],
+             "bpm": req.get("bpm"),
              "use_lm": req["use_lm"],
              "simple_prompt": req["simple_prompt"],
              "instrumental": req["instrumental"],
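The bpm handling added above (treat blank or non-positive request values as absent, then defer to the LM sample's bpm) reduces to a small pure function. This restatement is for illustration only and is not code from `handler.py`:

```python
def parse_bpm(raw, sample_bpm=None):
    """Return a positive int bpm from the request, falling back to the LM's value."""
    bpm = None
    if raw is not None and str(raw).strip() != "":
        try:
            bpm = int(float(raw))
        except (TypeError, ValueError):
            bpm = 0  # unparseable values are treated like non-positive ones
        if bpm <= 0:
            bpm = None
    return bpm if bpm is not None else sample_bpm
```

An explicit `"bpm": 92` in the request wins; empty strings, `None`, and junk fall through to whatever the LM inferred.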
lora_train.py ADDED
@@ -0,0 +1,1056 @@
+ """
+ ACE-Step 1.5 LoRA Training Engine
+
+ Handles dataset building, VAE encoding, and flow-matching LoRA training
+ of the DiT decoder. Designed to work with the existing AceStepHandler.
+ """
+
+ import os
+ import sys
+ import json
+ import math
+ import time
+ import random
+ import hashlib
+ import argparse
+ import tempfile
+ from pathlib import Path
+ from dataclasses import dataclass, field, asdict
+ from typing import Optional, List, Dict, Any, Tuple
+
+ import torch
+ import torch.nn.functional as F
+ import torchaudio
+ import soundfile as sf
+ import numpy as np
+ from loguru import logger
+ from tqdm import tqdm
+
+ # ---------------------------------------------------------------------------
+ # Dataset helpers
+ # ---------------------------------------------------------------------------
+
+ AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg", ".opus", ".m4a", ".aac"}
+
+
+ @dataclass
+ class TrackEntry:
+     """One audio file + its metadata."""
+
+     audio_path: str
+     caption: str = ""
+     lyrics: str = ""
+     bpm: Optional[int] = None
+     keyscale: str = ""
+     timesignature: str = "4/4"
+     vocal_language: str = "en"
+     duration: Optional[float] = None  # seconds (measured at scan time)
+
+
+ def _load_track_entry(audio_path: Path) -> TrackEntry:
+     """Load one track + optional sidecar metadata."""
+     sidecar = audio_path.with_suffix(".json")
+     meta: Dict[str, Any] = {}
+     if sidecar.exists():
+         try:
+             meta = json.loads(sidecar.read_text(encoding="utf-8"))
+         except Exception as exc:
+             logger.warning(f"Bad sidecar {sidecar}: {exc}")
+
+     try:
+         info = torchaudio.info(str(audio_path))
+         duration = info.num_frames / info.sample_rate
+     except Exception:
+         duration = meta.get("duration")
+
+     return TrackEntry(
+         audio_path=str(audio_path),
+         caption=meta.get("caption", ""),
+         lyrics=meta.get("lyrics", ""),
+         bpm=meta.get("bpm"),
+         keyscale=meta.get("keyscale", ""),
+         timesignature=meta.get("timesignature", "4/4"),
+         vocal_language=meta.get("vocal_language", "en"),
+         duration=duration,
+     )
+
+
+ def scan_dataset_folder(folder: str) -> List[TrackEntry]:
+     """Scan *folder* for audio files and optional JSON sidecars.
+
+     For every ``track.wav`` found, if ``track.json`` exists next to it the
+     metadata fields are loaded from the sidecar. Missing sidecars are fine –
+     the entry will have empty metadata that can be filled later.
+     """
+     folder = Path(folder)
+     if not folder.is_dir():
+         raise FileNotFoundError(f"Dataset folder not found: {folder}")
+
+     entries: List[TrackEntry] = []
+     for audio_path in sorted(folder.rglob("*")):
91
+ if audio_path.suffix.lower() not in AUDIO_EXTENSIONS:
92
+ continue
93
+ entries.append(_load_track_entry(audio_path))
94
+
95
+ logger.info(f"Scanned {len(entries)} audio files in {folder}")
96
+ return entries
97
+
98
+
99
+ def scan_uploaded_files(file_paths: List[str]) -> List[TrackEntry]:
100
+ """Build entries from dropped/uploaded files.
101
+
102
+ Supports uploading audio files together with optional ``.json`` sidecars.
103
+ Sidecars are matched by basename stem (``song.mp3`` <-> ``song.json``).
104
+ """
105
+ meta_by_stem: Dict[str, Dict[str, Any]] = {}
106
+ for path in file_paths:
107
+ p = Path(path)
108
+ if not p.exists() or p.suffix.lower() != ".json":
109
+ continue
110
+ try:
111
+ meta_by_stem[p.stem] = json.loads(p.read_text(encoding="utf-8"))
112
+ except Exception as exc:
113
+ logger.warning(f"Bad uploaded sidecar {p}: {exc}")
114
+
115
+ entries: List[TrackEntry] = []
116
+ for path in file_paths:
117
+ p = Path(path)
118
+ if not p.exists() or p.suffix.lower() not in AUDIO_EXTENSIONS:
119
+ continue
120
+
121
+ uploaded_meta = meta_by_stem.get(p.stem)
122
+ if uploaded_meta is None:
123
+ entries.append(_load_track_entry(p))
124
+ continue
125
+
126
+ try:
127
+ info = torchaudio.info(str(p))
128
+ duration = info.num_frames / info.sample_rate
129
+ except Exception:
130
+ duration = uploaded_meta.get("duration")
131
+
132
+ bpm_val = uploaded_meta.get("bpm")
133
+ if isinstance(bpm_val, str) and bpm_val.strip():
134
+ try:
135
+ bpm_val = int(float(bpm_val))
136
+ except Exception:
137
+ bpm_val = None
138
+
139
+ entries.append(
140
+ TrackEntry(
141
+ audio_path=str(p),
142
+ caption=uploaded_meta.get("caption", "") or "",
143
+ lyrics=uploaded_meta.get("lyrics", "") or "",
144
+ bpm=bpm_val if isinstance(bpm_val, int) else None,
145
+ keyscale=uploaded_meta.get("keyscale", "") or "",
146
+ timesignature=uploaded_meta.get("timesignature", "4/4") or "4/4",
147
+ vocal_language=uploaded_meta.get("vocal_language", uploaded_meta.get("language", "en")) or "en",
148
+ duration=duration,
149
+ )
150
+ )
151
+
152
+ logger.info(
153
+ "Loaded {} uploaded audio files ({} uploaded sidecars detected)".format(
154
+ len(entries), len(meta_by_stem)
155
+ )
156
+ )
157
+ return entries
158
+
159
+
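Sidecar files are matched to tracks purely by basename stem. A minimal self-contained sketch of that matching rule (hypothetical file names, no audio decoding):

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg", ".opus", ".m4a", ".aac"}

def match_sidecars(paths):
    """Pair each audio path with the .json sidecar sharing its stem, if any."""
    sidecars = {Path(p).stem: p for p in paths if Path(p).suffix.lower() == ".json"}
    pairs = []
    for p in paths:
        if Path(p).suffix.lower() in AUDIO_EXTENSIONS:
            pairs.append((p, sidecars.get(Path(p).stem)))
    return pairs

pairs = match_sidecars(["song.mp3", "song.json", "loop.wav"])
print(pairs)  # [('song.mp3', 'song.json'), ('loop.wav', None)]
```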
160
+ # ---------------------------------------------------------------------------
161
+ # Training hyper-parameters
162
+ # ---------------------------------------------------------------------------
163
+
164
+
165
+ @dataclass
166
+ class LoRATrainConfig:
167
+ """All tuneable knobs for a LoRA run."""
168
+
169
+ # LoRA architecture
170
+ lora_rank: int = 64
171
+ lora_alpha: int = 64
172
+ lora_dropout: float = 0.1
173
+ lora_target_modules: List[str] = field(
174
+ default_factory=lambda: ["q_proj", "k_proj", "v_proj", "o_proj"]
175
+ )
176
+
177
+ # Optimiser
178
+ learning_rate: float = 1e-4
179
+ weight_decay: float = 0.01
180
+ optimizer: str = "adamw_8bit" # "adamw" | "adamw_8bit"
181
+ max_grad_norm: float = 1.0
182
+
183
+ # Schedule
184
+ warmup_ratio: float = 0.03
185
+ scheduler: str = "constant_with_warmup"
186
+
187
+ # Training loop
188
+ num_epochs: int = 50
189
+ batch_size: int = 1
190
+ gradient_accumulation_steps: int = 1
191
+ save_every_n_epochs: int = 10
192
+ log_every_n_steps: int = 5
193
+
194
+ # Flow matching
195
+ shift: float = 3.0 # timestep shift factor
196
+
197
+ # Audio pre-processing
198
+ max_duration_sec: float = 240.0 # clamp audio to this length
199
+ sample_rate: int = 48000
200
+
201
+ # Paths
202
+ output_dir: str = "lora_output"
203
+ resume_from: Optional[str] = None
204
+
205
+ # Device
206
+ device: str = "auto"
207
+ dtype: str = "bf16" # "bf16" | "fp16" | "fp32"
208
+ mixed_precision: bool = True
209
+
210
+
211
+ # ---------------------------------------------------------------------------
212
+ # Core trainer
213
+ # ---------------------------------------------------------------------------
214
+
215
+
216
+ class LoRATrainer:
217
+ """Thin training loop that wraps the existing AceStepHandler."""
218
+
219
+ def __init__(self, handler, config: LoRATrainConfig):
220
+ """
221
+ Args:
222
+ handler: Initialised ``AceStepHandler`` (model, vae, text_encoder loaded).
223
+ config: Training hyper-parameters.
224
+ """
225
+ self.handler = handler
226
+ self.cfg = config
227
+
228
+ self.device = handler.device
229
+ self.dtype = handler.dtype
230
+
231
+ # Will be set during prepare()
232
+ self.peft_model = None
233
+ self.optimizer = None
234
+ self.scheduler = None
235
+ self.global_step = 0
236
+ self.current_epoch = 0
237
+
238
+ # Loss history for UI
239
+ self.loss_history: List[Dict[str, Any]] = []
240
+ self._stop_requested = False
241
+
242
+ # ------------------------------------------------------------------
243
+ # Setup
244
+ # ------------------------------------------------------------------
245
+
246
+ @staticmethod
247
+ def _resolve_lora_target_modules(model, requested_targets: Optional[List[str]]) -> List[str]:
248
+ """Resolve LoRA target module suffixes against the actual decoder module names."""
249
+ linear_module_names = [
250
+ name for name, module in model.named_modules() if isinstance(module, torch.nn.Linear)
251
+ ]
252
+
253
+ def _exists_as_suffix(target: str) -> bool:
254
+ return any(name.endswith(target) for name in linear_module_names)
255
+
256
+ requested_targets = requested_targets or []
257
+ resolved = [target for target in requested_targets if _exists_as_suffix(target)]
258
+ if resolved:
259
+ return resolved
260
+
261
+ fallback_groups = [
262
+ ["q_proj", "k_proj", "v_proj", "o_proj"],
263
+ ["to_q", "to_k", "to_v", "to_out.0"],
264
+ ["query", "key", "value", "out_proj"],
265
+ ["wq", "wk", "wv", "wo"],
266
+ ["qkv", "proj_out"],
267
+ ]
268
+ for group in fallback_groups:
269
+ group_resolved = [target for target in group if _exists_as_suffix(target)]
270
+ if len(group_resolved) >= 2:
271
+ return group_resolved
272
+
273
+ sample = ", ".join(linear_module_names[:30])
274
+ raise ValueError(
275
+ "Could not find LoRA target modules in decoder. "
276
+ f"Requested={requested_targets}. "
277
+ f"Sample linear modules: {sample}"
278
+ )
279
+
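The suffix-matching logic above can be exercised without a real model. A standalone sketch against a made-up module name list (names and fallback groups here are illustrative, not the full set used by the trainer):

```python
def resolve_targets(linear_names, requested, fallback_groups):
    """Pick requested suffixes present in linear_names, else the first
    fallback group with at least two matches (mirrors the resolution order)."""
    def exists(target):
        return any(name.endswith(target) for name in linear_names)

    resolved = [t for t in (requested or []) if exists(t)]
    if resolved:
        return resolved
    for group in fallback_groups:
        hit = [t for t in group if exists(t)]
        if len(hit) >= 2:
            return hit
    raise ValueError("no LoRA targets found")

names = ["blocks.0.attn.to_q", "blocks.0.attn.to_k",
         "blocks.0.attn.to_v", "blocks.0.attn.to_out.0"]
# Requested q_proj is absent, so resolution falls through to the to_* group.
print(resolve_targets(names, ["q_proj"], [["to_q", "to_k", "to_v", "to_out.0"]]))
# ['to_q', 'to_k', 'to_v', 'to_out.0']
```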
280
+ def prepare(self):
281
+ """Attach LoRA adapters to the decoder and build the optimiser."""
282
+ import copy
283
+ from peft import LoraConfig, PeftModel, TaskType, get_peft_model
284
+
285
+ # Keep a backup of the plain base decoder so load/unload logic remains valid.
286
+ if self.handler._base_decoder is None:
287
+ self.handler._base_decoder = copy.deepcopy(self.handler.model.decoder)
288
+ else:
289
+ self.handler.model.decoder = copy.deepcopy(self.handler._base_decoder)
290
+ self.handler.model.decoder = self.handler.model.decoder.to(self.device).to(self.dtype)
291
+ self.handler.model.decoder.eval()
292
+
293
+ resume_adapter = None
294
+ if self.cfg.resume_from:
295
+ adapter_cfg = os.path.join(self.cfg.resume_from, "adapter_config.json")
296
+ if os.path.isfile(adapter_cfg):
297
+ resume_adapter = self.cfg.resume_from
298
+
299
+ if resume_adapter:
300
+ logger.info(f"Loading existing LoRA adapter for resume: {resume_adapter}")
301
+ self.peft_model = PeftModel.from_pretrained(
302
+ self.handler.model.decoder,
303
+ resume_adapter,
304
+ is_trainable=True,
305
+ )
306
+ else:
307
+ resolved_targets = self._resolve_lora_target_modules(
308
+ self.handler.model.decoder,
309
+ self.cfg.lora_target_modules,
310
+ )
311
+ logger.info(f"Using LoRA target modules: {resolved_targets}")
312
+ peft_cfg = LoraConfig(
313
+ r=self.cfg.lora_rank,
314
+ lora_alpha=self.cfg.lora_alpha,
315
+ lora_dropout=self.cfg.lora_dropout,
316
+ target_modules=resolved_targets,
317
+ bias="none",
318
+ task_type=TaskType.FEATURE_EXTRACTION,
319
+ )
320
+ self.peft_model = get_peft_model(self.handler.model.decoder, peft_cfg)
321
+
322
+ self.peft_model.print_trainable_parameters()
323
+ self.handler.model.decoder = self.peft_model
324
+ self.handler.model.decoder.to(self.device).to(self.dtype)
325
+ self.handler.model.decoder.train()
326
+ self.handler.lora_loaded = True
327
+ self.handler.use_lora = True
328
+
329
+ # Build optimiser (only LoRA params are trainable)
330
+ trainable_params = [p for p in self.peft_model.parameters() if p.requires_grad]
331
+ if self.cfg.optimizer == "adamw_8bit":
332
+ try:
333
+ import bitsandbytes as bnb
334
+ self.optimizer = bnb.optim.AdamW8bit(
335
+ trainable_params,
336
+ lr=self.cfg.learning_rate,
337
+ weight_decay=self.cfg.weight_decay,
338
+ )
339
+ except ImportError:
340
+ logger.warning("bitsandbytes not found – falling back to standard AdamW")
341
+ self.optimizer = torch.optim.AdamW(
342
+ trainable_params,
343
+ lr=self.cfg.learning_rate,
344
+ weight_decay=self.cfg.weight_decay,
345
+ )
346
+ else:
347
+ self.optimizer = torch.optim.AdamW(
348
+ trainable_params,
349
+ lr=self.cfg.learning_rate,
350
+ weight_decay=self.cfg.weight_decay,
351
+ )
352
+
353
+ # Resume checkpoint state (after model/adapter restore).
354
+ if self.cfg.resume_from and os.path.isfile(
355
+ os.path.join(self.cfg.resume_from, "training_state.pt")
356
+ ):
357
+ state = torch.load(
358
+ os.path.join(self.cfg.resume_from, "training_state.pt"),
359
+ weights_only=False,
360
+ )
361
+ try:
362
+ self.optimizer.load_state_dict(state["optimizer"])
363
+ except Exception as exc:
364
+ logger.warning(f"Could not restore optimizer state, continuing fresh optimizer: {exc}")
365
+ self.global_step = int(state.get("global_step", 0))
366
+ # Saved epoch is completed epoch index; continue from next epoch.
367
+ self.current_epoch = int(state.get("epoch", -1)) + 1
368
+ loss_path = os.path.join(self.cfg.resume_from, "loss_history.json")
369
+ if os.path.isfile(loss_path):
370
+ try:
371
+ with open(loss_path, "r", encoding="utf-8") as f:
372
+ self.loss_history = json.load(f)
373
+ except Exception:
374
+ pass
375
+ logger.info(
376
+ f"Resumed from {self.cfg.resume_from} "
377
+ f"(epoch {self.current_epoch}, step {self.global_step})"
378
+ )
379
+
380
+ # ------------------------------------------------------------------
381
+ # Data loading
382
+ # ------------------------------------------------------------------
383
+
384
+ @staticmethod
385
+ def _coerce_audio_tensor(audio: Any) -> torch.Tensor:
386
+ """Coerce decoded audio into torch.Tensor with shape [C, T]."""
387
+ if isinstance(audio, list):
388
+ audio = np.asarray(audio, dtype=np.float32)
389
+ if isinstance(audio, np.ndarray):
390
+ audio = torch.from_numpy(audio)
391
+ if not torch.is_tensor(audio):
392
+ raise TypeError(f"Unsupported audio type: {type(audio)}")
393
+
394
+ # Ensure floating point for downstream resample/vae encode.
395
+ if not torch.is_floating_point(audio):
396
+ audio = audio.float()
397
+
398
+ # Normalize dimensions to [C, T].
399
+ if audio.dim() == 1:
400
+ audio = audio.unsqueeze(0)
401
+ elif audio.dim() == 2:
402
+ # Accept either [T, C] or [C, T]; transpose only when clearly [T, C].
403
+ if audio.shape[0] > audio.shape[1] and audio.shape[1] <= 8:
404
+ audio = audio.transpose(0, 1)
405
+ elif audio.dim() == 3:
406
+ # If batched, take first item.
407
+ audio = audio[0]
408
+ else:
409
+ raise ValueError(f"Unexpected audio dims: {tuple(audio.shape)}")
410
+
411
+ return audio.contiguous()
412
+
413
+ def _load_audio(self, path: str) -> torch.Tensor:
414
+ """Load audio, resample to 48 kHz stereo, clamp to max_duration."""
415
+ try:
416
+ wav, sr = torchaudio.load(path)
417
+ except Exception as torchaudio_exc:
418
+ # torchaudio on some Space images requires torchcodec for decode.
419
+ # Fallback to soundfile so training can proceed without torchcodec.
420
+ try:
421
+ audio_np, sr = sf.read(path, dtype="float32", always_2d=True)
422
+ wav = torch.from_numpy(audio_np.T)
423
+ except Exception as sf_exc:
424
+ raise RuntimeError(
425
+ f"Failed to decode audio '{path}' with torchaudio ({torchaudio_exc}) "
426
+ f"and soundfile ({sf_exc})."
427
+ ) from sf_exc
428
+
429
+ wav = self._coerce_audio_tensor(wav)
430
+
431
+ # Resample if needed
432
+ if sr != self.cfg.sample_rate:
433
+ wav = torchaudio.functional.resample(wav, sr, self.cfg.sample_rate)
434
+
435
+ # Convert mono → stereo
436
+ if wav.shape[0] == 1:
437
+ wav = wav.repeat(2, 1)
438
+ elif wav.shape[0] > 2:
439
+ wav = wav[:2]
440
+
441
+ # Clamp length
442
+ max_samples = int(self.cfg.max_duration_sec * self.cfg.sample_rate)
443
+ if wav.shape[1] > max_samples:
444
+ wav = wav[:, :max_samples]
445
+
446
+ return wav # [2, T]
447
+
448
+ def _encode_audio(self, wav: torch.Tensor) -> torch.Tensor:
449
+ """Encode raw waveform → VAE latent on device."""
450
+ with torch.no_grad():
451
+ latent = self.handler._encode_audio_to_latents(wav)
452
+ if latent.dim() == 2:
453
+ latent = latent.unsqueeze(0)
454
+ latent = latent.to(self.dtype)
455
+ return latent
456
+
457
+ def _build_text_embeddings(self, caption: str, lyrics: str):
458
+ """Compute text & lyric embeddings using the text encoder."""
459
+ tokenizer = self.handler.text_tokenizer
460
+ text_encoder = self.handler.text_encoder
461
+
462
+ # Caption embedding
463
+ text_tokens = tokenizer(
464
+ caption or "",
465
+ return_tensors="pt",
466
+ padding="max_length",
467
+ truncation=True,
468
+ max_length=512,
469
+ ).to(self.device)
470
+
471
+ with torch.no_grad():
472
+ text_hidden = text_encoder(
473
+ input_ids=text_tokens["input_ids"]
474
+ ).last_hidden_state.to(self.dtype)
475
+ text_mask = text_tokens["attention_mask"].to(self.dtype)
476
+
477
+ # Lyric embedding (token-level via embed_tokens)
478
+ lyric_tokens = tokenizer(
479
+ lyrics or "",
480
+ return_tensors="pt",
481
+ padding="max_length",
482
+ truncation=True,
483
+ max_length=512,
484
+ ).to(self.device)
485
+
486
+ with torch.no_grad():
487
+ lyric_hidden = text_encoder.embed_tokens(
488
+ lyric_tokens["input_ids"]
489
+ ).to(self.dtype)
490
+ lyric_mask = lyric_tokens["attention_mask"].to(self.dtype)
491
+
492
+ return text_hidden, text_mask, lyric_hidden, lyric_mask
493
+
494
+ # ------------------------------------------------------------------
495
+ # Flow matching loss
496
+ # ------------------------------------------------------------------
497
+
498
+ def _flow_matching_loss(
499
+ self,
500
+ x1: torch.Tensor,
501
+ encoder_hidden_states: torch.Tensor,
502
+ encoder_attention_mask: torch.Tensor,
503
+ context_latents: torch.Tensor,
504
+ ) -> torch.Tensor:
505
+ """Compute rectified-flow MSE loss for one sample.
506
+
507
+ Notation follows ACE-Step convention:
508
+ x0 = noise, x1 = clean latent
509
+ xt = t * x0 + (1 - t) * x1
510
+ target velocity = x0 - x1
511
+ """
512
+ bsz = x1.shape[0]
513
+
514
+ # Sample random timestep per element
515
+ t = torch.rand(bsz, device=self.device, dtype=self.dtype)
516
+
517
+ # Apply timestep shift: t_shifted = shift * t / (1 + (shift - 1) * t)
518
+ if self.cfg.shift != 1.0:
519
+ t = self.cfg.shift * t / (1.0 + (self.cfg.shift - 1.0) * t)
520
+
521
+ t = t.clamp(1e-5, 1.0 - 1e-5)
522
+
523
+ # Noise
524
+ x0 = torch.randn_like(x1)
525
+
526
+ # Interpolate
527
+ t_expand = t.view(bsz, 1, 1)
528
+ xt = t_expand * x0 + (1.0 - t_expand) * x1
529
+
530
+ # Target velocity
531
+ velocity_target = x0 - x1
532
+
533
+ # Attention mask
534
+ attention_mask = torch.ones(
535
+ bsz, x1.shape[1], device=self.device, dtype=self.dtype
536
+ )
537
+
538
+ # Forward through decoder
539
+ decoder_out = self.handler.model.decoder(
540
+ hidden_states=xt,
541
+ timestep=t,
542
+ timestep_r=t,
543
+ attention_mask=attention_mask,
544
+ encoder_hidden_states=encoder_hidden_states,
545
+ encoder_attention_mask=encoder_attention_mask,
546
+ context_latents=context_latents,
547
+ use_cache=False,
548
+ output_attentions=False,
549
+ )
550
+
551
+ velocity_pred = decoder_out[0] # first element is the predicted output
552
+ loss = F.mse_loss(velocity_pred, velocity_target)
553
+ return loss
554
+
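The timestep shift used in the loss biases uniform samples toward higher noise levels while preserving the endpoints. A quick standalone check of the mapping:

```python
def shift_timestep(t, shift=3.0):
    """t_shifted = shift * t / (1 + (shift - 1) * t), as in the loss above."""
    return shift * t / (1.0 + (shift - 1.0) * t)

# Endpoints are fixed points; the midpoint moves up toward 1.
print(shift_timestep(0.0), shift_timestep(0.5), shift_timestep(1.0))  # 0.0 0.75 1.0
```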
555
+ @staticmethod
556
+ def _pad_and_stack(tensors: List[torch.Tensor], pad_value: float = 0.0) -> torch.Tensor:
557
+ """Pad variable-length tensors on dimension 0 and stack as batch."""
558
+ normalized = []
559
+ for t in tensors:
560
+ if t.dim() >= 2 and t.shape[0] == 1:
561
+ normalized.append(t.squeeze(0))
562
+ else:
563
+ normalized.append(t)
564
+
565
+ max_len = max(t.shape[0] for t in normalized)
566
+ template = normalized[0]
567
+ out_shape = (len(normalized), max_len, *template.shape[1:])
568
+ out = template.new_full(out_shape, pad_value)
569
+ for i, t in enumerate(normalized):
570
+ out[i, : t.shape[0]] = t
571
+ return out
572
+
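`_pad_and_stack` right-pads along the time dimension before stacking. The same semantics in a small NumPy sketch (NumPy stands in for torch here):

```python
import numpy as np

def pad_and_stack(arrays, pad_value=0.0):
    """Right-pad 2-D [T, C] arrays to a common length, stack into [B, T, C]."""
    max_len = max(a.shape[0] for a in arrays)
    out = np.full((len(arrays), max_len, arrays[0].shape[1]), pad_value,
                  dtype=arrays[0].dtype)
    for i, a in enumerate(arrays):
        out[i, : a.shape[0]] = a  # leading rows are data, the rest stay padded
    return out

batch = pad_and_stack([np.ones((2, 3)), np.ones((4, 3))])
print(batch.shape)  # (2, 4, 3)
```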
573
+ # ------------------------------------------------------------------
574
+ # Main training loop
575
+ # ------------------------------------------------------------------
576
+
577
+ def request_stop(self):
578
+ """Ask the training loop to stop after the current step."""
579
+ self._stop_requested = True
580
+
581
+ def train(
582
+ self,
583
+ entries: List[TrackEntry],
584
+ progress_callback=None,
585
+ ) -> str:
586
+ """Run the full LoRA training.
587
+
588
+ Args:
589
+ entries: List of scanned TrackEntry objects.
590
+ progress_callback: ``fn(step, total_steps, loss, epoch)`` for UI updates.
591
+
592
+ Returns:
593
+ Status message.
594
+ """
595
+ self._stop_requested = False
596
+ # Keep history restored by prepare() when resuming; start fresh otherwise.
+ if not self.cfg.resume_from:
+ self.loss_history.clear()
597
+ os.makedirs(self.cfg.output_dir, exist_ok=True)
598
+
599
+ if not entries:
600
+ return "No training data provided."
601
+
602
+ num_entries = len(entries)
603
+ total_steps = (
604
+ math.ceil(num_entries / self.cfg.batch_size)
605
+ * self.cfg.num_epochs
606
+ )
607
+
608
+ # ---- Pre-encode all audio & text (fits in CPU RAM) ----
609
+ logger.info("Pre-encoding dataset through VAE & text encoder ...")
610
+ dataset: List[Dict[str, Any]] = []
611
+ failed_encode: List[str] = []
612
+
613
+ # Freeze VAE and text encoder (they are not trained)
614
+ self.handler.vae.eval()
615
+ self.handler.text_encoder.eval()
616
+
617
+ # Reuse silence reference latent (same as handler's internal fallback path).
618
+ ref_latent = self.handler.silence_latent[:, :750, :].to(self.device).to(self.dtype)
619
+ ref_order_mask = torch.zeros(1, device=self.device, dtype=torch.long)
620
+
621
+ for idx, entry in enumerate(tqdm(entries, desc="Encoding dataset")):
622
+ try:
623
+ wav = self._load_audio(entry.audio_path)
624
+ latent = self._encode_audio(wav)
625
+ text_h, text_m, lyric_h, lyric_m = self._build_text_embeddings(
626
+ entry.caption, entry.lyrics
627
+ )
628
+
629
+ # Prepare condition using the model's own prepare_condition
630
+ with torch.no_grad():
631
+ enc_hs, enc_mask, ctx_lat = self.handler.model.prepare_condition(
632
+ text_hidden_states=text_h,
633
+ text_attention_mask=text_m,
634
+ lyric_hidden_states=lyric_h,
635
+ lyric_attention_mask=lyric_m,
636
+ refer_audio_acoustic_hidden_states_packed=ref_latent,
637
+ refer_audio_order_mask=ref_order_mask,
638
+ hidden_states=latent,
639
+ attention_mask=torch.ones(
640
+ 1, latent.shape[1],
641
+ device=self.device, dtype=self.dtype,
642
+ ),
643
+ silence_latent=self.handler.silence_latent,
644
+ src_latents=latent,
645
+ chunk_masks=torch.ones_like(latent),
646
+ is_covers=[False],
647
+ )
648
+
649
+ dataset.append(
650
+ {
651
+ "latent": latent.cpu(),
652
+ "enc_hs": enc_hs.cpu(),
653
+ "enc_mask": enc_mask.cpu(),
654
+ "ctx_lat": ctx_lat.cpu(),
655
+ "name": Path(entry.audio_path).stem,
656
+ }
657
+ )
658
+ except Exception as exc:
659
+ reason = f"{Path(entry.audio_path).name}: {exc}"
660
+ failed_encode.append(reason)
661
+ logger.warning(f"Skipping {entry.audio_path}: {exc}")
662
+
663
+ if not dataset:
664
+ preview = "\n".join(f"- {msg}" for msg in failed_encode[:8]) or "- (no detailed errors captured)"
665
+ return (
666
+ "All tracks failed to encode. Check audio files.\n"
667
+ "First errors:\n"
668
+ f"{preview}\n"
669
+ "Tip: try WAV/FLAC files and dataset folder scan instead of temporary uploads."
670
+ )
671
+
672
+ logger.info(f"Encoded {len(dataset)}/{num_entries} tracks.")
673
+
674
+ # ---- Warmup scheduler ----
675
+ total_optim_steps = math.ceil(
676
+ total_steps / self.cfg.gradient_accumulation_steps
677
+ )
678
+ warmup_steps = int(total_optim_steps * self.cfg.warmup_ratio)
679
+
680
+ if self.cfg.scheduler in {"constant_with_warmup", "linear", "cosine"}:
681
+ try:
682
+ from transformers import get_scheduler
683
+ self.scheduler = get_scheduler(
684
+ name=self.cfg.scheduler,
685
+ optimizer=self.optimizer,
686
+ num_warmup_steps=warmup_steps,
687
+ num_training_steps=total_optim_steps,
688
+ )
689
+ except Exception as exc:
690
+ logger.warning(f"Could not create scheduler '{self.cfg.scheduler}', disabling scheduler: {exc}")
691
+ self.scheduler = None
692
+ else:
693
+ self.scheduler = None
694
+
695
+ # ---- Training loop ----
696
+ logger.info(
697
+ f"Starting LoRA training: {self.cfg.num_epochs} epochs, "
698
+ f"{len(dataset)} samples, {total_optim_steps} optimiser steps"
699
+ )
700
+
701
+ self.peft_model.train()
702
+ accum_loss = 0.0
703
+ step_in_accum = 0
704
+
705
+ for epoch in range(self.current_epoch, self.cfg.num_epochs):
706
+ if self._stop_requested:
707
+ break
708
+
709
+ self.current_epoch = epoch
710
+ indices = list(range(len(dataset)))
711
+ random.shuffle(indices)
712
+
713
+ epoch_loss = 0.0
714
+ epoch_steps = 0
715
+
716
+ for i in range(0, len(indices), self.cfg.batch_size):
717
+ if self._stop_requested:
718
+ break
719
+
720
+ batch_indices = indices[i : i + self.cfg.batch_size]
721
+ batch_items = [dataset[j] for j in batch_indices]
722
+
723
+ # Move batch to device
724
+ latents = self._pad_and_stack([it["latent"] for it in batch_items]).to(self.device, self.dtype)
725
+ enc_hs = self._pad_and_stack([it["enc_hs"] for it in batch_items]).to(self.device, self.dtype)
726
+ enc_mask = self._pad_and_stack([it["enc_mask"] for it in batch_items], pad_value=0.0).to(self.device)
727
+ if enc_mask.dtype != self.dtype:
728
+ enc_mask = enc_mask.to(self.dtype)
729
+ ctx_lat = self._pad_and_stack([it["ctx_lat"] for it in batch_items]).to(self.device, self.dtype)
730
+
731
+ # Forward + loss
732
+ loss = self._flow_matching_loss(latents, enc_hs, enc_mask, ctx_lat)
733
+ loss = loss / self.cfg.gradient_accumulation_steps
734
+ loss.backward()
735
+
736
+ accum_loss += loss.item()
737
+ step_in_accum += 1
738
+
739
+ if step_in_accum >= self.cfg.gradient_accumulation_steps:
740
+ torch.nn.utils.clip_grad_norm_(
741
+ self.peft_model.parameters(), self.cfg.max_grad_norm
742
+ )
743
+ self.optimizer.step()
744
+ if self.scheduler is not None:
745
+ self.scheduler.step()
746
+ self.optimizer.zero_grad()
747
+
748
+ self.global_step += 1
749
+ avg_loss = accum_loss
750
+ accum_loss = 0.0
751
+ step_in_accum = 0
752
+
753
+ self.loss_history.append(
754
+ {
755
+ "step": self.global_step,
756
+ "epoch": epoch,
757
+ "loss": avg_loss,
758
+ "lr": self.optimizer.param_groups[0]["lr"],
759
+ }
760
+ )
761
+
762
+ if self.global_step % self.cfg.log_every_n_steps == 0:
763
+ logger.info(
764
+ f"Epoch {epoch+1}/{self.cfg.num_epochs} "
765
+ f"Step {self.global_step}/{total_optim_steps} "
766
+ f"Loss {avg_loss:.6f} "
767
+ f"LR {self.optimizer.param_groups[0]['lr']:.2e}"
768
+ )
769
+
770
+ if progress_callback:
771
+ progress_callback(
772
+ self.global_step, total_optim_steps, avg_loss, epoch
773
+ )
774
+
775
+ epoch_loss += loss.item() * self.cfg.gradient_accumulation_steps
776
+ epoch_steps += 1
777
+
778
+ # Flush remaining micro-batches when len(dataset) is not divisible by grad accumulation.
779
+ if step_in_accum > 0:
780
+ torch.nn.utils.clip_grad_norm_(self.peft_model.parameters(), self.cfg.max_grad_norm)
781
+ self.optimizer.step()
782
+ if self.scheduler is not None:
783
+ self.scheduler.step()
784
+ self.optimizer.zero_grad()
785
+ self.global_step += 1
786
+ avg_loss = accum_loss
787
+ accum_loss = 0.0
788
+ step_in_accum = 0
789
+ self.loss_history.append(
790
+ {
791
+ "step": self.global_step,
792
+ "epoch": epoch,
793
+ "loss": avg_loss,
794
+ "lr": self.optimizer.param_groups[0]["lr"],
795
+ }
796
+ )
797
+
798
+ # End of epoch – checkpoint?
799
+ if (
800
+ (epoch + 1) % self.cfg.save_every_n_epochs == 0
801
+ or epoch == self.cfg.num_epochs - 1
802
+ or self._stop_requested
803
+ ):
804
+ self._save_checkpoint(epoch)
805
+
806
+ if epoch_steps > 0:
807
+ avg_epoch_loss = epoch_loss / epoch_steps
808
+ logger.info(
809
+ f"Epoch {epoch+1} complete – avg loss {avg_epoch_loss:.6f}"
810
+ )
811
+
812
+ # Final save
813
+ final_dir = self._save_checkpoint(self.current_epoch, final=True)
814
+ status = (
815
+ "Training stopped early." if self._stop_requested else "Training complete!"
816
+ )
817
+ return f"{status} Adapter saved to {final_dir}"
818
+
819
+ # ------------------------------------------------------------------
820
+ # Checkpointing
821
+ # ------------------------------------------------------------------
822
+
823
+ def _save_checkpoint(self, epoch: int, final: bool = False) -> str:
824
+ tag = "final" if final else f"epoch-{epoch+1}"
825
+ save_dir = os.path.join(self.cfg.output_dir, tag)
826
+ os.makedirs(save_dir, exist_ok=True)
827
+
828
+ # Save PEFT adapter
829
+ self.peft_model.save_pretrained(save_dir)
830
+
831
+ # Save training state
832
+ torch.save(
833
+ {
834
+ "optimizer": self.optimizer.state_dict(),
835
+ "global_step": self.global_step,
836
+ "epoch": epoch,
837
+ },
838
+ os.path.join(save_dir, "training_state.pt"),
839
+ )
840
+
841
+ # Save loss curve
842
+ loss_path = os.path.join(save_dir, "loss_history.json")
843
+ with open(loss_path, "w") as f:
844
+ json.dump(self.loss_history, f)
845
+
846
+ # Save config
847
+ cfg_path = os.path.join(save_dir, "train_config.json")
848
+ with open(cfg_path, "w") as f:
849
+ json.dump(asdict(self.cfg), f, indent=2)
850
+
851
+ logger.info(f"Checkpoint saved → {save_dir}")
852
+ return save_dir
853
+
854
+ # ------------------------------------------------------------------
855
+ # Adapter listing
856
+ # ------------------------------------------------------------------
857
+
858
+ @staticmethod
859
+ def list_adapters(output_dir: str = "lora_output") -> List[str]:
860
+ """Return adapter directories inside *output_dir* (recursive)."""
861
+ results = []
862
+ root = Path(output_dir)
863
+ if not root.is_dir():
864
+ return results
865
+ for cfg in sorted(root.rglob("adapter_config.json")):
866
+ d = cfg.parent
867
+ if d.is_dir():
868
+ results.append(str(d))
869
+ return results
870
+
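Adapter discovery keys off the presence of `adapter_config.json`. A self-contained check of the same recursive scan against a throwaway directory tree (checkpoint tags here are just examples):

```python
import json
import tempfile
from pathlib import Path

def list_adapters(output_dir):
    """Return every directory under output_dir containing adapter_config.json."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return [str(cfg.parent) for cfg in sorted(root.rglob("adapter_config.json"))]

with tempfile.TemporaryDirectory() as tmp:
    for tag in ("epoch-10", "final"):
        d = Path(tmp, tag)
        d.mkdir(parents=True)
        (d / "adapter_config.json").write_text(json.dumps({"r": 64}))
    found = list_adapters(tmp)
    print([Path(p).name for p in found])  # ['epoch-10', 'final']
```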
871
+
872
+ def _build_arg_parser() -> argparse.ArgumentParser:
873
+ parser = argparse.ArgumentParser(description="ACE-Step 1.5 LoRA trainer (CLI)")
874
+
875
+ # Dataset
876
+ parser.add_argument("--dataset-dir", type=str, default="", help="Local dataset folder path")
877
+ parser.add_argument("--dataset-repo", type=str, default="", help="HF dataset repo id (optional)")
878
+ parser.add_argument("--dataset-revision", type=str, default="main", help="HF dataset revision")
879
+ parser.add_argument("--dataset-subdir", type=str, default="", help="Subdirectory inside downloaded dataset")
880
+
881
+ # Model init
882
+ parser.add_argument("--model-config", type=str, default="acestep-v15-base", help="DiT config name")
883
+ parser.add_argument("--device", type=str, default="auto", choices=["auto", "cuda", "mps", "xpu", "cpu"])
884
+ parser.add_argument("--offload-to-cpu", action="store_true")
885
+ parser.add_argument("--offload-dit-to-cpu", action="store_true")
886
+ parser.add_argument("--prefer-source", type=str, default="huggingface", choices=["huggingface", "modelscope"])
887
+
888
+ # Train config
889
+ parser.add_argument("--output-dir", type=str, default="lora_output")
890
+ parser.add_argument("--resume-from", type=str, default="")
891
+ parser.add_argument("--num-epochs", type=int, default=50)
892
+ parser.add_argument("--batch-size", type=int, default=1)
893
+ parser.add_argument("--grad-accum", type=int, default=1)
894
+ parser.add_argument("--save-every", type=int, default=10)
895
+ parser.add_argument("--log-every", type=int, default=5)
896
+ parser.add_argument("--max-duration-sec", type=float, default=240.0)
897
+
898
+ parser.add_argument("--lora-rank", type=int, default=64)
899
+ parser.add_argument("--lora-alpha", type=int, default=64)
900
+ parser.add_argument("--lora-dropout", type=float, default=0.1)
901
+
902
+ parser.add_argument("--learning-rate", type=float, default=1e-4)
903
+ parser.add_argument("--weight-decay", type=float, default=0.01)
904
+ parser.add_argument("--optimizer", type=str, default="adamw_8bit", choices=["adamw", "adamw_8bit"])
905
+ parser.add_argument("--max-grad-norm", type=float, default=1.0)
906
+ parser.add_argument("--warmup-ratio", type=float, default=0.03)
907
+ parser.add_argument("--scheduler", type=str, default="constant_with_warmup", choices=["constant_with_warmup", "linear", "cosine"])
908
+ parser.add_argument("--shift", type=float, default=3.0)
909
+
910
+ # Optional upload
911
+ parser.add_argument("--upload-repo", type=str, default="", help="HF model repo to upload final adapter")
912
+ parser.add_argument("--upload-path", type=str, default="", help="Path inside upload repo (optional)")
913
+ parser.add_argument("--upload-private", action="store_true")
914
+ parser.add_argument("--hf-token-env", type=str, default="HF_TOKEN", help="Environment variable name for HF token")
915
+
916
+ return parser
917
+
918
+
919
+ def _resolve_dataset_dir(args) -> str:
920
+ if args.dataset_dir:
921
+ return args.dataset_dir
922
+
923
+ if not args.dataset_repo:
924
+ raise ValueError("Provide --dataset-dir or --dataset-repo.")
925
+
926
+ from huggingface_hub import snapshot_download
927
+
928
+ token = os.getenv(args.hf_token_env)
929
+ temp_root = tempfile.mkdtemp(prefix="acestep_lora_dataset_")
930
+ local_dir = os.path.join(temp_root, "dataset")
931
+ logger.info(f"Downloading dataset repo {args.dataset_repo}@{args.dataset_revision} to {local_dir}")
932
+ snapshot_download(
933
+ repo_id=args.dataset_repo,
934
+ repo_type="dataset",
935
+ revision=args.dataset_revision,
936
+ local_dir=local_dir,
937
+ local_dir_use_symlinks=False,
938
+ token=token,
939
+ )
940
+ if args.dataset_subdir:
941
+ sub = os.path.join(local_dir, args.dataset_subdir)
942
+ if not os.path.isdir(sub):
943
+ raise FileNotFoundError(f"Dataset subdir not found: {sub}")
944
+ return sub
945
+ return local_dir
946
+
947
+
948
+ def _upload_adapter_if_requested(args, final_dir: str):
949
+ if not args.upload_repo:
950
+ return
951
+
952
+ from huggingface_hub import HfApi
953
+
954
+ token = os.getenv(args.hf_token_env)
955
+ if not token:
956
+ raise RuntimeError(
957
+ f"{args.hf_token_env} is not set. Needed for upload to {args.upload_repo}."
958
+ )
959
+
960
+ api = HfApi(token=token)
961
+ api.create_repo(
962
+ repo_id=args.upload_repo,
963
+ repo_type="model",
964
+ exist_ok=True,
965
+ private=bool(args.upload_private),
966
+ )
967
+
968
+ path_in_repo = args.upload_path.strip().strip("/") if args.upload_path else ""
969
+ commit_message = f"Upload ACE-Step LoRA adapter from {Path(final_dir).name}"
970
+ logger.info(f"Uploading adapter from {final_dir} to {args.upload_repo}/{path_in_repo}")
971
+ api.upload_folder(
972
+ repo_id=args.upload_repo,
973
+ repo_type="model",
974
+ folder_path=final_dir,
975
+ path_in_repo=path_in_repo,
976
+ commit_message=commit_message,
977
+ )
978
+ logger.info("Upload complete")
979
+
980
+
981
+ def main():
982
+ args = _build_arg_parser().parse_args()
983
+
984
+ dataset_dir = _resolve_dataset_dir(args)
985
+ entries = scan_dataset_folder(dataset_dir)
986
+ if not entries:
987
+ raise RuntimeError(f"No audio files found in dataset: {dataset_dir}")
988
+
989
+ from acestep.handler import AceStepHandler
990
+
991
+ project_root = str(Path(__file__).resolve().parent)
992
+ handler = AceStepHandler()
993
+ status, ok = handler.initialize_service(
994
+ project_root=project_root,
995
+ config_path=args.model_config,
996
+ device=args.device,
997
+ use_flash_attention=False,
998
+ compile_model=False,
999
+ offload_to_cpu=bool(args.offload_to_cpu),
1000
+ offload_dit_to_cpu=bool(args.offload_dit_to_cpu),
1001
+ prefer_source=args.prefer_source,
1002
+ )
1003
+ print(status)
1004
+ if not ok:
1005
+ raise RuntimeError("Model initialization failed")
1006
+
1007
+ cfg = LoRATrainConfig(
1008
+ lora_rank=args.lora_rank,
1009
+ lora_alpha=args.lora_alpha,
1010
+ lora_dropout=args.lora_dropout,
1011
+ learning_rate=args.learning_rate,
1012
+ weight_decay=args.weight_decay,
1013
+ optimizer=args.optimizer,
1014
+ max_grad_norm=args.max_grad_norm,
1015
+ warmup_ratio=args.warmup_ratio,
1016
+ scheduler=args.scheduler,
1017
+ num_epochs=args.num_epochs,
1018
+ batch_size=args.batch_size,
1019
+ gradient_accumulation_steps=args.grad_accum,
1020
+ save_every_n_epochs=args.save_every,
1021
+ log_every_n_steps=args.log_every,
1022
+ shift=args.shift,
1023
+ max_duration_sec=args.max_duration_sec,
1024
+ output_dir=args.output_dir,
1025
+ resume_from=(args.resume_from.strip() if args.resume_from else None),
1026
+ device=args.device,
1027
+ )
1028
+
1029
+ trainer = LoRATrainer(handler, cfg)
1030
+ trainer.prepare()
1031
+
1032
+ start = time.time()
1033
+
1034
+ def _progress(step, total, loss, epoch):
1035
+ elapsed = time.time() - start
1036
+ rate = step / elapsed if elapsed > 0 else 0.0
1037
+ remaining = max(0.0, total - step)
1038
+ eta_sec = remaining / rate if rate > 0 else -1.0
1039
+ eta_msg = f"{eta_sec/60:.1f}m" if eta_sec >= 0 else "unknown"
1040
+ logger.info(
1041
+ f"[progress] step={step}/{total} epoch={epoch+1} loss={loss:.6f} elapsed={elapsed/60:.1f}m eta={eta_msg}"
1042
+ )
1043
+
1044
+ msg = trainer.train(entries, progress_callback=_progress)
1045
+ print(msg)
1046
+
1047
+ final_dir = os.path.join(cfg.output_dir, "final")
1048
+ if os.path.isdir(final_dir):
1049
+ _upload_adapter_if_requested(args, final_dir)
1050
+ print(f"Final adapter directory: {final_dir}")
1051
+ else:
1052
+ print(f"Warning: final adapter directory not found at {final_dir}")
1053
+
1054
+
1055
+ if __name__ == "__main__":
1056
+ main()
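
The CLI above and the UI below exchange per-track metadata through JSON sidecars sitting next to each audio file (`scan_dataset_folder` reads them, the UI's `save_sidecar` writes them). A minimal sketch of producing one such sidecar — field names are taken from this diff, while `write_sidecar` is a hypothetical helper, not part of the repo:

```python
import json
import tempfile
from pathlib import Path

def write_sidecar(audio_path: Path, **meta) -> Path:
    # One <track>.json next to each <track>.wav, holding training metadata.
    sidecar = audio_path.with_suffix(".json")
    sidecar.write_text(json.dumps(meta, indent=2, ensure_ascii=False), encoding="utf-8")
    return sidecar

root = Path(tempfile.mkdtemp())
audio = root / "track01.wav"
audio.touch()  # stand-in for a real audio file
sidecar = write_sidecar(
    audio,
    caption="warm lo-fi beat with vinyl crackle",
    lyrics="",
    bpm=84,
    keyscale="A minor",
    timesignature="4/4",
    vocal_language="en",
    duration=183.0,
)
loaded = json.loads(sidecar.read_text(encoding="utf-8"))
print(loaded["bpm"], sidecar.name)  # prints: 84 track01.json
```

Missing or empty fields are tolerated by the loader; the auto-label flow in the UI fills them in later.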
lora_ui.py ADDED
@@ -0,0 +1,973 @@
1
+ """
2
+ ACE-Step 1.5 LoRA Training and Evaluation UI.
3
+
4
+ Gradio interface with four tabs:
5
+ 1. Model Setup: initialize base DiT, VAE, and text encoder
6
+ 2. Dataset: scan folder or drop files, then edit/save sidecars
7
+ 3. Training: configure hyperparameters and run LoRA training
8
+ 4. Evaluation: load adapters and run deterministic A/B generation
9
+ """
10
+ import os
11
+ import sys
12
+ import json
13
+ import math
14
+ import random
15
+ import threading
16
+ import tempfile
17
+ import time
18
+ from pathlib import Path
19
+ from typing import List, Optional
20
+
21
+ import gradio as gr
22
+ # On Hugging Face Spaces Zero, `spaces` must be imported before CUDA-related modules.
23
+ if os.getenv("SPACE_ID"):
24
+ try:
25
+ import spaces # noqa: F401
26
+ except Exception:
27
+ pass
28
+ import torch
29
+ from loguru import logger
30
+
31
+ # ---------------------------------------------------------------------------
32
+ # Ensure project root is on sys.path so `acestep` imports work
33
+ # ---------------------------------------------------------------------------
34
+ PROJECT_ROOT = str(Path(__file__).resolve().parent)
35
+ if PROJECT_ROOT not in sys.path:
36
+ sys.path.insert(0, PROJECT_ROOT)
37
+
38
+ from acestep.handler import AceStepHandler
39
+ from acestep.audio_utils import AudioSaver
40
+ from acestep.llm_inference import LLMHandler
41
+ from acestep.inference import understand_music
42
+ from lora_train import (
43
+ LoRATrainConfig,
44
+ LoRATrainer,
45
+ TrackEntry,
46
+ scan_dataset_folder,
47
+ scan_uploaded_files,
48
+ )
49
+
50
+ # ---------------------------------------------------------------------------
51
+ # Globals (shared across Gradio callbacks)
52
+ # ---------------------------------------------------------------------------
53
+ handler = AceStepHandler()
54
+ llm_handler = LLMHandler()
55
+ trainer: Optional[LoRATrainer] = None
56
+ dataset_entries: List[TrackEntry] = []
57
+ _training_thread: Optional[threading.Thread] = None
58
+ _training_log: List[str] = []
59
+ _training_status: str = "idle" # idle | running | stopped | done
60
+ _training_started_at: Optional[float] = None
61
+ _model_init_ok: bool = False
62
+ _model_init_status: str = ""
63
+ _last_model_init_args: Optional[dict] = None
64
+ _lm_init_ok: bool = False
65
+ _last_lm_init_args: Optional[dict] = None
66
+ _auto_label_cursor: int = 0
67
+
68
+ audio_saver = AudioSaver(default_format="wav")
69
+ IS_SPACE = bool(os.getenv("SPACE_ID"))
70
+ DEFAULT_OUTPUT_DIR = "/data/lora_output" if IS_SPACE else "lora_output"
71
+
72
+ if IS_SPACE:
73
+ try:
74
+ import spaces as _hf_spaces
75
+ _gpu_callback = _hf_spaces.GPU(duration=300)
76
+ except Exception:
77
+ _gpu_callback = lambda fn: fn
78
+ else:
79
+ _gpu_callback = lambda fn: fn
80
+
81
+
82
+ def _rows_from_entries(entries: List[TrackEntry]):
83
+ rows = []
84
+ for e in entries:
85
+ rows.append([
86
+ Path(e.audio_path).name,
87
+ f"{e.duration:.1f}s" if e.duration else "?",
88
+ e.caption or "(none)",
89
+ (e.lyrics[:60] + "...") if e.lyrics and len(e.lyrics) > 60 else (e.lyrics or "(none)"),
90
+ e.vocal_language,
91
+ ])
92
+ return rows
93
+
94
+
95
+ # ===========================================================================
96
+ # Tab 1 - Model Setup
97
+ # ===========================================================================
98
+
99
+ def get_available_models():
100
+ models = handler.get_available_acestep_v15_models()
101
+ return models if models else ["acestep-v15-base"]
102
+
103
+
104
+ def init_model(
105
+ model_name: str,
106
+ device: str,
107
+ offload_cpu: bool,
108
+ offload_dit_cpu: bool,
109
+ ):
110
+ global _model_init_ok, _model_init_status, _last_model_init_args
111
+ _last_model_init_args = dict(
112
+ project_root=PROJECT_ROOT,
113
+ config_path=model_name,
114
+ device=device,
115
+ use_flash_attention=False,
116
+ compile_model=False,
117
+ offload_to_cpu=offload_cpu,
118
+ offload_dit_to_cpu=offload_dit_cpu,
119
+ )
120
+ status, ok = _init_model_gpu(**_last_model_init_args)
121
+ _model_init_ok = bool(ok)
122
+ _model_init_status = status or ""
123
+ return status
124
+
125
+
126
+ @_gpu_callback
127
+ def _init_model_gpu(**kwargs):
128
+ return _init_model_impl(**kwargs)
129
+
130
+
131
+ def _init_model_impl(**kwargs):
132
+ return handler.initialize_service(**kwargs)
133
+
134
+
135
+ # ===========================================================================
136
+ # Tab 2 - Dataset
137
+ # ===========================================================================
138
+
139
+ def scan_folder(folder_path: str):
140
+ global dataset_entries, _auto_label_cursor
141
+ if not folder_path or not os.path.isdir(folder_path):
142
+ return "Provide a valid folder path.", []
143
+ dataset_entries = scan_dataset_folder(folder_path)
144
+ _auto_label_cursor = 0
145
+ rows = _rows_from_entries(dataset_entries)
146
+ msg = f"Found {len(dataset_entries)} audio files."
147
+ return msg, rows
148
+
149
+
150
+ def load_uploaded(file_paths: List[str]):
151
+ global dataset_entries, _auto_label_cursor
152
+ if not file_paths:
153
+ return "Drop audio files (and optional .json sidecars) first.", []
154
+ sidecar_count = sum(
155
+ 1 for p in file_paths if isinstance(p, str) and Path(p).suffix.lower() == ".json"
156
+ )
157
+ dataset_entries = scan_uploaded_files(file_paths)
158
+ _auto_label_cursor = 0
159
+ rows = _rows_from_entries(dataset_entries)
160
+ msg = (
161
+ f"Loaded {len(dataset_entries)} dropped audio files."
162
+ + (f" Matched {sidecar_count} uploaded sidecar JSON file(s)." if sidecar_count else "")
163
+ )
164
+ return msg, rows
165
+
166
+
167
+ def save_sidecar(index: int, caption: str, lyrics: str, bpm: str, keyscale: str, lang: str):
168
+ """Save metadata edits back to a JSON sidecar and update in-memory entry."""
169
+ global dataset_entries
170
+ if index < 0 or index >= len(dataset_entries):
171
+ return "Invalid track index."
172
+ entry = dataset_entries[index]
173
+ entry.caption = caption
174
+ entry.lyrics = lyrics
175
+ if bpm.strip():
176
+ try:
177
+ entry.bpm = int(float(bpm))
178
+ except ValueError:
179
+ return "Invalid BPM value. Use an integer or leave empty."
180
+ else:
181
+ entry.bpm = None
182
+ entry.keyscale = keyscale
183
+ entry.vocal_language = lang
184
+
185
+ sidecar_path = Path(entry.audio_path).with_suffix(".json")
186
+ meta = {
187
+ "caption": entry.caption,
188
+ "lyrics": entry.lyrics,
189
+ "bpm": entry.bpm,
190
+ "keyscale": entry.keyscale,
191
+ "timesignature": entry.timesignature,
192
+ "vocal_language": entry.vocal_language,
193
+ "duration": entry.duration,
194
+ }
195
+ sidecar_path.write_text(json.dumps(meta, indent=2, ensure_ascii=False), encoding="utf-8")
196
+ return f"Saved sidecar for {Path(entry.audio_path).name}"
197
+
198
+
199
+ def init_auto_label_lm(lm_model_path: str, lm_backend: str, lm_device: str):
200
+ global _lm_init_ok, _last_lm_init_args
201
+ _last_lm_init_args = dict(
202
+ lm_model_path=lm_model_path,
203
+ lm_backend=lm_backend,
204
+ lm_device=lm_device,
205
+ )
206
+ status = _init_auto_label_lm_gpu(**_last_lm_init_args)
207
+ _lm_init_ok = not str(status).startswith("LM init failed:") and not str(status).startswith("LM init exception:")
208
+ return status
209
+
210
+
211
+ @_gpu_callback
212
+ def _init_auto_label_lm_gpu(lm_model_path: str, lm_backend: str, lm_device: str):
213
+ return _init_auto_label_lm_impl(lm_model_path, lm_backend, lm_device)
214
+
215
+
216
+ def _init_auto_label_lm_impl(lm_model_path: str, lm_backend: str, lm_device: str):
217
+ """Initialize LLM for dataset auto-labeling."""
218
+ checkpoint_dir = os.path.join(PROJECT_ROOT, "checkpoints")
219
+ full_lm_path = os.path.join(checkpoint_dir, lm_model_path)
220
+
221
+ try:
222
+ if not os.path.exists(full_lm_path):
223
+ from pathlib import Path as _Path
224
+ from acestep.model_downloader import ensure_main_model, ensure_lm_model
225
+
226
+ if lm_model_path == "acestep-5Hz-lm-1.7B":
227
+ ok, msg = ensure_main_model(
228
+ checkpoints_dir=_Path(checkpoint_dir),
229
+ prefer_source="huggingface",
230
+ )
231
+ else:
232
+ ok, msg = ensure_lm_model(
233
+ model_name=lm_model_path,
234
+ checkpoints_dir=_Path(checkpoint_dir),
235
+ prefer_source="huggingface",
236
+ )
237
+ if not ok:
238
+ return f"Failed to download LM model: {msg}"
239
+
240
+ status, ok = llm_handler.initialize(
241
+ checkpoint_dir=checkpoint_dir,
242
+ lm_model_path=lm_model_path,
243
+ backend=lm_backend,
244
+ device=lm_device,
245
+ offload_to_cpu=False,
246
+ )
247
+ return status if ok else f"LM init failed:\n{status}"
248
+ except Exception as exc:
249
+ logger.exception("LM init failed for auto-label")
250
+ return f"LM init exception: {exc}"
251
+
252
+
253
+ def _write_entry_sidecar(entry: TrackEntry):
254
+ sidecar_path = Path(entry.audio_path).with_suffix(".json")
255
+ meta = {
256
+ "caption": entry.caption,
257
+ "lyrics": entry.lyrics,
258
+ "bpm": entry.bpm,
259
+ "keyscale": entry.keyscale,
260
+ "timesignature": entry.timesignature,
261
+ "vocal_language": entry.vocal_language,
262
+ "duration": entry.duration,
263
+ }
264
+ sidecar_path.write_text(json.dumps(meta, indent=2, ensure_ascii=False), encoding="utf-8")
265
+
266
+
267
+ @_gpu_callback
268
+ def auto_label_all(overwrite_existing: bool, caption_only: bool, max_files_per_run: int = 6, reset_cursor: bool = False):
269
+ """Auto-label all loaded tracks using ACE audio understanding (audio->codes->metadata)."""
270
+ global dataset_entries, _auto_label_cursor
271
+
272
+ if handler.model is None:
273
+ if _model_init_ok and _last_model_init_args:
274
+ status, ok = _init_model_impl(**_last_model_init_args)
275
+ if not ok:
276
+ return f"Model reload failed before auto-label:\n{status}", [], "Auto-label skipped."
277
+ else:
278
+ return "Initialize model first in Step 1.", [], "Auto-label skipped."
279
+ if not dataset_entries:
280
+ return "Load dataset first in Step 2.", [], "Auto-label skipped."
281
+ if not llm_handler.llm_initialized:
282
+ if _lm_init_ok and _last_lm_init_args:
283
+ status = _init_auto_label_lm_impl(**_last_lm_init_args)
284
+ if not llm_handler.llm_initialized:
285
+ return (
286
+ f"Auto-label LM reload failed:\n{status}",
287
+ _rows_from_entries(dataset_entries),
288
+ "Auto-label skipped.",
289
+ )
290
+ else:
291
+ return "Initialize Auto-Label LM first.", _rows_from_entries(dataset_entries), "Auto-label skipped."
292
+
293
+ if max_files_per_run <= 0:
294
+ max_files_per_run = 6
295
+ if reset_cursor:
296
+ _auto_label_cursor = 0
297
+ if _auto_label_cursor < 0 or _auto_label_cursor >= len(dataset_entries):
298
+ _auto_label_cursor = 0
299
+
300
+ start_idx = _auto_label_cursor
301
+ end_idx = min(len(dataset_entries), start_idx + int(max_files_per_run))
302
+
303
+ updated = 0
304
+ skipped = 0
305
+ failed = 0
306
+ logs: List[str] = []
307
+
308
+ for idx in range(start_idx, end_idx):
309
+ entry = dataset_entries[idx]
310
+ try:
311
+ missing_fields = []
312
+ if not (entry.caption or "").strip():
313
+ missing_fields.append("caption")
314
+ if (not caption_only) and (not (entry.lyrics or "").strip()):
315
+ missing_fields.append("lyrics")
316
+ if entry.bpm is None:
317
+ missing_fields.append("bpm")
318
+ if not (entry.keyscale or "").strip():
319
+ missing_fields.append("keyscale")
320
+ if entry.duration is None:
321
+ missing_fields.append("duration")
322
+
323
+ # Skip only when every core field is already available.
324
+ if (not overwrite_existing) and (len(missing_fields) == 0):
325
+ skipped += 1
326
+ logs.append(f"[{idx}] Skipped (already fully labeled): {Path(entry.audio_path).name}")
327
+ continue
328
+
329
+ codes = handler.convert_src_audio_to_codes(entry.audio_path)
330
+ if not codes or codes.startswith("❌"):
331
+ failed += 1
332
+ logs.append(f"[{idx}] Failed to convert audio to codes: {Path(entry.audio_path).name}")
333
+ continue
334
+
335
+ result = understand_music(
336
+ llm_handler=llm_handler,
337
+ audio_codes=codes,
338
+ temperature=0.85,
339
+ use_constrained_decoding=True,
340
+ constrained_decoding_debug=False,
341
+ )
342
+ if not result.success:
343
+ failed += 1
344
+ logs.append(f"[{idx}] Failed to label: {Path(entry.audio_path).name} ({result.error or result.status_message})")
345
+ continue
346
+
347
+ # Update fields. If overwrite is false, fill only missing values.
348
+ if overwrite_existing or not (entry.caption or "").strip():
349
+ entry.caption = (result.caption or entry.caption or "").strip()
350
+ if not caption_only:
351
+ if overwrite_existing or not (entry.lyrics or "").strip():
352
+ entry.lyrics = (result.lyrics or entry.lyrics or "").strip()
353
+ if entry.bpm is None and result.bpm is not None:
354
+ entry.bpm = int(result.bpm)
355
+ if (not entry.keyscale) and result.keyscale:
356
+ entry.keyscale = result.keyscale
357
+ if (not entry.timesignature) and result.timesignature:
358
+ entry.timesignature = result.timesignature
359
+ if (not entry.vocal_language) and result.language:
360
+ entry.vocal_language = result.language
361
+ if entry.duration is None and result.duration is not None:
362
+ entry.duration = float(result.duration)
363
+
364
+ _write_entry_sidecar(entry)
365
+ updated += 1
366
+ logs.append(f"[{idx}] Labeled: {Path(entry.audio_path).name}")
367
+ except Exception as exc:
368
+ failed += 1
369
+ logs.append(f"[{idx}] Exception: {Path(entry.audio_path).name} ({exc})")
370
+
371
+ _auto_label_cursor = 0 if end_idx >= len(dataset_entries) else end_idx
372
+ mode = "caption-only" if caption_only else "caption+lyrics"
373
+ progress_msg = (
374
+ f"Processed batch {start_idx + 1}-{end_idx} of {len(dataset_entries)}. "
375
+ if len(dataset_entries) > 0 else ""
376
+ )
377
+ if _auto_label_cursor == 0 and len(dataset_entries) > 0:
378
+ progress_msg += "Reached end of dataset."
379
+ else:
380
+ progress_msg += f"Next start index: {_auto_label_cursor}."
381
+ summary = (
382
+ f"Auto-label ({mode}) complete. Updated={updated}, Skipped={skipped}, Failed={failed}. "
383
+ f"{progress_msg}"
384
+ )
385
+ detail = "\n".join(logs[-40:]) if logs else "No logs."
386
+ return summary, _rows_from_entries(dataset_entries), detail
387
+
388
+
389
+ # ===========================================================================
390
+ # Tab 3 - Training
391
+ # ===========================================================================
392
+
393
+ def _run_training(config_dict: dict):
394
+ """Target for the background training thread."""
395
+ global trainer, _training_status, _training_log, _training_started_at
396
+ _training_status = "running"
397
+ _training_log.clear()
398
+ _training_started_at = time.time()
399
+
400
+ try:
401
+ cfg = LoRATrainConfig(**config_dict)
402
+ trainer = LoRATrainer(handler, cfg)
403
+ trainer.prepare()
404
+ _training_log.append(f"Training device: {handler.device}")
405
+
406
+ def _cb(step, total, loss, epoch):
407
+ elapsed = 0.0 if _training_started_at is None else max(0.0, time.time() - _training_started_at)
408
+ rate = (step / elapsed) if elapsed > 0 else 0.0
409
+ remaining = max(0, total - step)
410
+ eta_sec = (remaining / rate) if rate > 0 else -1.0
411
+ eta_msg = f"{eta_sec/60:.1f}m" if eta_sec >= 0 else "unknown"
412
+ msg = (
413
+ f"Step {step}/{total} Epoch {epoch+1} Loss {loss:.6f} "
414
+ f"Elapsed {elapsed/60:.1f}m ETA {eta_msg}"
415
+ )
416
+ _training_log.append(msg)
417
+
418
+ result = trainer.train(dataset_entries, progress_callback=_cb)
419
+ _training_log.append(result)
420
+ _training_status = "done"
421
+ except Exception as exc:
422
+ _training_log.append(f"ERROR: {exc}")
423
+ _training_status = "stopped"
424
+ logger.exception("Training failed")
425
+
426
+
427
+ def start_training(
428
+ lora_rank, lora_alpha, lora_dropout,
429
+ lr, weight_decay, optimizer_name,
430
+ max_grad_norm, warmup_ratio, scheduler_name,
431
+ num_epochs, batch_size, grad_accum,
432
+ save_every, log_every, shift,
433
+ max_duration, output_dir, resume_from,
434
+ ):
435
+ global _training_thread, _training_status
436
+
437
+ if handler.model is None:
438
+ return "Model not initialized. Initialize the model in Step 1 first."
439
+ if not dataset_entries:
440
+ return "No dataset loaded. Go to Dataset tab first."
441
+ if _training_status == "running":
442
+ return "Training already in progress."
443
+
444
+ config_dict = dict(
445
+ lora_rank=int(lora_rank),
446
+ lora_alpha=int(lora_alpha),
447
+ lora_dropout=float(lora_dropout),
448
+ learning_rate=float(lr),
449
+ weight_decay=float(weight_decay),
450
+ optimizer=optimizer_name,
451
+ max_grad_norm=float(max_grad_norm),
452
+ warmup_ratio=float(warmup_ratio),
453
+ scheduler=scheduler_name,
454
+ num_epochs=int(num_epochs),
455
+ batch_size=int(batch_size),
456
+ gradient_accumulation_steps=int(grad_accum),
457
+ save_every_n_epochs=int(save_every),
458
+ log_every_n_steps=int(log_every),
459
+ shift=float(shift),
460
+ max_duration_sec=float(max_duration),
461
+ output_dir=output_dir,
462
+ resume_from=(resume_from.strip() if isinstance(resume_from, str) and resume_from.strip() else None),
463
+ device=str(handler.device),
464
+ )
465
+
466
+ steps_per_epoch = math.ceil(len(dataset_entries) / int(batch_size))
467
+ total_steps = steps_per_epoch * int(num_epochs)
468
+ total_optim_steps = math.ceil(total_steps / int(grad_accum))
469
+
470
+ _training_thread = threading.Thread(target=_run_training, args=(config_dict,), daemon=True)
471
+ _training_thread.start()
472
+ return (
473
+ f"Training started on {handler.device}. "
474
+ f"Estimated optimizer steps: {total_optim_steps}."
475
+ )
476
+
477
+
478
+ def stop_training():
479
+ global trainer, _training_status
480
+ if trainer:
481
+ trainer.request_stop()
482
+ _training_status = "stopped"
483
+ return "Stop requested - will finish current step."
484
+ return "No training in progress."
485
+
486
+
487
+ def poll_training():
488
+ """Return current log + loss chart data."""
489
+ log_text = "\n".join(_training_log[-50:]) if _training_log else "(no output yet)"
490
+
491
+ # Build loss curve data
492
+ chart_data = []
493
+ if trainer and trainer.loss_history:
494
+ chart_data = [[h["step"], h["loss"]] for h in trainer.loss_history]
495
+
496
+ status = _training_status
497
+ device_line = f"Device: {handler.device}"
498
+ if torch.cuda.is_available() and str(handler.device).startswith("cuda"):
499
+ try:
500
+ idx = torch.cuda.current_device()
501
+ name = torch.cuda.get_device_name(idx)
502
+ allocated = torch.cuda.memory_allocated(idx) / (1024 ** 3)
503
+ reserved = torch.cuda.memory_reserved(idx) / (1024 ** 3)
504
+ device_line = (
505
+ f"Device: {handler.device} ({name}) | "
506
+ f"VRAM allocated={allocated:.2f}GB reserved={reserved:.2f}GB"
507
+ )
508
+ except Exception:
509
+ pass
510
+
511
+ return f"Status: {status}\n{device_line}\n\n{log_text}", chart_data
512
+
513
+
514
+ # ===========================================================================
515
+ # Tab 4 - Evaluation / A-B Test
516
+ # ===========================================================================
517
+
518
+ def list_adapters(output_dir: str):
519
+ adapters = LoRATrainer.list_adapters(output_dir)
520
+ return adapters if adapters else ["(none found)"]
521
+
522
+
523
+ @_gpu_callback
524
+ def load_adapter(adapter_path: str):
525
+ if not adapter_path or adapter_path == "(none found)":
526
+ return "Select a valid adapter path."
527
+ return handler.load_lora(adapter_path)
528
+
529
+
530
+ @_gpu_callback
531
+ def unload_adapter():
532
+ return handler.unload_lora()
533
+
534
+
535
+ def set_lora_scale(scale: float):
536
+ return handler.set_lora_scale(scale)
537
+
538
+
539
+ @_gpu_callback
540
+ def generate_sample(
541
+ prompt: str,
542
+ lyrics: str,
543
+ duration: float,
544
+ bpm: int,
545
+ steps: int,
546
+ guidance: float,
547
+ seed: int,
548
+ use_lora: bool,
549
+ lora_scale: float,
550
+ ):
551
+ """Generate a single audio sample for evaluation."""
552
+ if handler.model is None:
553
+ return None, "Model not initialized."
554
+
555
+ # Toggle LoRA if loaded
556
+ if handler.lora_loaded:
557
+ handler.set_use_lora(use_lora)
558
+ if use_lora:
559
+ handler.set_lora_scale(lora_scale)
560
+
561
+ actual_seed = int(seed) if seed >= 0 else random.randint(0, 2**32 - 1)
562
+
563
+ result = handler.generate_music(
564
+ captions=prompt,
565
+ lyrics=lyrics,
566
+ bpm=bpm if bpm > 0 else None,
567
+ inference_steps=steps,
568
+ guidance_scale=guidance,
569
+ use_random_seed=False,
570
+ seed=actual_seed,
571
+ audio_duration=duration,
572
+ batch_size=1,
573
+ )
574
+
575
+ if not result.get("success", False):
576
+ return None, result.get("error", "Generation failed.")
577
+
578
+ audios = result.get("audios", [])
579
+ if not audios:
580
+ return None, "No audio produced."
581
+
582
+ # Save to temp file
583
+ audio_data = audios[0]
584
+ wav_tensor = audio_data.get("tensor")
585
+ sr = audio_data.get("sample_rate", 48000)
586
+
587
+ if wav_tensor is None:
588
+ path = audio_data.get("path")
589
+ if path and os.path.exists(path):
590
+ return path, f"Generated (from file), seed={actual_seed}."
591
+ return None, "No audio tensor."
592
+
593
+ tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
594
+ audio_saver.save_audio(wav_tensor, tmp.name, sample_rate=sr)
595
+ return tmp.name, f"Generated successfully, seed={actual_seed}."
596
+
597
+
598
+ @_gpu_callback
599
+ def ab_test(
600
+ prompt, lyrics, duration, bpm, steps, guidance, seed,
601
+ lora_scale_b,
602
+ ):
603
+ """Generate two samples: A = base, B = LoRA at given scale."""
604
+ resolved_seed = int(seed) if seed >= 0 else random.randint(0, 2**32 - 1)
605
+ results = {}
606
+ for label, use, scale in [("A (base)", False, 0.0), ("B (LoRA)", True, lora_scale_b)]:
607
+ path, msg = generate_sample(
608
+ prompt, lyrics, duration, bpm, steps, guidance, resolved_seed,
609
+ use_lora=use, lora_scale=scale,
610
+ )
611
+ results[label] = (path, msg)
612
+
613
+ return (
614
+ results["A (base)"][0],
615
+ results["A (base)"][1],
616
+ results["B (LoRA)"][0],
617
+ results["B (LoRA)"][1],
618
+ )
619
+
620
+
621
+ # ===========================================================================
622
+ # Build the Gradio App
623
+ # ===========================================================================
624
+
625
+ def get_workflow_status():
626
+ model_is_ready = (handler.model is not None) or _model_init_ok
627
+ model_ready = "Ready" if model_is_ready else "Not initialized"
628
+ tracks = len(dataset_entries)
629
+ training_state = _training_status
630
+ lora_status = handler.get_lora_status() if handler.model is not None else {"loaded": False, "active": False, "scale": 1.0}
631
+ init_note = ""
632
+ if IS_SPACE and _model_init_ok and handler.model is None:
633
+ init_note = " (Zero GPU callback context)"
634
+ return (
635
+ f"Model: {model_ready}{init_note}\n"
636
+ f"Tracks Loaded: {tracks}\n"
637
+ f"Training: {training_state}\n"
638
+ f"LoRA Loaded: {lora_status.get('loaded', False)}\n"
639
+ f"LoRA Active: {lora_status.get('active', False)}\n"
640
+ f"LoRA Scale: {lora_status.get('scale', 1.0)}"
641
+ )
642
+
643
+
644
+ def init_model_and_status(
645
+ model_name: str,
646
+ device: str,
647
+ offload_cpu: bool,
648
+ offload_dit_cpu: bool,
649
+ ):
650
+ status = init_model(model_name, device, offload_cpu, offload_dit_cpu)
651
+ return status, get_workflow_status()
652
+
653
+
654
+ def build_ui():
655
+ available_models = get_available_models()
656
+
657
+ with gr.Blocks(title="ACE-Step 1.5 LoRA Studio", theme=gr.themes.Soft()) as app:
658
+ gr.Markdown(
659
+ "# ACE-Step 1.5 LoRA Studio\n"
660
+ "Use this guided workflow from left to right.\n\n"
661
+ "**Step 1:** Initialize model \n"
662
+ "**Step 2:** Load dataset \n"
663
+ "**Step 3:** Start training \n"
664
+ "**Step 4:** Evaluate adapter"
665
+ )
666
+ with gr.Row():
667
+ workflow_status = gr.Textbox(label="Workflow Status", value=get_workflow_status(), lines=6, interactive=False)
668
+ refresh_status_btn = gr.Button("Refresh Status")
669
+ refresh_status_btn.click(get_workflow_status, outputs=workflow_status, api_name="workflow_status")
670
+
671
+ # ---- Step 1 ----
672
+ with gr.Tab("Step 1 - Initialize Model"):
673
+ gr.Markdown(
674
+ "### Instructions\n"
675
+ "1. Pick a model (`acestep-v15-base` recommended for LoRA).\n"
676
+ "2. Keep device on `auto` unless you need manual override.\n"
677
+ "3. Click **Initialize Model** and confirm status is success."
678
+ )
679
+ with gr.Row():
680
+ model_dd = gr.Dropdown(
681
+ choices=available_models,
682
+ value=available_models[0] if available_models else None,
683
+ label="DiT Model",
684
+ )
685
+ device_dd = gr.Dropdown(
686
+ choices=["auto", "cuda", "mps", "cpu"],
687
+ value="auto",
688
+ label="Device",
689
+ )
690
+ with gr.Row():
691
+ offload_cb = gr.Checkbox(label="Offload To CPU (optional)", value=False)
692
+ offload_dit_cb = gr.Checkbox(label="Offload DiT To CPU (optional)", value=False)
693
+ init_btn = gr.Button("Initialize Model", variant="primary")
694
+ init_out = gr.Textbox(label="Initialization Output", lines=8, interactive=False)
695
+ init_btn.click(
696
+ init_model_and_status,
697
+ [model_dd, device_dd, offload_cb, offload_dit_cb],
698
+ [init_out, workflow_status],
699
+ api_name="init_model",
700
+ )
701
+
702
+ # ---- Step 2 ----
703
+ with gr.Tab("Step 2 - Load Dataset"):
704
+ gr.Markdown(
705
+ "### Instructions\n"
706
+ "1. Either scan a folder or drag/drop audio files (+ optional .json sidecars).\n"
707
+ "2. Confirm tracks appear in the table.\n"
708
+ "3. Optional: run Auto-Label All to fill caption/lyrics/metas.\n"
709
+ "4. Optional: edit metadata manually and save sidecar JSON."
710
+ )
711
+ with gr.Row():
712
+ folder_input = gr.Textbox(label="Dataset Folder Path", placeholder="e.g. ./dataset_inbox")
713
+ scan_btn = gr.Button("Scan Folder")
714
+ with gr.Row():
715
+ upload_files = gr.Files(
716
+ label="Drag/Drop Audio Files (+ Optional JSON Sidecars)",
717
+ file_count="multiple",
718
+ file_types=["audio", ".json"],
719
+ type="filepath",
720
+ )
721
+ upload_btn = gr.Button("Load Dropped Files")
722
+ scan_msg = gr.Textbox(label="Dataset Result", interactive=False)
723
+ dataset_table = gr.Dataframe(
724
+ headers=["File", "Duration", "Caption", "Lyrics", "Language"],
725
+ datatype=["str", "str", "str", "str", "str"],
726
+ label="Tracks",
727
+ interactive=False,
728
+ )
729
+ scan_btn.click(
730
+ scan_folder,
731
+ folder_input,
732
+ [scan_msg, dataset_table],
733
+ api_name="scan_folder",
734
+ )
735
+ upload_btn.click(
736
+ load_uploaded,
737
+ upload_files,
738
+ [scan_msg, dataset_table],
739
+ api_name="load_uploaded",
740
+ )
741
+
742
+ with gr.Accordion("Auto-Label (ACE audio understanding)", open=False):
743
+ gr.Markdown(
744
+ "Auto-label uses ACE: audio -> semantic codes -> metadata/lyrics.\n"
745
+ "Initialize LM first, then run Auto-Label All.\n"
746
+ "Use Caption-Only if your dataset has no lyrics.\n"
747
+ "On Zero GPU, process in smaller batches and click Auto-Label All repeatedly."
748
+ )
749
+ with gr.Row():
750
+ lm_model_dd = gr.Dropdown(
751
+ choices=["acestep-5Hz-lm-0.6B", "acestep-5Hz-lm-1.7B", "acestep-5Hz-lm-4B"],
752
+ value="acestep-5Hz-lm-0.6B",
753
+ label="Auto-Label LM Model",
754
+ )
755
+ lm_backend_dd = gr.Dropdown(
756
+ choices=["pt", "vllm", "mlx"],
757
+ value="pt",
758
+ label="LM Backend",
759
+ )
760
+ lm_device_dd = gr.Dropdown(
761
+ choices=["auto", "cuda", "mps", "xpu", "cpu"],
762
+ value="auto",
763
+ label="LM Device",
764
+ )
765
+ with gr.Row():
766
+ init_lm_btn = gr.Button("Initialize Auto-Label LM")
767
+ overwrite_cb = gr.Checkbox(label="Overwrite Existing Caption/Lyrics", value=False)
768
+ caption_only_cb = gr.Checkbox(label="Caption-Only (Skip Lyrics)", value=True)
769
+ auto_label_btn = gr.Button("Auto-Label All", variant="primary")
770
+ with gr.Row():
771
+ max_files_per_run = gr.Slider(1, 25, value=6, step=1, label="Files Per Run (Zero GPU Safe)")
772
+ reset_cursor_cb = gr.Checkbox(label="Restart From First Track", value=False)
773
+ lm_init_status = gr.Textbox(label="Auto-Label LM Status", lines=5, interactive=False)
774
+ auto_label_status = gr.Textbox(label="Auto-Label Summary", interactive=False)
775
+ auto_label_log = gr.Textbox(label="Auto-Label Log", lines=8, interactive=False)
776
+ init_lm_btn.click(
777
+ init_auto_label_lm,
778
+ [lm_model_dd, lm_backend_dd, lm_device_dd],
779
+ lm_init_status,
780
+ api_name="init_auto_label_lm",
781
+ )
782
+ auto_label_btn.click(
783
+ auto_label_all,
784
+ [overwrite_cb, caption_only_cb, max_files_per_run, reset_cursor_cb],
785
+ [auto_label_status, dataset_table, auto_label_log],
786
+ api_name="auto_label_all",
787
+ )
788
+
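The "Files Per Run" slider plus the "Restart From First Track" checkbox imply a cursor that walks the dataset in batches and wraps around between clicks. A small sketch of that bookkeeping; names and signature are illustrative, not the app's internals:

```python
def next_batch(items: list, cursor: int, batch_size: int, restart: bool = False):
    """Return (batch, new_cursor); the cursor wraps to 0 once the end is reached."""
    if restart or cursor >= len(items):
        cursor = 0
    batch = items[cursor:cursor + batch_size]
    return batch, cursor + len(batch)
```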
789
+ with gr.Accordion("Optional: Edit Metadata Sidecar", open=False):
790
+ with gr.Row():
791
+ edit_idx = gr.Number(label="Track Index (0-based)", value=0, precision=0)
792
+ edit_caption = gr.Textbox(label="Caption")
793
+ edit_lyrics = gr.Textbox(label="Lyrics", lines=3)
794
+ with gr.Row():
795
+ edit_bpm = gr.Textbox(label="BPM", placeholder="e.g. 120")
796
+ edit_key = gr.Textbox(label="Key/Scale", placeholder="e.g. Am")
797
+ edit_lang = gr.Textbox(label="Language", value="en")
798
+ save_btn = gr.Button("Save Sidecar")
799
+ save_msg = gr.Textbox(label="Save Result", interactive=False)
800
+ save_btn.click(
801
+ save_sidecar,
802
+ [edit_idx, edit_caption, edit_lyrics, edit_bpm, edit_key, edit_lang],
803
+ save_msg,
804
+ api_name="save_sidecar",
805
+ )
806
+
807
+ # ---- Step 3 ----
808
+ with gr.Tab("Step 3 - Train LoRA"):
809
+ gr.Markdown(
810
+ "### Instructions\n"
811
+ "1. Keep default settings for first run.\n"
812
+ "2. Set output directory (defaults are good).\n"
813
+ "3. Click **Start Training** and monitor logs/loss.\n"
814
+ "4. Use **Stop Training** for graceful stop."
815
+ )
816
+ with gr.Row():
817
+ t_epochs = gr.Slider(1, 500, value=50, step=1, label="Epochs")
818
+ t_bs = gr.Slider(1, 8, value=1, step=1, label="Batch Size")
819
+ t_accum = gr.Slider(1, 16, value=1, step=1, label="Grad Accumulation")
820
+ with gr.Row():
821
+ t_outdir = gr.Textbox(label="Output Directory", value=DEFAULT_OUTPUT_DIR)
822
+ t_resume = gr.Textbox(label="Resume From Adapter Directory (optional)", value="")
823
+
824
+ with gr.Accordion("Advanced Training Settings (optional)", open=False):
825
+ with gr.Row():
826
+ t_rank = gr.Slider(4, 256, value=64, step=4, label="LoRA Rank")
827
+ t_alpha = gr.Slider(4, 256, value=64, step=4, label="LoRA Alpha")
828
+ t_dropout = gr.Slider(0.0, 0.5, value=0.1, step=0.01, label="LoRA Dropout")
829
+ with gr.Row():
830
+ t_lr = gr.Number(label="Learning Rate", value=1e-4)
831
+ t_wd = gr.Number(label="Weight Decay", value=0.01)
832
+ t_optim = gr.Dropdown(["adamw", "adamw_8bit"], value="adamw_8bit", label="Optimizer")
833
+ with gr.Row():
834
+ t_grad_norm = gr.Number(label="Max Grad Norm", value=1.0)
835
+ t_warmup = gr.Number(label="Warmup Ratio", value=0.03)
836
+ t_sched = gr.Dropdown(
837
+ ["constant_with_warmup", "linear", "cosine"],
838
+ value="constant_with_warmup",
839
+ label="Scheduler",
840
+ )
841
+ with gr.Row():
842
+ t_save = gr.Slider(1, 100, value=10, step=1, label="Save Every N Epochs")
843
+ t_log = gr.Slider(1, 100, value=5, step=1, label="Log Every N Steps")
844
+ t_shift = gr.Number(label="Timestep Shift", value=3.0)
845
+ t_maxdur = gr.Number(label="Max Audio Duration (s)", value=240)
846
+
847
+ with gr.Row():
848
+ train_btn = gr.Button("Start Training", variant="primary")
849
+ stop_btn = gr.Button("Stop Training", variant="stop")
850
+ poll_btn = gr.Button("Refresh Log")
851
+
852
+ train_status = gr.Textbox(label="Training Log", lines=12, interactive=False)
853
+ loss_chart = gr.LinePlot(
854
+ x="Step",
855
+ y="Loss",
856
+ title="Training Loss",
857
+ x_title="Step",
858
+ y_title="Loss",
859
+ )
860
+
861
+ train_btn.click(
862
+ start_training,
863
+ [
864
+ t_rank, t_alpha, t_dropout,
865
+ t_lr, t_wd, t_optim,
866
+ t_grad_norm, t_warmup, t_sched,
867
+ t_epochs, t_bs, t_accum,
868
+ t_save, t_log, t_shift,
869
+ t_maxdur, t_outdir, t_resume,
870
+ ],
871
+ train_status,
872
+ api_name="start_training",
873
+ )
874
+ stop_btn.click(stop_training, outputs=train_status, api_name="stop_training")
875
+
876
+ def _poll_and_format():
+ import pandas as pd  # single import serves both branches
+ log_text, chart_data = poll_training()
+ if chart_data:
+ df = pd.DataFrame(chart_data, columns=["Step", "Loss"])
+ else:
+ df = pd.DataFrame({"Step": [], "Loss": []})
+ return log_text, df
885
+
886
+ poll_btn.click(_poll_and_format, outputs=[train_status, loss_chart], api_name="poll_training")
887
+
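To drive the loss chart, `(Step, Loss)` pairs have to be pulled out of the raw training log. A sketch under an assumed log format; the real trainer's lines may differ, so treat the regex as a starting point:

```python
import re

# Hypothetical log format such as "step 5 | loss=0.42".
LOG_RE = re.compile(r"step\s*(\d+).*?loss\s*[=:]\s*([0-9.]+)", re.IGNORECASE)

def parse_loss_points(log_text: str) -> list[tuple[int, float]]:
    """Extract (step, loss) pairs for the chart from raw log text."""
    points = []
    for line in log_text.splitlines():
        m = LOG_RE.search(line)
        if m:
            points.append((int(m.group(1)), float(m.group(2))))
    return points
```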
888
+ # ---- Step 4 ----
889
+ with gr.Tab("Step 4 - Evaluate"):
890
+ gr.Markdown(
891
+ "### Instructions\n"
892
+ "1. Refresh adapter list and load a trained adapter.\n"
893
+ "2. Run single generation or A/B test.\n"
894
+ "3. Use same seed for fair comparison."
895
+ )
896
+
897
+ with gr.Accordion("Adapter Management", open=True):
898
+ with gr.Row():
899
+ adapter_dir = gr.Textbox(label="Adapters Directory", value=DEFAULT_OUTPUT_DIR)
900
+ refresh_btn = gr.Button("Refresh List")
901
+ adapter_dd = gr.Dropdown(label="Select Adapter", choices=[])
902
+ with gr.Row():
903
+ load_btn = gr.Button("Load Adapter", variant="primary")
904
+ unload_btn = gr.Button("Unload Adapter")
905
+ adapter_status = gr.Textbox(label="Adapter Status", interactive=False)
906
+
907
+ def _refresh(d):
908
+ adapters = list_adapters(d)
909
+ return gr.update(choices=adapters, value=adapters[0] if adapters else None)
910
+
911
+ refresh_btn.click(_refresh, adapter_dir, adapter_dd, api_name="list_adapters")
912
+ load_btn.click(load_adapter, adapter_dd, adapter_status, api_name="load_adapter")
913
+ unload_btn.click(unload_adapter, outputs=adapter_status, api_name="unload_adapter")
914
+
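Refreshing the adapter list amounts to finding checkpoint directories. A sketch assuming PEFT's convention that an adapter directory contains `adapter_config.json`; the app's `list_adapters` may use different criteria:

```python
from pathlib import Path

def find_adapter_dirs(adapters_dir: str) -> list[str]:
    """List subdirectories that look like PEFT adapters
    (assumption: presence of adapter_config.json marks a checkpoint)."""
    root = Path(adapters_dir)
    if not root.is_dir():
        return []
    return sorted(
        str(p) for p in root.iterdir()
        if p.is_dir() and (p / "adapter_config.json").exists()
    )
```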
915
+ with gr.Accordion("Generation Settings", open=True):
916
+ with gr.Row():
917
+ eval_prompt = gr.Textbox(label="Prompt / Caption", lines=2, placeholder="upbeat pop rock with electric guitar")
918
+ eval_lyrics = gr.Textbox(label="Lyrics", lines=3, placeholder="[Instrumental]")
919
+ with gr.Row():
920
+ eval_dur = gr.Slider(10, 300, value=30, step=5, label="Duration (s)")
921
+ eval_bpm = gr.Number(label="BPM (0 = auto)", value=0)
922
+ eval_steps = gr.Slider(1, 100, value=8, step=1, label="Inference Steps")
923
+ with gr.Row():
924
+ eval_guidance = gr.Slider(1.0, 15.0, value=7.0, step=0.5, label="Guidance Scale")
925
+ eval_seed = gr.Number(label="Seed (-1 = random)", value=-1)
926
+
927
+ with gr.Row():
928
+ sg_use_lora = gr.Checkbox(label="Use LoRA", value=True)
929
+ sg_scale = gr.Slider(0.0, 1.0, value=1.0, step=0.05, label="LoRA Scale")
930
+ sg_btn = gr.Button("Generate", variant="primary")
931
+ sg_audio = gr.Audio(label="Single Output", type="filepath")
932
+ sg_msg = gr.Textbox(label="Generation Status", interactive=False)
933
+ sg_btn.click(
934
+ generate_sample,
935
+ [eval_prompt, eval_lyrics, eval_dur, eval_bpm, eval_steps, eval_guidance, eval_seed, sg_use_lora, sg_scale],
936
+ [sg_audio, sg_msg],
937
+ api_name="generate_sample",
938
+ )
939
+
940
+ gr.Markdown("#### A/B Test (Base vs LoRA)")
941
+ with gr.Row():
942
+ ab_scale = gr.Slider(0.0, 1.0, value=1.0, step=0.05, label="LoRA Scale for B")
943
+ ab_btn = gr.Button("Run A/B Test")
944
+ with gr.Row():
945
+ ab_audio_a = gr.Audio(label="A - Base", type="filepath")
946
+ ab_audio_b = gr.Audio(label="B - Base + LoRA", type="filepath")
947
+ with gr.Row():
948
+ ab_msg_a = gr.Textbox(label="Status A", interactive=False)
949
+ ab_msg_b = gr.Textbox(label="Status B", interactive=False)
950
+
951
+ ab_btn.click(
952
+ ab_test,
953
+ [eval_prompt, eval_lyrics, eval_dur, eval_bpm, eval_steps, eval_guidance, eval_seed, ab_scale],
954
+ [ab_audio_a, ab_msg_a, ab_audio_b, ab_msg_b],
955
+ api_name="ab_test",
956
+ )
957
+
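The `-1 = random` seed convention used by both single generation and the A/B test can be handled with a tiny helper; a sketch with a hypothetical name:

```python
import random

def resolve_seed(seed: int) -> int:
    """Any negative value (the UI's -1) means 'pick a random seed'."""
    if seed is None or seed < 0:
        return random.randint(0, 2**31 - 1)
    return int(seed)
```

Reusing the resolved value for both the base and LoRA runs is what makes the A/B comparison fair.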
958
+ app.queue(default_concurrency_limit=1)
959
+ return app
960
+
961
+
962
+ # ===========================================================================
963
+ # Entry point
964
+ # ===========================================================================
965
+
966
+ if __name__ == "__main__":
967
+ app = build_ui()
968
+ app.launch(
969
+ server_name="0.0.0.0",
970
+ server_port=7860,
971
+ share=False,
972
+ )
973
+
packages.txt ADDED
@@ -0,0 +1,2 @@
1
+ ffmpeg
2
+ libsndfile1
requirements.txt CHANGED
@@ -14,3 +14,7 @@ vector-quantize-pytorch
14
  PyYAML
15
  modelscope
16
  filelock>=3.13.0
17
+ peft>=0.11.0
18
+ gradio>=4.0.0
19
+ pandas
20
+ bitsandbytes
scripts/endpoint/generate_interactive.py ADDED
@@ -0,0 +1,223 @@
1
+ import argparse
2
+ import base64
3
+ import json
4
+ import os
5
+ import sys
6
+ import time
7
+ from pathlib import Path
8
+ from urllib.error import HTTPError, URLError
9
+ from urllib.request import Request, urlopen
10
+
11
+ DEFAULT_URL = "https://your-endpoint-url.endpoints.huggingface.cloud"
12
+ DEFAULT_SAMPLE_RATE = 44100
13
+
14
+
15
+ def read_dotenv_value(key: str, dotenv_path: str = ".env") -> str:
16
+ path = Path(dotenv_path)
17
+ if not path.exists():
18
+ return ""
19
+ for raw in path.read_text(encoding="utf-8").splitlines():
20
+ line = raw.strip()
21
+ if not line or line.startswith("#") or "=" not in line:
22
+ continue
23
+ k, v = line.split("=", 1)
24
+ if k.strip() == key:
25
+ return v.strip().strip('"').strip("'")
26
+ return ""
27
+
28
+
29
+ def prompt_text(label: str, default: str = "", required: bool = False) -> str:
30
+ while True:
31
+ suffix = f" [{default}]" if default else ""
32
+ value = input(f"{label}{suffix}: ").strip()
33
+ if not value:
34
+ value = default
35
+ if value or not required:
36
+ return value
37
+ print("Value required.")
38
+
39
+
40
+ def prompt_int(label: str, default: int | None = None, allow_blank: bool = False) -> int | None:
41
+ while True:
42
+ default_str = "" if default is None else str(default)
43
+ value = prompt_text(label, default_str, required=not allow_blank)
44
+ if not value and allow_blank:
45
+ return None
46
+ try:
47
+ return int(value)
48
+ except ValueError:
49
+ print("Enter a valid integer.")
50
+
51
+
52
+ def prompt_float(label: str, default: float) -> float:
53
+ while True:
54
+ value = prompt_text(label, str(default), required=True)
55
+ try:
56
+ return float(value)
57
+ except ValueError:
58
+ print("Enter a valid number.")
59
+
60
+
61
+ def prompt_yes_no(label: str, default: bool) -> bool:
62
+ default_text = "y" if default else "n"
63
+ while True:
64
+ value = prompt_text(f"{label} (y/n)", default_text, required=True).lower()
65
+ if value in {"y", "yes", "1", "true", "t"}:
66
+ return True
67
+ if value in {"n", "no", "0", "false", "f"}:
68
+ return False
69
+ print("Please answer y or n.")
70
+
71
+
72
+ def prompt_multiline(label: str, end_token: str = "END") -> str:
73
+ print(label)
74
+ print(f"Finish lyrics by typing {end_token} on its own line.")
75
+ lines: list[str] = []
76
+ while True:
77
+ line = input()
78
+ if line.strip() == end_token:
79
+ break
80
+ lines.append(line)
81
+ return "\n".join(lines).strip()
82
+
83
+
84
+ def prompt_lyrics_optional() -> str:
85
+ use_lyrics = prompt_yes_no("Add custom lyrics", True)
86
+ if not use_lyrics:
87
+ return ""
88
+ return prompt_multiline("Paste lyrics (or just type END for none)")
89
+
90
+
91
+ def send_request(url: str, token: str, payload: dict) -> dict:
92
+ data = json.dumps(payload).encode("utf-8")
93
+ req = Request(
94
+ url=url,
95
+ data=data,
96
+ method="POST",
97
+ headers={
98
+ "Authorization": f"Bearer {token}",
99
+ "Content-Type": "application/json",
100
+ },
101
+ )
102
+ try:
103
+ with urlopen(req, timeout=3600) as resp:
104
+ body = resp.read().decode("utf-8")
105
+ return json.loads(body)
106
+ except HTTPError as e:
107
+ text = e.read().decode("utf-8", errors="replace")
108
+ raise RuntimeError(f"HTTP {e.code}: {text}") from e
109
+ except URLError as e:
110
+ raise RuntimeError(f"Network error: {e}") from e
111
+
112
+
113
+ def resolve_token(cli_token: str) -> str:
114
+ if cli_token:
115
+ return cli_token
116
+ env_token = os.getenv("HF_TOKEN") or os.getenv("hf_token")
117
+ if env_token:
118
+ return env_token
119
+ dotenv_token = read_dotenv_value("hf_token") or read_dotenv_value("HF_TOKEN")
120
+ return dotenv_token
121
+
122
+
123
+ def main() -> int:
124
+ parser = argparse.ArgumentParser(description="Interactive ACE-Step endpoint generator")
125
+ parser.add_argument("--url", default=os.getenv("HF_ENDPOINT_URL", DEFAULT_URL), help="Inference endpoint URL")
126
+ parser.add_argument("--token", default="", help="HF token. If omitted, uses env/.env")
127
+ parser.add_argument("--prompt", default="", help="Initial prompt")
128
+ parser.add_argument("--out-file", default="", help="Output WAV file path")
129
+ parser.add_argument(
130
+ "--advanced",
131
+ action="store_true",
132
+ help="Ask advanced generation options (seed/guidance/steps/sample-rate/LM).",
133
+ )
134
+ args = parser.parse_args()
135
+
136
+ print("=== ACE-Step Interactive Generation ===")
137
+
138
+ token = resolve_token(args.token)
139
+ if not token:
140
+ print("No token found. Set HF_TOKEN or hf_token in .env, or pass --token.")
141
+ return 1
142
+
143
+ url = prompt_text("Endpoint URL", args.url, required=True)
144
+ music_prompt = prompt_text("Music prompt", args.prompt, required=True)
145
+ bpm = prompt_int("BPM (blank for auto)", None, allow_blank=True)
146
+ duration_sec = prompt_int("Duration seconds", 120)
147
+ instrumental = prompt_yes_no("Instrumental (no vocals)", False)
148
+ lyrics = "" if instrumental else prompt_lyrics_optional()
149
+
150
+ # Quality-first defaults: use SFT + LM path configured on the endpoint.
151
+ sample_rate = DEFAULT_SAMPLE_RATE
152
+ steps = 50
153
+ guidance_scale = 7.0
154
+ seed = 42
155
+ use_lm = True
156
+ allow_fallback = False
157
+ simple_prompt = False
158
+
159
+ if args.advanced:
160
+ print("\nAdvanced options:")
161
+ sample_rate = prompt_int("Sample rate", DEFAULT_SAMPLE_RATE)
162
+ steps = prompt_int("Steps", 50)
163
+ guidance_scale = prompt_float("Guidance scale", 7.0)
164
+ seed = prompt_int("Seed", 42)
165
+ use_lm = prompt_yes_no("Use LM planning (higher quality, slower)", True)
166
+ allow_fallback = prompt_yes_no("Allow fallback sine audio", False)
167
+
168
+ default_out = args.out_file or f"music_{int(time.time())}.wav"
169
+ out_file = prompt_text("Output file", default_out, required=True)
170
+
171
+ inputs = {
172
+ "prompt": music_prompt,
173
+ "duration_sec": duration_sec,
174
+ "sample_rate": sample_rate,
175
+ "seed": seed,
176
+ "guidance_scale": guidance_scale,
177
+ "steps": steps,
178
+ "use_lm": use_lm,
179
+ "simple_prompt": simple_prompt,
180
+ "instrumental": instrumental,
181
+ "allow_fallback": allow_fallback,
182
+ }
183
+ if bpm is not None:
184
+ inputs["bpm"] = bpm
185
+ if lyrics:
186
+ inputs["lyrics"] = lyrics
187
+
188
+ payload = {"inputs": inputs}
189
+
190
+ print("\nSending request...")
191
+ try:
192
+ response = send_request(url, token, payload)
193
+ except Exception as e:
194
+ print(f"Request failed: {e}")
195
+ return 1
196
+
197
+ print("Response summary:")
198
+ print(json.dumps({
199
+ "used_fallback": response.get("used_fallback"),
200
+ "model_loaded": response.get("model_loaded"),
201
+ "model_error": response.get("model_error"),
202
+ "sample_rate": response.get("sample_rate"),
203
+ "duration_sec": response.get("duration_sec"),
204
+ }, indent=2))
205
+
206
+ if response.get("error"):
207
+ print(f"Endpoint error: {response['error']}")
208
+ return 1
209
+
210
+ audio_b64 = response.get("audio_base64_wav")
211
+ if not audio_b64:
212
+ print("No audio_base64_wav in response.")
213
+ return 1
214
+
215
+ audio_bytes = base64.b64decode(audio_b64)
216
+ Path(out_file).write_bytes(audio_bytes)
217
+ print(f"Saved audio: {out_file}")
218
+
219
+ return 0
220
+
221
+
222
+ if __name__ == "__main__":
223
+ raise SystemExit(main())
scripts/endpoint/test.bat ADDED
@@ -0,0 +1,16 @@
1
+ @echo off
2
+ setlocal
3
+
4
+ set "PROMPT=%*"
5
+ if not defined PROMPT set "PROMPT=upbeat pop rap with emotional guitar"
6
+ set "PROMPT=%PROMPT:"=%"
7
+
8
+ powershell -NoProfile -ExecutionPolicy Bypass -File "%~dp0test.ps1" -Prompt "%PROMPT%" -SimplePrompt -DurationSec 12 -SampleRate 44100 -Seed 42 -GuidanceScale 7.0 -Steps 50 -UseLM 1 -OutFile "test_music.wav"
9
+
10
+ if errorlevel 1 (
11
+ echo Request failed.
12
+ exit /b 1
13
+ )
14
+
15
+ echo Done.
16
+ endlocal
scripts/endpoint/test.ps1 ADDED
@@ -0,0 +1,257 @@
1
+ param(
2
+ [string]$Token = "",
3
+ [string]$Url = "",
4
+ [string]$Prompt = "upbeat pop rap with emotional guitar",
5
+ [string]$Lyrics = "",
6
+ [int]$DurationSec = 3,
7
+ [int]$SampleRate = 44100,
8
+ [int]$Seed = 42,
9
+ [double]$GuidanceScale = 7.0,
10
+ [int]$Steps = 50,
11
+ [string]$UseLM = "true",
12
+ [switch]$SimplePrompt,
13
+ [switch]$Instrumental,
14
+ [switch]$RnbLoveTemptation,
15
+ [switch]$RnbPopRap2Min,
16
+ [switch]$AllowFallback,
17
+ [string]$OutFile = "test_music.wav"
18
+ )
19
+
20
+ $ErrorActionPreference = "Stop"
21
+
22
+ if (-not $Token) {
23
+ $Token = $env:HF_TOKEN
24
+ }
25
+ if (-not $Url) {
26
+ $Url = $env:HF_ENDPOINT_URL
27
+ }
28
+
29
+ if (-not $Token) {
30
+ throw "HF token not provided. Use -Token or set HF_TOKEN."
31
+ }
32
+ if (-not $Url) {
33
+ throw "Endpoint URL not provided. Use -Url or set HF_ENDPOINT_URL."
34
+ }
35
+
36
+ if ($RnbLoveTemptation.IsPresent) {
37
+ $Prompt = "melodic RnB pop with ambient melodies, evolving chord progression, emotional modern production, intimate vocals, soulful hooks"
38
+ $Lyrics = @"
39
+ [Verse 1]
40
+ Late night shadows on my skin,
41
+ I hear your name where the silence has been.
42
+ I swore I'd run when the fire got close,
43
+ But I keep chasing what hurts me the most.
44
+
45
+ [Pre-Chorus]
46
+ Every promise pulls me back again,
47
+ Sweet poison dressed like a loyal friend.
48
+
49
+ [Chorus]
50
+ I'm fighting love and temptation,
51
+ Heart in a war with my own salvation.
52
+ One touch and my defenses break,
53
+ I know it's danger but I stay awake.
54
+ I'm fighting love and temptation,
55
+ Drowning slow in this sweet devastation.
56
+ I want to leave but I hesitate,
57
+ Cause I still crave what I know I should hate.
58
+
59
+ [Verse 2]
60
+ Your voice is velvet over broken glass,
61
+ I learn the pain but I still relapse.
62
+ Truth on my lips, lies in my veins,
63
+ I pray for peace while I dance in flames.
64
+
65
+ [Bridge]
66
+ If love is a test, I'm failing with grace,
67
+ Still falling for fire, still calling your name.
68
+
69
+ [Final Chorus]
70
+ I'm fighting love and temptation,
71
+ Heart in a war with my own salvation.
72
+ One touch and my defenses break,
73
+ I know it's danger but I stay awake.
74
+ "@
75
+
76
+ if ($DurationSec -eq 3) {
77
+ $DurationSec = 24
78
+ }
79
+
80
+ if (-not $PSBoundParameters.ContainsKey("SimplePrompt")) {
81
+ $SimplePrompt = $false
82
+ }
83
+ }
84
+
85
+ if ($RnbPopRap2Min.IsPresent) {
86
+ $Prompt = "2 minute RnB Pop Rap song, melodic ambient pads, emotional chord progression, intimate female and male vocal blend, catchy hooks, modern drums, deep 808, vulnerable but confident tone"
87
+ $Lyrics = @"
88
+ [Intro]
89
+ Mm, midnight in my head, no sleep.
90
+ Same old war in my chest, still deep.
91
+ I say I'm done, then I call your name,
92
+ I run from the fire, then I walk in the flame.
93
+
94
+ [Verse 1]
95
+ Streetlights drip on the window pane,
96
+ I wear my pride like a silver chain.
97
+ Say I don't need you, that's what I say,
98
+ But your voice in my mind don't fade away.
99
+ I got dreams, got scars, got bills to pay,
100
+ Still I fold when your eyes pull me in that way.
101
+ I know better, I swear I do,
102
+ But temptation sounds like truth when it sounds like you.
103
+
104
+ [Pre-Chorus]
105
+ Every promise tastes sweet then turns to smoke,
106
+ I keep rebuilding hearts that we already broke.
107
+ I want peace, but I want your touch,
108
+ I know it's too much, still it's never enough.
109
+
110
+ [Chorus]
111
+ I'm fighting love and temptation,
112
+ Heart on trial with no salvation.
113
+ One more kiss and the walls cave in,
114
+ I lose myself just to feel again.
115
+ I'm fighting love and temptation,
116
+ Drowning slow in sweet devastation.
117
+ I say goodbye, then I hesitate,
118
+ Cause I still crave what I know I should hate.
119
+
120
+ [Rap Verse 1]
121
+ Look, I been in and out the same lane,
122
+ Different night, same rain.
123
+ Tell myself "don't text back,"
124
+ Still type your name, press send, same pain.
125
+ You the high and the low in one dose,
126
+ Got me praying for distance, still close.
127
+ I play tough, but the truth is loud,
128
+ When you're gone, all this noise in the crowd.
129
+ Yeah, I hustle, I grind, I glow,
130
+ But alone in the dark, I'm a different soul.
131
+ If love was logic, I'd be free by now,
132
+ But my heart ain't science, I just bleed it out.
133
+
134
+ [Verse 2]
135
+ Your perfume still lives in my hoodie seams,
136
+ Like a ghost in the corners of all my dreams.
137
+ I learned your chaos, your every disguise,
138
+ The saint in your smile, the storm in your eyes.
139
+ I touch your hand and forget my name,
140
+ Call it desire, call it blame.
141
+ I need healing, I need release,
142
+ But your lips keep turning my war to peace.
143
+
144
+ [Pre-Chorus]
145
+ Every promise tastes sweet then turns to smoke,
146
+ I keep rebuilding hearts that we already broke.
147
+ I want peace, but I want your touch,
148
+ I know it's too much, still it's never enough.
149
+
150
+ [Chorus]
151
+ I'm fighting love and temptation,
152
+ Heart on trial with no salvation.
153
+ One more kiss and the walls cave in,
154
+ I lose myself just to feel again.
155
+ I'm fighting love and temptation,
156
+ Drowning slow in sweet devastation.
157
+ I say goodbye, then I hesitate,
158
+ Cause I still crave what I know I should hate.
159
+
160
+ [Rap Verse 2]
161
+ Uh, late calls, no sleep, red eyes,
162
+ Truth hurts more than sweet lies.
163
+ We toxic, but the chemistry loud,
164
+ Like thunder in a summer night over this town.
165
+ Tell me leave, then you pull me near,
166
+ Tell me "trust me," then feed my fear.
167
+ I keep faith in a broken map,
168
+ Tryna find us on roads that don't lead back.
169
+ I got plans, got goals, got pride,
170
+ But temptation got hands on the wheel tonight.
171
+ If I fall, let me fall with grace,
172
+ I still see home when I look in your face.
173
+
174
+ [Bridge]
175
+ If this love is a test, I'm failing in style,
176
+ Smiling through fire for one more while.
177
+ I know I should run, I know I should wait,
178
+ But your name on my tongue sounds too much like fate.
179
+
180
+ [Final Chorus]
181
+ I'm fighting love and temptation,
182
+ Heart on trial with no salvation.
183
+ One more kiss and the walls cave in,
184
+ I lose myself just to feel again.
185
+ I'm fighting love and temptation,
186
+ Drowning slow in sweet devastation.
187
+ I say goodbye, then I hesitate,
188
+ Cause I still crave what I know I should hate.
189
+
190
+ [Outro]
191
+ Mm, midnight in my head, no sleep.
192
+ Still your name in my chest, too deep.
193
+ "@
194
+
195
+ if ($DurationSec -eq 3) {
196
+ $DurationSec = 120
197
+ }
198
+
199
+ if (-not $PSBoundParameters.ContainsKey("SimplePrompt")) {
200
+ $SimplePrompt = $false
201
+ }
202
+ }
203
+
204
+ $useLmBool = $true
205
+ if ($null -ne $UseLM -and $UseLM -ne "") {
206
+ try {
207
+ $useLmBool = [System.Convert]::ToBoolean($UseLM)
208
+ }
209
+ catch {
210
+ $useLmBool = ($UseLM -match '^(1|true|t|yes|y|on)$')
211
+ }
212
+ }
213
+
214
+ $inputs = @{
215
+ prompt = $Prompt
216
+ duration_sec = $DurationSec
217
+ sample_rate = $SampleRate
218
+ seed = $Seed
219
+ guidance_scale = $GuidanceScale
220
+ steps = $Steps
221
+ use_lm = $useLmBool
222
+ allow_fallback = $AllowFallback.IsPresent
223
+ }
224
+
225
+ if ($Lyrics) {
226
+ $inputs["lyrics"] = $Lyrics
227
+ }
228
+ if ($SimplePrompt.IsPresent) {
229
+ $inputs["simple_prompt"] = $true
230
+ }
231
+ if ($Instrumental.IsPresent) {
232
+ $inputs["instrumental"] = $true
233
+ }
234
+
235
+ $body = @{ inputs = $inputs } | ConvertTo-Json -Depth 8
236
+
237
+ $response = Invoke-RestMethod -Method Post -Uri $Url -Headers @{
238
+ Authorization = "Bearer $Token"
239
+ "Content-Type" = "application/json"
240
+ } -Body $body
241
+
242
+ $response | ConvertTo-Json -Depth 6
243
+
244
+ if ($response.error) {
245
+ throw "Endpoint returned error: $($response.error)"
246
+ }
247
+
248
+ if ($response.used_fallback -and -not $AllowFallback.IsPresent) {
249
+ throw "Endpoint used fallback audio. Set -AllowFallback only if you want fallback behavior."
250
+ }
251
+
252
+ if (-not $response.audio_base64_wav) {
253
+ throw "No audio_base64_wav returned."
254
+ }
255
+
256
+ [IO.File]::WriteAllBytes($OutFile, [Convert]::FromBase64String($response.audio_base64_wav))
257
+ Write-Host "Saved audio file: $OutFile"
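`test.ps1` accepts `-UseLM` as a string and coerces it to a boolean with a truthy-string fallback. The same rule, sketched in Python for reuse in the CLI scripts; the helper is hypothetical:

```python
def parse_bool(value, default: bool = True) -> bool:
    """Coerce '1'/'true'/'yes'/'on' (any case) to True; blank/None -> default."""
    if value is None or str(value).strip() == "":
        return default
    return str(value).strip().lower() in {"1", "true", "t", "yes", "y", "on"}
```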
scripts/endpoint/test_rnb.bat ADDED
@@ -0,0 +1,12 @@
1
+ @echo off
2
+ setlocal
3
+
4
+ powershell -NoProfile -ExecutionPolicy Bypass -File "%~dp0test.ps1" -RnbLoveTemptation -DurationSec 24 -SampleRate 44100 -Seed 42 -GuidanceScale 7.0 -Steps 8 -UseLM 1 -OutFile "test_rnb_music.wav"
5
+
6
+ if errorlevel 1 (
7
+ echo Request failed.
8
+ exit /b 1
9
+ )
10
+
11
+ echo Done.
12
+ endlocal
scripts/endpoint/test_rnb_2min.bat ADDED
@@ -0,0 +1,12 @@
1
+ @echo off
2
+ setlocal
3
+
4
+ powershell -NoProfile -ExecutionPolicy Bypass -File "%~dp0test.ps1" -RnbPopRap2Min -DurationSec 120 -SampleRate 44100 -Seed 42 -GuidanceScale 7.0 -Steps 8 -UseLM 1 -OutFile "test_rnb_pop_rap_2min.wav"
5
+
6
+ if errorlevel 1 (
7
+ echo Request failed.
8
+ exit /b 1
9
+ )
10
+
11
+ echo Done.
12
+ endlocal
scripts/hf_clone.py ADDED
@@ -0,0 +1,321 @@
1
+ #!/usr/bin/env python
2
+ """
3
+ Bootstrap this project into your own Hugging Face Space and/or Endpoint repo.
4
+
5
+ Examples:
6
+ python scripts/hf_clone.py space --repo-id your-name/ace-step-lora-studio
7
+ python scripts/hf_clone.py endpoint --repo-id your-name/ace-step-endpoint
8
+ python scripts/hf_clone.py all --space-repo-id your-name/ace-step-lora-studio --endpoint-repo-id your-name/ace-step-endpoint
9
+ """
10
+
11
+ from __future__ import annotations
12
+
13
+ import argparse
14
+ import os
15
+ import shutil
16
+ import tempfile
17
+ from pathlib import Path
18
+ from typing import Iterable
19
+
20
+ from huggingface_hub import HfApi
21
+
22
+
23
+ PROJECT_ROOT = Path(__file__).resolve().parents[1]
24
+
25
+ COMMON_SKIP_DIRS = {
26
+ ".git",
27
+ "__pycache__",
28
+ ".pytest_cache",
29
+ ".mypy_cache",
30
+ ".ruff_cache",
31
+ ".venv",
32
+ "venv",
33
+ "env",
34
+ ".idea",
35
+ ".vscode",
36
+ ".cache",
37
+ ".huggingface",
38
+ ".gradio",
39
+ "checkpoints",
40
+ "lora_output",
41
+ "outputs",
42
+ "artifacts",
43
+ "models",
44
+ "datasets",
45
+ "Lora-ace-step",
46
+ }
47
+
48
+ COMMON_SKIP_FILES = {
49
+ ".env",
50
+ }
51
+
52
+ COMMON_SKIP_PREFIXES = (
53
+ "song_summaries_llm",
54
+ )
55
+
56
+ COMMON_SKIP_SUFFIXES = {
57
+ ".wav",
58
+ ".flac",
59
+ ".mp3",
60
+ ".ogg",
61
+ ".opus",
62
+ ".m4a",
63
+ ".aac",
64
+ ".pt",
65
+ ".bin",
66
+ ".safetensors",
67
+ ".ckpt",
68
+ ".onnx",
69
+ ".log",
70
+ ".pyc",
71
+ ".pyo",
72
+ ".pyd",
73
+ }
74
+
75
+ MAX_FILE_BYTES = 30 * 1024 * 1024 # 30MB safety cap for upload snapshot
76
+
77
+
78
+ def _should_skip_common(rel_path: Path, is_dir: bool) -> bool:
79
+ if any(part in COMMON_SKIP_DIRS for part in rel_path.parts):
80
+ return True
81
+ if rel_path.name in COMMON_SKIP_FILES:
82
+ return True
83
+ if any(rel_path.name.startswith(prefix) for prefix in COMMON_SKIP_PREFIXES):
84
+ return True
85
+ if not is_dir and rel_path.suffix.lower() in COMMON_SKIP_SUFFIXES:
86
+ return True
87
+ return False
88
+
89
+
90
+ def _copy_file(src: Path, dst: Path) -> None:
91
+ dst.parent.mkdir(parents=True, exist_ok=True)
92
+ shutil.copy2(src, dst)
93
+
94
+
95
+ def _stage_space_snapshot(staging_dir: Path) -> tuple[int, int, list[str]]:
96
+ copied = 0
97
+ bytes_total = 0
98
+ skipped: list[str] = []
99
+
100
+ for src in PROJECT_ROOT.rglob("*"):
101
+ rel = src.relative_to(PROJECT_ROOT)
102
+
103
+ if src.is_dir():
104
+ if _should_skip_common(rel, is_dir=True):
105
+ skipped.append(f"{rel}/")
106
+ continue
107
+
108
+ if _should_skip_common(rel, is_dir=False):
109
+ skipped.append(str(rel))
110
+ continue
111
+
112
+ size = src.stat().st_size
113
+ if size > MAX_FILE_BYTES:
114
+ skipped.append(f"{rel} (>{MAX_FILE_BYTES // (1024 * 1024)}MB)")
115
+ continue
116
+
117
+ dst = staging_dir / rel
118
+ _copy_file(src, dst)
119
+ copied += 1
120
+ bytes_total += size
121
+
122
+ return copied, bytes_total, skipped
123
+
124
+
125
+ def _iter_endpoint_paths() -> Iterable[Path]:
126
+ # Minimal runtime set for custom endpoint repos.
127
+ required = [
128
+ PROJECT_ROOT / "handler.py",
129
+ PROJECT_ROOT / "requirements.txt",
130
+ PROJECT_ROOT / "packages.txt",
131
+ PROJECT_ROOT / "acestep",
132
+ ]
133
+ for p in required:
134
+ if p.exists():
135
+ yield p
136
+
137
+ template_readme = PROJECT_ROOT / "templates" / "hf-endpoint" / "README.md"
138
+ if template_readme.exists():
139
+ yield template_readme
140
+
141
+
142
+ def _stage_endpoint_snapshot(staging_dir: Path) -> tuple[int, int]:
143
+ copied = 0
144
+ bytes_total = 0
145
+
146
+ for src in _iter_endpoint_paths():
147
+ if src.is_file():
148
+ rel_dst = Path("README.md") if src.name == "README.md" and "templates" in src.parts else Path(src.name)
149
+ dst = staging_dir / rel_dst
150
+ _copy_file(src, dst)
151
+ copied += 1
152
+ bytes_total += src.stat().st_size
153
+ continue
154
+
155
+ if src.is_dir():
156
+ for nested in src.rglob("*"):
157
+ rel_nested = nested.relative_to(src)
158
+ if nested.is_dir():
159
+ if _should_skip_common(Path(src.name) / rel_nested, is_dir=True):
160
+ continue
161
+ continue
162
+ if _should_skip_common(Path(src.name) / rel_nested, is_dir=False):
163
+ continue
164
+ if nested.suffix.lower() in {".wav", ".flac", ".mp3", ".ogg"}:
165
+ continue
166
+
167
+ dst = staging_dir / src.name / rel_nested
168
+ _copy_file(nested, dst)
169
+ copied += 1
170
+ bytes_total += nested.stat().st_size
171
+
172
+ return copied, bytes_total
173
+
174
+
175
+ def _resolve_token(arg_token: str) -> str | None:
176
+ if arg_token:
177
+ return arg_token
178
+ return os.getenv("HF_TOKEN")
179
+
180
+
181
+ def _ensure_repo(
182
+ api: HfApi,
183
+ repo_id: str,
184
+ repo_type: str,
185
+ private: bool,
186
+ space_sdk: str | None = None,
187
+ ) -> None:
188
+ kwargs = {
189
+ "repo_id": repo_id,
190
+ "repo_type": repo_type,
191
+ "private": private,
192
+ "exist_ok": True,
193
+ }
194
+ if repo_type == "space" and space_sdk:
195
+ kwargs["space_sdk"] = space_sdk
196
+ api.create_repo(**kwargs)
197
+
198
+
199
+ def _upload_snapshot(
200
+ api: HfApi,
201
+ repo_id: str,
202
+ repo_type: str,
203
+ folder_path: Path,
204
+ commit_message: str,
205
+ ) -> None:
206
+ api.upload_folder(
207
+ repo_id=repo_id,
208
+ repo_type=repo_type,
209
+ folder_path=str(folder_path),
210
+ commit_message=commit_message,
211
+ delete_patterns=[],
212
+ )
213
+
214
+
215
+ def _fmt_mb(num_bytes: int) -> str:
216
+ return f"{num_bytes / (1024 * 1024):.2f} MB"
217
+
218
+
219
+ def clone_space(repo_id: str, private: bool, token: str | None, dry_run: bool) -> None:
220
+ with tempfile.TemporaryDirectory(prefix="hf_space_clone_") as tmp:
221
+ staging = Path(tmp)
222
+ copied, bytes_total, skipped = _stage_space_snapshot(staging)
223
+ print(f"[space] staged files: {copied}, size: {_fmt_mb(bytes_total)}")
224
+ if skipped:
225
+ print(f"[space] skipped entries: {len(skipped)}")
226
+ for item in skipped[:20]:
227
+ print(f" - {item}")
228
+ if len(skipped) > 20:
229
+ print(f" ... and {len(skipped) - 20} more")
230
+
231
+ if dry_run:
232
+ print("[space] dry-run complete (nothing uploaded).")
233
+ return
234
+
235
+ api = HfApi(token=token)
236
+ _ensure_repo(api, repo_id=repo_id, repo_type="space", private=private, space_sdk="gradio")
237
+ _upload_snapshot(
238
+ api,
239
+ repo_id=repo_id,
240
+ repo_type="space",
241
+ folder_path=staging,
242
+ commit_message="Bootstrap ACE-Step LoRA Studio Space",
243
+ )
244
+ print(f"[space] uploaded to https://huggingface.co/spaces/{repo_id}")
245
+
246
+
247
+ def clone_endpoint(repo_id: str, private: bool, token: str | None, dry_run: bool) -> None:
248
+ with tempfile.TemporaryDirectory(prefix="hf_endpoint_clone_") as tmp:
249
+ staging = Path(tmp)
250
+ copied, bytes_total = _stage_endpoint_snapshot(staging)
251
+ print(f"[endpoint] staged files: {copied}, size: {_fmt_mb(bytes_total)}")
252
+
253
+ if dry_run:
254
+ print("[endpoint] dry-run complete (nothing uploaded).")
255
+ return
256
+
257
+ api = HfApi(token=token)
258
+ _ensure_repo(api, repo_id=repo_id, repo_type="model", private=private)
259
+ _upload_snapshot(
260
+ api,
261
+ repo_id=repo_id,
262
+ repo_type="model",
263
+ folder_path=staging,
264
+ commit_message="Bootstrap ACE-Step custom endpoint repo",
265
+ )
266
+ print(f"[endpoint] uploaded to https://huggingface.co/{repo_id}")
267
+
268
+
269
+ def build_parser() -> argparse.ArgumentParser:
270
+ parser = argparse.ArgumentParser(description="Clone this project into your own HF Space/Endpoint repos.")
271
+ subparsers = parser.add_subparsers(dest="cmd", required=True)
272
+
273
+ p_space = subparsers.add_parser("space", help="Create/update your HF Space from this project.")
274
+ p_space.add_argument("--repo-id", required=True, help="Target space repo id, e.g. username/my-space.")
275
+ p_space.add_argument("--private", action="store_true", help="Create repo as private.")
276
+ p_space.add_argument("--token", type=str, default="", help="HF token (default: HF_TOKEN env var).")
277
+ p_space.add_argument("--dry-run", action="store_true", help="Stage files only; do not upload.")
278
+
279
+ p_endpoint = subparsers.add_parser("endpoint", help="Create/update your custom endpoint model repo.")
280
+ p_endpoint.add_argument("--repo-id", required=True, help="Target model repo id, e.g. username/my-endpoint.")
281
+ p_endpoint.add_argument("--private", action="store_true", help="Create repo as private.")
282
+ p_endpoint.add_argument("--token", type=str, default="", help="HF token (default: HF_TOKEN env var).")
283
+ p_endpoint.add_argument("--dry-run", action="store_true", help="Stage files only; do not upload.")
284
+
285
+ p_all = subparsers.add_parser("all", help="Run both Space and Endpoint bootstrap.")
286
+ p_all.add_argument("--space-repo-id", required=True, help="Target space repo id.")
287
+ p_all.add_argument("--endpoint-repo-id", required=True, help="Target endpoint model repo id.")
288
+ p_all.add_argument("--space-private", action="store_true", help="Create Space as private.")
289
+ p_all.add_argument("--endpoint-private", action="store_true", help="Create endpoint repo as private.")
290
+ p_all.add_argument("--token", type=str, default="", help="HF token (default: HF_TOKEN env var).")
291
+ p_all.add_argument("--dry-run", action="store_true", help="Stage files only; do not upload.")
292
+
293
+ return parser
294
+
295
+
296
+ def main() -> int:
297
+ args = build_parser().parse_args()
298
+ token = _resolve_token(args.token)
299
+
300
+ if not token and not args.dry_run:
301
+ print("HF token not found. Set HF_TOKEN or pass --token.")
302
+ return 1
303
+
304
+ if args.cmd == "space":
305
+ clone_space(args.repo_id, private=bool(args.private), token=token, dry_run=bool(args.dry_run))
306
+ elif args.cmd == "endpoint":
307
+ clone_endpoint(args.repo_id, private=bool(args.private), token=token, dry_run=bool(args.dry_run))
308
+ else:
309
+ clone_space(args.space_repo_id, private=bool(args.space_private), token=token, dry_run=bool(args.dry_run))
310
+ clone_endpoint(
311
+ args.endpoint_repo_id,
312
+ private=bool(args.endpoint_private),
313
+ token=token,
314
+ dry_run=bool(args.dry_run),
315
+ )
316
+
317
+ return 0
318
+
319
+
320
+ if __name__ == "__main__":
321
+ raise SystemExit(main())
scripts/jobs/submit_hf_lora_job.ps1 ADDED
@@ -0,0 +1,85 @@
+ param(
+     [string]$CodeRepo = "YOUR_USERNAME/ace-step-lora-studio",
+     [string]$DatasetRepo = "",
+     [string]$DatasetRevision = "main",
+     [string]$DatasetSubdir = "",
+     [string]$ModelConfig = "acestep-v15-base",
+     [string]$Flavor = "a10g-large",
+     [string]$Timeout = "8h",
+     [int]$Epochs = 20,
+     [int]$BatchSize = 1,
+     [int]$GradAccum = 1,
+     [string]$OutputDir = "/workspace/output",
+     [string]$UploadRepo = "",
+     [switch]$UploadPrivate,
+     [switch]$Detach
+ )
+
+ $ErrorActionPreference = "Stop"
+
+ if (-not $DatasetRepo) {
+     throw "Provide -DatasetRepo (HF dataset repo containing your audio + optional sidecars)."
+ }
+
+ $secretArgs = @("--secrets", "HF_TOKEN")
+
+ $uploadArgs = ""
+ if ($UploadRepo) {
+     $uploadArgs = "--upload-repo `"$UploadRepo`""
+     if ($UploadPrivate.IsPresent) {
+         $uploadArgs += " --upload-private"
+     }
+ }
+
+ $datasetSubdirArgs = ""
+ if ($DatasetSubdir) {
+     $datasetSubdirArgs = "--dataset-subdir `"$DatasetSubdir`""
+ }
+
+ $detachArg = ""
+ if ($Detach.IsPresent) {
+     $detachArg = "--detach"
+ }
+
+ $jobCommand = @"
+ set -e
+ python -m pip install --no-cache-dir --upgrade pip
+ git clone https://huggingface.co/$CodeRepo /workspace/code
+ cd /workspace/code
+ python -m pip install --no-cache-dir -r requirements.txt
+ python lora_train.py \
+   --dataset-repo "$DatasetRepo" \
+   --dataset-revision "$DatasetRevision" \
+   $datasetSubdirArgs \
+   --model-config "$ModelConfig" \
+   --device auto \
+   --num-epochs $Epochs \
+   --batch-size $BatchSize \
+   --grad-accum $GradAccum \
+   --output-dir "$OutputDir" \
+   $uploadArgs
+ "@
+
+ $argsList = @(
+     "jobs", "run",
+     "--flavor", $Flavor,
+     "--timeout", $Timeout
+ ) + $secretArgs
+
+ if ($detachArg) {
+     $argsList += $detachArg
+ }
+
+ $argsList += @(
+     "pytorch/pytorch:2.5.1-cuda12.1-cudnn9-runtime",
+     "bash", "-lc", $jobCommand
+ )
+
+ Write-Host "Submitting HF Job with flavor=$Flavor timeout=$Timeout ..."
+ Write-Host "Dataset repo: $DatasetRepo"
+ Write-Host "Code repo: $CodeRepo"
+ if ($UploadRepo) {
+     Write-Host "Will upload final adapter to: $UploadRepo"
+ }
+
+ & hf @argsList
summaries/findings.md ADDED
@@ -0,0 +1,68 @@
+ # Improving ACE-Step LoRA with Time-Event-Based Annotation
+
+ [Back to project README](../README.md)
+
+ ## Baseline context in this repo
+
+ This project already provides a solid end-to-end workflow:
+
+ - Train LoRA adapters with `lora_train.py` and the Gradio UI (`app.py`, `lora_ui.py`).
+ - Deploy generation through a custom endpoint runtime (`handler.py`, `acestep/`).
+ - Test prompts and lyrics quickly with the endpoint client scripts in `scripts/endpoint/`.
+
+ Today, most conditioning in this pipeline is still global (caption, lyrics, BPM, key, tags). That is a strong baseline, but it does not explicitly teach the model *when* events happen inside a track.
+
+ ## Core limitation
+
+ Current annotations usually describe *what* a song is, not *when* events occur. The model can learn style and texture, but temporal structure is weaker:
+
+ - Verse/chorus transitions are often less deliberate than in human-produced songs.
+ - Build-ups, drops, and effect changes can feel averaged or blurred.
+ - Subgenre-specific arrangement timing is harder to reproduce consistently.
+
+ ## Why time-event labels are promising
+
+ 1. Better musical structure: teach the model where sections start and end and where key transitions occur.
+ 2. Better genre fidelity: encode timing differences between styles that share similar instruments.
+ 3. Better control at inference: allow prompting for both content and structure (what + when).
+
+ ## Practical direction for this codebase
+
+ A useful next step is to extend the current sidecar metadata approach with optional timed events:
+
+ - Keep existing fields (`caption`, `lyrics`, `bpm`, etc.).
+ - Add an `events` list with an event type plus start/end times.
+ - Start with a small, high-quality subset before scaling.
+
+ Illustrative shape:
+
+ ```json
+ {
+   "caption": "emotional rnb pop with warm pads",
+   "bpm": 92,
+   "events": [
+     {"type": "intro", "start": 0.0, "end": 8.0},
+     {"type": "verse", "start": 8.0, "end": 32.0},
+     {"type": "chorus", "start": 32.0, "end": 48.0}
+   ]
+ }
+ ```
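Before training on such sidecars, a quick validation pass helps keep the timed labels trustworthy. A minimal sketch, assuming the illustrative schema above; `validate_events` is a hypothetical helper, not part of the current pipeline:

```python
def validate_events(events, duration_sec=None):
    """Check that timed events are well-formed: non-negative times,
    start < end, sorted by start, and non-overlapping."""
    prev_end = 0.0
    for ev in events:
        start, end = float(ev["start"]), float(ev["end"])
        if start < 0 or end <= start:
            return False
        if start < prev_end:  # overlaps or precedes the previous event
            return False
        if duration_sec is not None and end > duration_sec:
            return False
        prev_end = end
    return True

events = [
    {"type": "intro", "start": 0.0, "end": 8.0},
    {"type": "verse", "start": 8.0, "end": 32.0},
    {"type": "chorus", "start": 32.0, "end": 48.0},
]
print(validate_events(events))  # True for the example above
```

Rejecting malformed sidecars early (rather than letting them average out during training) fits the "small, high-quality subset first" approach.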
+
+ ## Early experiments worth running
+
+ - Compare baseline LoRA vs. time-event LoRA on the same curated mini-dataset.
+ - Score structural accuracy (section order, transition timing tolerance).
+ - Run blind listening tests for perceived musical arc and arrangement coherence.
+ - Track whether time labels improve consistency without reducing creativity.
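One concrete way to score transition timing is boundary recall within a tolerance window. This is a hedged sketch, not an established metric from this repo; the tolerance value and boundary lists are illustrative:

```python
def transition_recall(pred_bounds, ref_bounds, tol=1.0):
    """Fraction of reference section boundaries (in seconds) that have
    a predicted boundary within +/- tol seconds."""
    if not ref_bounds:
        return 1.0
    hits = sum(
        1 for r in ref_bounds
        if any(abs(r - p) <= tol for p in pred_bounds)
    )
    return hits / len(ref_bounds)

ref = [8.0, 32.0, 48.0]   # e.g. intro->verse, verse->chorus, chorus end
pred = [7.4, 31.2, 51.0]  # boundaries detected in a generated track
print(transition_recall(pred, ref))  # 2 of 3 boundaries land within 1 s
```

Comparing this score between the baseline and time-event LoRA on the same mini-dataset gives a cheap structural signal before committing to listening tests.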
+
+ ## Expected outcomes
+
+ If this works, this repo can evolve from "style-conditioned generation" toward "structure-aware generation":
+
+ - More intentional song progression.
+ - Stronger subgenre identity.
+ - Better controllability for creators.
+
+ This is still a baseline research note, but it gives a clear technical direction that fits the current project architecture.
templates/hf-endpoint/README.md ADDED
@@ -0,0 +1,38 @@
+ # ACE-Step Custom Endpoint Repo
+
+ This repo is intended for a Hugging Face **Dedicated Inference Endpoint** with a custom `handler.py`.
+
+ ## Contents
+
+ - `handler.py`: Endpoint request/response logic.
+ - `acestep/`: Core inference utilities.
+ - `requirements.txt`: Python dependencies.
+ - `packages.txt`: System dependencies.
+
+ ## Expected Request Payload
+
+ ```json
+ {
+   "inputs": {
+     "prompt": "upbeat pop rap with emotional guitar",
+     "lyrics": "[Verse] city lights and midnight rain",
+     "duration_sec": 12,
+     "sample_rate": 44100,
+     "seed": 42,
+     "guidance_scale": 7.0,
+     "steps": 50,
+     "use_lm": true
+   }
+ }
+ ```
+
+ ## Quick Setup
+
+ 1. Create a model repo on Hugging Face.
+ 2. Push the contents of this folder to that repo.
+ 3. Create a new dedicated endpoint from the custom repo.
+ 4. Set environment variables on the endpoint as needed:
+    - `ACE_CONFIG_PATH` (default `acestep-v15-sft`)
+    - `ACE_LM_MODEL_PATH` (default `acestep-5Hz-lm-4B`)
+    - `ACE_DOWNLOAD_SOURCE` (`huggingface` or `modelscope`)
+ 5. Scale down or pause the endpoint when idle to control cost.
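A minimal client sketch for exercising the payload documented above, assuming the `HF_ENDPOINT_URL` and `HF_TOKEN` variables from `.env.example`; the response format depends on `handler.py` and is not shown here:

```python
import json
import os
import urllib.request

def build_payload(prompt, lyrics, duration_sec=12):
    # Mirrors the "Expected Request Payload" shape above.
    return {
        "inputs": {
            "prompt": prompt,
            "lyrics": lyrics,
            "duration_sec": duration_sec,
            "sample_rate": 44100,
            "seed": 42,
            "guidance_scale": 7.0,
            "steps": 50,
            "use_lm": True,
        }
    }

payload = build_payload(
    "upbeat pop rap with emotional guitar",
    "[Verse] city lights and midnight rain",
)

url = os.getenv("HF_ENDPOINT_URL")
if url:  # only send when an endpoint is configured
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        body = resp.read()  # handler.py defines the response body
```

Generation can take a while at 50 steps, so a generous client timeout is safer than the default.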