Spaces: build-small-hackathon/LoFinity (Running on Zero)

eloigil6 and Claude Opus 4.8 committed 20 days ago

Commit 613bdc6 (1 parent: b97956f)

Scale model + tape length to the hardware (GPU vs CPU)

On a ZeroGPU Space: musicgen-medium, tapes up to 90s (chunked), unchanged. Without a GPU: fall back to musicgen-small and a single 30s shot (no chunking), since medium + chunking on CPU would take minutes. Both the model default and ALLOWED_SECONDS now branch on IS_ZEROGPU (the LOFINITY_MUSICGEN env var still overrides the model). A new /api/config endpoint exposes the allowed lengths; the length slider fetches it and collapses to a single 0:30 when that's all the backend offers. This is defensive: any fetch failure keeps the 30/60/90 default, so the GPU path is untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
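The cost argument in the message is easier to see with numbers. Below is a rough sketch, assuming the continuation scheme described in the app.py comments further down: a first 30s shot, then continuations that each re-feed the last OVERLAP_S seconds as a seed and cap seed plus new audio at MAX_GEN_S, so each call adds MAX_GEN_S - OVERLAP_S seconds. The `plan_chunks` name and the rule of trimming the last call to whatever is left are hypothetical; the real stitching loop is not part of this diff, and the 0.4s crossfade is ignored.

```python
# Defaults copied from the app.py diff; the planning rule itself is an assumption.
FIRST_SHOT_S = 30.0  # single-shot max ("1500 tokens" per the app.py comment)
OVERLAP_S = 2.0      # LOFINITY_OVERLAP_S default: tail re-fed as the seed
MAX_GEN_S = 28.0     # LOFINITY_MAX_GEN_S default: cap on seed + new per call


def plan_chunks(total_s: float) -> list[float]:
    """Seconds of new audio each generate call adds, in order (hypothetical)."""
    if total_s <= 0:
        raise ValueError("total_s must be positive")
    chunks = [float(min(total_s, FIRST_SHOT_S))]
    new_per_call = MAX_GEN_S - OVERLAP_S  # the seed counts toward MAX_GEN_S
    remaining = total_s - chunks[0]
    while remaining > 0:
        step = min(new_per_call, remaining)
        chunks.append(step)
        remaining -= step
    return chunks


for tape in (30, 60, 90):
    print(tape, "s tape ->", len(plan_chunks(tape)), "generate call(s)")
```

Under those assumptions a 90s tape takes four sequential generate calls and a 60s tape three, against one for 30s. Multiplied by a CPU's per-call cost with musicgen-medium, that is consistent with the decision to cap the CPU fallback at a single shot.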

Files changed (2)

app.py +24 -6
frontend/ui.js +19 -6

app.py CHANGED

@@ -9,10 +9,16 @@ loops a background bed (waves, crackle, rain…) underneath. MusicGen ignores
 texture words in prompts, hence the separate bed. The enrichment LLM is
 MiniCPM (on cuda) on a ZeroGPU Space, or a local Ollama daemon in dev.
+On a ZeroGPU Space it runs musicgen-medium and allows tapes up to 90s (chunked);
+without a GPU it falls back to musicgen-small and a single 30s shot (no chunking).
 Env knobs:
   LOFINITY_ENGINE   musicgen (default) | stub
   LOFINITY_DURATION clip length in seconds (default 30, the single-shot max)
   LOFINITY_DEVICE   cuda | mps | cpu (default: cuda on ZeroGPU, else mps if available)
+  LOFINITY_MUSICGEN model id (default: musicgen-medium on ZeroGPU, else musicgen-small)
+  LOFINITY_OVERLAP_S continuation seed length, seconds (default 2)
+  LOFINITY_MAX_GEN_S cap on a continuation's total output, seconds (default 28)
   LOFINITY_ENRICHER MiniCPM model id for ZeroGPU enrichment (default MiniCPM5-1B)
   OLLAMA_URL        default http://localhost:11434  (local enrichment)
   OLLAMA_MODEL      default llama3.2:3b              (local enrichment)
@@ -70,10 +76,14 @@ print(
 )
 ENGINE = os.getenv("LOFINITY_ENGINE", "musicgen")
-# musicgen-medium continues from an audio seed more cleanly than -small, whose
-# continuations degrade into noise after a chunk or two. Override via the env to
-# fall back to facebook/musicgen-small (faster, smaller) if needed.
-MUSICGEN_MODEL = os.getenv("LOFINITY_MUSICGEN", "facebook/musicgen-medium")
+# Model + tape length scale with the hardware: a ZeroGPU Space gets the bigger,
+# cleaner-continuing musicgen-medium and full chunked tapes (up to 90s); without a
+# GPU we fall back to the smaller, faster musicgen-small and a single 30s shot
+# (medium + chunking on CPU would take minutes). The env var still overrides.
+MUSICGEN_MODEL = os.getenv(
+    "LOFINITY_MUSICGEN",
+    "facebook/musicgen-medium" if IS_ZEROGPU else "facebook/musicgen-small",
+)
 # 30s is musicgen-small's single-shot max (1500 tokens). Longer tapes are
 # stitched from 30s chunks: each one re-seeds the model with the last OVERLAP_S
 # of the track so it keeps playing from there. musicgen-small's context is 2048
@@ -85,8 +95,9 @@ OVERLAP_S = float(os.getenv("LOFINITY_OVERLAP_S", "2"))  # seconds of tail fed b
 # total output (seed + new) at MAX_GEN_S to stay inside that window. Env-tunable.
 MAX_GEN_S = float(os.getenv("LOFINITY_MAX_GEN_S", "28"))
 SEAM_S = 0.4  # equal-power crossfade at each stitch, to hide the join
-# the lengths the tape-length slider offers; the API snaps any value to one
-ALLOWED_SECONDS = (30, 60, 90)
+# the tape lengths the API allows (it snaps any request to the nearest). Only a
+# GPU gets the longer, chunked tapes; a CPU-only fallback is capped to one 30s shot.
+ALLOWED_SECONDS = (30, 60, 90) if IS_ZEROGPU else (30,)
 DEFAULT_SECONDS = int(os.getenv("LOFINITY_DURATION", "30"))
 OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
 OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3.2:3b")
@@ -539,6 +550,13 @@ def progress() -> dict:
     return dict(_PROGRESS)
+@app.get("/api/config")
+def config() -> dict:
+    """Frontend config: the tape lengths this backend allows. Hardware-dependent —
+    a CPU-only fallback offers only 30s — so the slider reads it and adapts."""
+    return {"allowed_seconds": list(ALLOWED_SECONDS)}
 @app.get("/")
 async def homepage():
     return FileResponse(FRONTEND / "index.html")
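The new ALLOWED_SECONDS comment says the API snaps any requested length to the nearest allowed one, but the snapping code itself is outside this diff. A minimal sketch of that behavior, assuming nearest-by-absolute-difference with ties going to the shorter tape (both the tie rule and the name `snap_seconds` are assumptions):

```python
from collections.abc import Sequence


def snap_seconds(requested: float, allowed: Sequence[int]) -> int:
    """Return the allowed tape length nearest to `requested` (hypothetical helper).

    Ties resolve to the shorter length: min() keeps the first of several equal
    keys, and the candidates are sorted ascending first.
    """
    if not allowed:
        raise ValueError("allowed must not be empty")
    return min(sorted(allowed), key=lambda s: abs(s - requested))


GPU_ALLOWED = (30, 60, 90)  # ALLOWED_SECONDS when IS_ZEROGPU
CPU_ALLOWED = (30,)         # ALLOWED_SECONDS on the CPU-only fallback

print(snap_seconds(75, GPU_ALLOWED))   # 60: equidistant from 60 and 90, shorter wins
print(snap_seconds(120, GPU_ALLOWED))  # 90: above the range
print(snap_seconds(90, CPU_ALLOWED))   # 30: the only option on CPU
```

On the CPU fallback, where ALLOWED_SECONDS is (30,), every request collapses to 30, which lines up with the frontend hiding the slider when only one option remains.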

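SEAM_S = 0.4 names an equal-power crossfade at each stitch. The stitching code is outside this diff, so here is a generic sketch of that technique on plain Python lists of samples (a real audio path would more likely use NumPy arrays): the outgoing clip fades with a cosine and the incoming one with a sine across the seam. The helper name `equal_power_join` and the 32 kHz sample rate are assumptions.

```python
import math


def equal_power_join(a: list[float], b: list[float], seam: int) -> list[float]:
    """Append clip `b` to clip `a`, overlapping the last `seam` samples of `a`
    with the first `seam` samples of `b` under an equal-power crossfade."""
    if seam <= 0:
        return a + b
    if seam > len(a) or seam > len(b):
        raise ValueError("seam is longer than one of the clips")
    head, tail = a[:-seam], a[-seam:]
    mixed = []
    for i in range(seam):
        # t goes 0 -> 1 across the seam; cos^2 + sin^2 = 1, so for uncorrelated
        # material the summed power stays flat instead of dipping mid-seam
        t = i / (seam - 1) if seam > 1 else 0.5
        fade_out = math.cos(t * math.pi / 2)
        fade_in = math.sin(t * math.pi / 2)
        mixed.append(tail[i] * fade_out + b[i] * fade_in)
    return head + mixed + b[seam:]


SAMPLE_RATE = 32_000  # assumed output sample rate
SEAM_S = 0.4          # value from app.py
seam_samples = round(SEAM_S * SAMPLE_RATE)
print(seam_samples)   # 12800
```

A linear crossfade would instead drop both gains to 0.5 at the midpoint, about 3 dB down in power for uncorrelated material; the equal-power curve avoids that audible dip at the join.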
frontend/ui.js CHANGED

@@ -25,13 +25,26 @@ export function initUI({
   const coinBtn = $("coin-button");
   // slider stops → (seconds sent to the backend, label on the screen). 1 min and
-  // 1.5 min are stitched from 30s chunks (the backend continues from the last 6s).
-  const LENGTHS = [
-    { seconds: 30, label: "0:30" },
-    { seconds: 60, label: "1:00" },
-    { seconds: 90, label: "1:30" },
-  ];
+  // 1.5 min are stitched from 30s chunks. The set is hardware-dependent — a
+  // CPU-only backend allows only 30s — so we fetch the real list from /api/config
+  // and collapse the slider when there's a single option.
+  const fmtLen = (s) => `${Math.floor(s / 60)}:${String(s % 60).padStart(2, "0")}`;
+  let LENGTHS = [30, 60, 90].map((s) => ({ seconds: s, label: fmtLen(s) }));
   const selectedLength = () => LENGTHS[Number(lengthSlider.value)] ?? LENGTHS[0];
+  // adapt the slider to what this backend actually allows; defensive — any failure
+  // keeps the 30/60/90 default, so the GPU path is never affected
+  fetch("/api/config")
+    .then((r) => (r.ok ? r.json() : null))
+    .then((cfg) => {
+      const allowed = cfg && Array.isArray(cfg.allowed_seconds) ? cfg.allowed_seconds : null;
+      if (!allowed || !allowed.length) return;
+      LENGTHS = allowed.map((s) => ({ seconds: s, label: fmtLen(s) }));
+      lengthSlider.max = String(Math.max(0, LENGTHS.length - 1));
+      if (Number(lengthSlider.value) > LENGTHS.length - 1) lengthSlider.value = "0";
+      lengthValue.textContent = selectedLength().label;
+      if (LENGTHS.length <= 1) lengthRow.style.display = "none"; // single option → no slider
+    })
+    .catch(() => {});
   const controlsRow = $("controls-row");
   const generating = $("generating");
   const brewFill = $("brew-bar-fill");
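The slider's fallback rules are the part worth double-checking. Here is the same decision logic ported to Python, only to keep this page's examples in one language: `None` stands in for a failed or non-OK fetch, and `lengths_from_config`, `fmt_len`, and `FALLBACK_SECONDS` are hypothetical names. As in ui.js, only a body that parses to an object with a non-empty `allowed_seconds` array replaces the 30/60/90 default.

```python
import json
from typing import Any, Optional

FALLBACK_SECONDS = [30, 60, 90]  # the slider's built-in default in ui.js


def fmt_len(seconds: int) -> str:
    """m:ss label, matching fmtLen in ui.js for non-negative whole seconds."""
    return f"{seconds // 60}:{seconds % 60:02d}"


def lengths_from_config(body: Optional[str]) -> list[dict[str, Any]]:
    """Slider stops from an /api/config response body; None means the fetch
    failed or returned a non-OK status. Anything unusable keeps the fallback."""
    allowed = None
    if body is not None:
        try:
            cfg = json.loads(body)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            cfg = None
        if isinstance(cfg, dict) and isinstance(cfg.get("allowed_seconds"), list):
            allowed = cfg["allowed_seconds"]
    if not allowed:
        allowed = FALLBACK_SECONDS
    return [{"seconds": s, "label": fmt_len(s)} for s in allowed]


stops = lengths_from_config('{"allowed_seconds": [30]}')  # CPU-only backend
show_slider = len(stops) > 1  # ui.js hides lengthRow in this case
print(stops, show_slider)
```

With the CPU backend's response the result has a single 0:30 stop, which is the case where ui.js sets lengthSlider.max to 0 and hides lengthRow; every malformed or missing response falls through to the three GPU-path stops.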