---
title: LoFinity
emoji: 🌍
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 6.17.3
python_version: "3.12.12"
app_file: app.py
pinned: false
license: mit
short_description: A vending machine app that generates endless lofi beats
thumbnail: https://build-small-hackathon-lofinity.hf.space/static/og.png
tags:
  - thousand-token-wood
  - community-choice
  - off-the-grid
  - off-brand
  - tiny-titan
  - field-notes
  - best-demo
  - best-agent
  - bonus-quest-champion
  - judges-wildcard
  - minicpm
  - track:wood
  - sponsor:openbmb
  - achievement:offgrid
  - achievement:offbrand
  - achievement:fieldnotes
---

# LoFinity 🎧

_Chill beats, freshly vended: a vending machine that generates endless lofi, built for the [Build Small Hackathon](https://build-small-hackathon-field-guide.hf.space/)._

## 🏅 Badges I'm going for

LoFinity is my entry for the Build Small Hackathon. Here is the track and the badges I am submitting for:

- 🌳 **Thousand Token Wood** (the whimsical track) + **Community Choice**: LoFinity is pure cozy whimsy.
- 🔌 **Off the Grid**: no cloud APIs. Every model (MiniCPM5-1B + MusicGen) runs on the Space's own GPU, or locally. Nothing phones home.
- 🎨 **Off-Brand**: the UI is a fully custom Three.js world, miles past the default Gradio components.
- 🐣 **Tiny Titan**: every model I ship is ≤4B (MiniCPM5-1B ~1B + MusicGen-medium ~1.5B).
- 🧩 **MiniCPM sponsor prize**: OpenBMB's MiniCPM5-1B is the brain that plans every single song.
- 📓 **Field Notes**: a write-up of the build and what I learned (this README, plus a longer blog post).
- 🎬 **Best Demo**: once my demo video and social post are up (that is literally next on my list).
- 🤖 **Best Agent**: the multi-model orchestration, a small LLM planning, an audio model performing, an ambience layer dressing the set. More pipeline than autonomous agent, but the multi-step collaboration is real.
- 🏆 **Bonus Quest Champion**: stacking as many bonus criteria as I honestly can.
- 🃏 **Judges' Wildcard**: well... a 3D lofi vending machine is nothing if not a wildcard.

▶ **[Live demo](https://huggingface.co/spaces/build-small-hackathon/LoFinity)** · 🎬 Demo video: [YouTube](https://youtu.be/nrIU3Cwnijk) · 🐦 Social post: [dev.to](https://dev.to/eloigil/lofinity-chill-beats-freshly-vended-4ml2)

**LoFinity is a vending machine for lofi.** You land in a cozy, low-poly, anime-ish little street, you walk up to the machine, you insert a coin, you type a vibe (_"studying late in a snowy cabin"_), and out pops a cassette tape with a freshly generated song. Everything chill and pleasing, without triggering your dopamine.

### The story behind it

I built this whole thing while on parental leave, with a toddler who never stops and a baby who is just figuring out the world. People assume parental leave is rest. It is not. It is beautiful, it is loud, and it is a little bit of pandemonium. LoFinity became my small escape: one hour here, twenty minutes there, always between nap times, building something that is _mine_, piece by piece.

The idea is over a year old. I _love_ lofi music, and not only because it sounds nice. I am neurodivergent, and focusing is not always easy for me. Those warm, repetitive, slightly imperfect beats are the thing that finally lets my brain settle down and work, with a hit of 90s childhood nostalgia on top. So a machine that vends endless lofi felt almost personal, like building a tool for my own brain.

I had the _vision_ very clearly, but I was not comfortable with Three.js. Then Anthropic dropped **Fable 5** and I just HAD to try it. It took me from "I have this in my head" to a real, living 3D world. It worked beautifully... right until it got banned, but hey, shit happens. 🤷 I am grateful for the 3 days, which were enough to get me kickstarted.

## How it works

LoFinity is a **Gradio Server** app (`gradio.server.Server`) that serves a hand-built **Three.js** frontend and exposes a tiny generation API. Every tape is made by a short chain of small, open models, and on the live Space the whole chain runs on **ZeroGPU**.

| ``` | |
| your vibe | |
| │ | |
| ▼ | |
| enrich ──► MiniCPM5-1B (ZeroGPU) or Ollama llama3.2:3b (local) | |
| │ → music_prompt + cassette title + ambience tag (strict JSON) | |
| ▼ | |
| render ──► MusicGen (medium on GPU / small on CPU) | |
| │ → 30s shots, stitched with overlap-seeded continuation for longer tapes | |
| ▼ | |
| dress ──► ambience.py mixes a looped bed (rain / waves / crackle / …) under the music | |
| │ | |
| ▼ | |
| inline base64 WAV ──► browser turns it into a Blob URL, collection stays client-side | |
| ``` | |

### The generation pipeline

1. **You type a vibe** and pick a length (30 / 60 / 90s on GPU).
2. **A small LLM enriches it.** On the Space that is **MiniCPM5-1B** (OpenBMB, ~1B params); locally it is **Ollama** running `llama3.2:3b`. It returns strict JSON: a MusicGen `music_prompt` (genre + 2-3 vibe-matched instruments + mood + tempo), a cassette `title`, and an `ambience` tag. Thinking mode is off, the output is templated and few-shot-guided, and "lofi" is force-prefixed if the model drifts.
3. **MusicGen renders the music.** `musicgen-medium` (~1.5B) on GPU, `musicgen-small` (~300M) on CPU. A single shot is ~30s, which is its training window.
4. **Longer tapes are stitched.** To go past 30s, the last `OVERLAP_S` (2s) of audio is fed back as a seed and the model continues. Each continuation is capped at `MAX_GEN_S` (28s) total so it never runs past the ~30s window (going past it is what turns the tail into noise). Chunks are RMS-matched (continuations drift quieter) and joined with a 0.4s equal-power crossfade.
5. **Ambience is mixed in.** A separate bed (rain, ocean, crickets, café murmur, fireplace, birdsong, wind, or procedural vinyl-crackle / tape-hiss) is looped and mixed gently under the music in `ambience.py`, because MusicGen ignores texture words in the prompt.
6. **The tape ships inline.** The WAV comes back as a base64 data URI, so no file is ever written to disk (nothing is cached or shared between visitors on the Space). The browser turns it into a Blob URL, and the collection lives client-side, per session.
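
The joining part of step 4 can be sketched in plain Python. This is an illustration and not the code from `app.py`: the function names are hypothetical, chunks are assumed to be mono lists of floats with the seed overlap already trimmed off, and 32 kHz is used because that is MusicGen's output sample rate.

```python
import math

SAMPLE_RATE = 32_000  # MusicGen's output sample rate
CROSSFADE_S = 0.4     # crossfade length quoted in step 4


def rms(samples):
    """Root-mean-square level of a mono chunk (0.0 for an empty chunk)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_rms(chunk, target_rms):
    """Scale `chunk` so its RMS equals `target_rms`; silence is left alone."""
    level = rms(chunk)
    if level == 0.0:
        return list(chunk)
    gain = target_rms / level
    return [s * gain for s in chunk]


def equal_power_crossfade(a, b, fade_len):
    """Join `a` and `b`, overlapping the last/first `fade_len` samples.

    The outgoing gain follows cos and the incoming gain follows sin, so the
    squared gains always sum to 1 (constant power through the fade).
    """
    fade_len = min(fade_len, len(a), len(b))
    if fade_len == 0:
        return list(a) + list(b)
    mixed = []
    for i in range(fade_len):
        t = (i + 0.5) / fade_len  # position inside the fade, between 0 and 1
        gain_out = math.cos(t * math.pi / 2)
        gain_in = math.sin(t * math.pi / 2)
        mixed.append(a[len(a) - fade_len + i] * gain_out + b[i] * gain_in)
    return list(a[:-fade_len]) + mixed + list(b[fade_len:])


def stitch(chunks, sample_rate=SAMPLE_RATE, crossfade_s=CROSSFADE_S):
    """Level every continuation to the first chunk, then crossfade them in."""
    if not chunks:
        return []
    fade_len = int(crossfade_s * sample_rate)
    out = list(chunks[0])
    target = rms(out)
    for chunk in chunks[1:]:
        out = equal_power_crossfade(out, match_rms(chunk, target), fade_len)
    return out
```

Matching each continuation to the first chunk's level, rather than to its immediate neighbor, is a choice made for this sketch; it keeps a quiet drift from compounding across several continuations.
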

### Running it all on ZeroGPU

This is the part I am most proud of: two open models, orchestrated together, both comfortably small, all on ZeroGPU.

- **One acquisition per vend.** Enrichment (MiniCPM) and music (MusicGen) both run inside a single `@spaces.GPU` call (`gpu_brew`), with a dynamic duration budget (`40 + 40 * chunks` seconds). A brew that overruns its budget gets killed mid-render, so the budget is generous.
- **Models load at import time.** They are placed on `cuda` at module load, which is the documented ZeroGPU pattern: a CUDA-emulation layer makes `.to('cuda')` work before a GPU is attached, and startup placement beats per-call transfers.
- **Detection is honest.** ZeroGPU is detected via the `spaces` library's own `Config.zero_gpu` flag, not by string-matching `SPACES_ZERO_GPU`. (That bit me: the runtime sets it to `'1'`, not `'true'`, so my exact-string check silently ran everything on CPU for a while.)
- **Progress is estimated.** The GPU worker is a separate process and cannot push real per-chunk progress, so `/api/progress` returns a smooth time-based estimate for the brewing bar.
- **Hardware-adaptive.** GPU: musicgen-medium + chunked tapes up to 90s. No GPU: musicgen-small + a single 30s shot (medium + chunking on CPU is too slow). The frontend reads `/api/config` and adapts the length slider.
- **Identical local code.** Locally, `spaces` is shimmed to a no-op decorator, so the exact same code runs on MPS / CPU untouched.
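
Three of the bullets above reduce to a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, the tolerant flag parser shows the failure mode rather than the actual fix (which relies on the `spaces` library's own flag), and the exponential shape of the progress curve is an assumption, since the README only says the estimate is smooth and time-based.

```python
import math
import os

# Values treated as "on". The README only confirms that the runtime uses '1'.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, environ=os.environ):
    """Tolerant boolean parse, so '1' and 'true' both count as enabled."""
    return environ.get(name, "").strip().lower() in TRUTHY


def gpu_budget_s(chunks):
    """Duration budget quoted above: 40 + 40 * chunks seconds."""
    if chunks < 1:
        raise ValueError("a vend renders at least one chunk")
    return 40 + 40 * chunks


def estimated_progress(elapsed_s, budget_s, ceiling=0.95):
    """Time-based estimate that rises smoothly from 0 towards `ceiling`.

    It is monotonic and never exceeds `ceiling`, so the bar keeps moving
    without claiming to be finished before the audio has actually arrived.
    """
    if budget_s <= 0:
        raise ValueError("budget_s must be positive")
    elapsed_s = max(0.0, elapsed_s)
    return ceiling * (1.0 - math.exp(-3.0 * elapsed_s / budget_s))
```

On ZeroGPU, a function like `gpu_budget_s` would feed the `duration` argument of `@spaces.GPU`. How `chunks` is derived from the requested length is not shown here.
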

### The frontend (all hand-built, no default Gradio components)

- A cozy, low-poly, anime-styled street scene in **Three.js**: the vending machine, a bench, a lamp post, layered mountains, a forest, a day/night toggle (persisted), and a little Game Boy on the sidewalk.
- A camera state machine drives the intro descent, the zoom into the machine, and the cassette flow.
- The cassette **collection** is a coverflow carousel with an equal-power crossfade playlist between tapes.
- The Game Boy runs a tiny, no-score **garden mini-game** (`garden.js`) to play while a tape brews.
- A café-jazz **lobby bed** plays when idle, plus a global mute toggle.
- Perf: static geometry is merged (from 462 down to 207 draw calls), shadow maps are baked once, a frame governor runs 30fps idle / 60fps during transitions, and hover outlines use a screen-space edge-detect pass.

### API

| Endpoint                         | What it does                                                                                   |
| -------------------------------- | ---------------------------------------------------------------------------------------------- |
| `generate_song(prompt, seconds)` | Gradio API. Returns `{title, audio}` (audio is an inline WAV data URI). `concurrency_limit=1`. |
| `GET /api/progress`              | Brewing progress for the bar (real per-chunk locally, time-based estimate on GPU).             |
| `GET /api/config`                | `{allowed_seconds}`, so the length slider adapts to the hardware.                              |
| `GET /`                          | The Three.js app. `/static` serves the frontend assets.                                        |
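
To make the "inline WAV data URI" concrete, here is a small round trip using only the standard library. It is a sketch under assumptions: the function names are hypothetical, and the `data:audio/wav;base64,` prefix, mono 16-bit PCM, and the 32 kHz rate are guesses at what the `audio` field looks like, not a description of the real payload.

```python
import base64
import io
import struct
import wave

DATA_URI_PREFIX = "data:audio/wav;base64,"  # assumed MIME type


def encode_wav_data_uri(samples, sample_rate=32_000):
    """Pack mono float samples (-1..1) as 16-bit PCM WAV inside a data URI."""
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return DATA_URI_PREFIX + base64.b64encode(buf.getvalue()).decode("ascii")


def decode_wav_data_uri(data_uri):
    """Return the raw WAV bytes carried by a base64 data URI."""
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    return base64.b64decode(payload)
```

Nothing touches the disk in either direction, which is the property step 6 of the pipeline relies on. A non-browser client could pass the `audio` field to something like `decode_wav_data_uri` and write the bytes wherever it likes.
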

## Tech stack

- **Python** 3.12.12 (ZeroGPU pins it; 3.12+ locally)
- **Gradio** 6.17.3 (`gradio.server.Server`, FastAPI / Starlette underneath)
- **transformers** + **torch** ≥2.8: MusicGen + MiniCPM5-1B
- **Three.js** (via CDN + importmap)
- **Ollama** (local enrichment only)
- **ZeroGPU** (NVIDIA) on the Space

## Run it locally

| ```bash | |
| # 1. clone | |
| git clone <repo-url> && cd LoFinity | |
| # 2. environment (Python 3.12+). gradio is the Space SDK, so install it explicitly here. | |
| uv venv --python 3.12 | |
| uv pip install gradio==6.17.3 -r requirements.txt | |
| # 3. (recommended) local enrichment LLM via Ollama | |
| ollama pull llama3.2:3b # served at http://localhost:11434 | |
| # 4. run — the first vend downloads musicgen-small and takes a minute to warm up | |
| .venv/bin/python app.py | |
| # open http://localhost:7860 | |
| ``` | |

Locally there is no ZeroGPU, so it runs on MPS or CPU with `musicgen-small` and 30s tapes. Without an Ollama daemon, enrichment falls back to a plain non-LLM path (blander titles and instruments), but everything still works.
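
The enrichment contract and its fallback can be sketched as a small validator. Everything specific here is hypothetical: the function names, the ambience tag strings (loosely based on the beds this README lists), and the wording of the fallback plan. The only parts taken from the README are the three JSON keys and the forced "lofi" prefix.

```python
import json

# Hypothetical tag names; the real ones live in the project's own code.
AMBIENCE_TAGS = {"rain", "ocean", "crickets", "cafe", "fireplace",
                 "birdsong", "wind", "vinyl", "tape"}
DEFAULT_AMBIENCE = "vinyl"


def fallback_plan(vibe):
    """Plain non-LLM plan: blander, but always well-formed."""
    title = vibe.strip().title()[:40] or "Untitled Tape"
    return {
        "music_prompt": f"lofi hip hop, mellow, {vibe.strip()}",
        "title": title,
        "ambience": DEFAULT_AMBIENCE,
    }


def parse_plan(raw, vibe):
    """Validate the LLM's JSON; fall back to the plain plan if it is unusable."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return fallback_plan(vibe)
    if not isinstance(data, dict):
        return fallback_plan(vibe)

    prompt, title = data.get("music_prompt"), data.get("title")
    if not (isinstance(prompt, str) and prompt.strip()):
        return fallback_plan(vibe)
    if not (isinstance(title, str) and title.strip()):
        return fallback_plan(vibe)

    prompt = prompt.strip()
    if "lofi" not in prompt.lower():
        prompt = "lofi " + prompt  # force the genre if the model drifted

    ambience = data.get("ambience")
    if not isinstance(ambience, str) or ambience not in AMBIENCE_TAGS:
        ambience = DEFAULT_AMBIENCE
    return {"music_prompt": prompt, "title": title.strip(), "ambience": ambience}
```

Passing `None` as `raw` (no LLM reachable) takes the same path as malformed JSON, so the caller never has to special-case a missing Ollama daemon.
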

**Quick UI work without the heavy model** (tones instead of MusicGen):

| ```bash | |
| LOFINITY_ENGINE=stub .venv/bin/python app.py | |
| ``` | |

### Environment knobs

| Variable                      | Default                           | What it does                                                        |
| ----------------------------- | --------------------------------- | ------------------------------------------------------------------- |
| `LOFINITY_ENGINE`             | `musicgen`                        | `musicgen`, or `stub` for tones during UI dev                       |
| `LOFINITY_DEVICE`             | auto                              | `cuda` / `mps` / `cpu` (auto: cuda on ZeroGPU, else mps, else cpu)  |
| `LOFINITY_MUSICGEN`           | auto                              | model id (auto: musicgen-medium on ZeroGPU, else musicgen-small)    |
| `LOFINITY_DURATION`           | `30`                              | default clip length in seconds                                      |
| `LOFINITY_OVERLAP_S`          | `2`                               | continuation seed length in seconds                                 |
| `LOFINITY_MAX_GEN_S`          | `28`                              | cap on a continuation's total output, to stay inside the 30s window |
| `LOFINITY_ENRICHER`           | `openbmb/MiniCPM5-1B`             | enrichment model id on ZeroGPU                                      |
| `OLLAMA_URL` / `OLLAMA_MODEL` | `localhost:11434` / `llama3.2:3b` | local enrichment                                                    |
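
One way to read these knobs in a single place is a frozen dataclass with the table's defaults. This is a sketch rather than the loader in `app.py`: the `Settings` class, `load_settings`, the choice of `int` versus `float` for each value, and the validation rules are all assumptions. Resolving `auto` for the device and model is left out, because that depends on the hardware probe.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    engine: str = "musicgen"
    device: str = "auto"
    musicgen: str = "auto"
    duration_s: int = 30
    overlap_s: float = 2.0
    max_gen_s: float = 28.0
    enricher: str = "openbmb/MiniCPM5-1B"


def load_settings(environ=os.environ):
    """Read the LOFINITY_* variables, falling back to the table's defaults."""
    d = Settings()
    settings = Settings(
        engine=environ.get("LOFINITY_ENGINE", d.engine),
        device=environ.get("LOFINITY_DEVICE", d.device),
        musicgen=environ.get("LOFINITY_MUSICGEN", d.musicgen),
        duration_s=int(environ.get("LOFINITY_DURATION", d.duration_s)),
        overlap_s=float(environ.get("LOFINITY_OVERLAP_S", d.overlap_s)),
        max_gen_s=float(environ.get("LOFINITY_MAX_GEN_S", d.max_gen_s)),
        enricher=environ.get("LOFINITY_ENRICHER", d.enricher),
    )
    if settings.engine not in {"musicgen", "stub"}:
        raise ValueError(f"unknown LOFINITY_ENGINE: {settings.engine!r}")
    if settings.device not in {"auto", "cuda", "mps", "cpu"}:
        raise ValueError(f"unknown LOFINITY_DEVICE: {settings.device!r}")
    if not 0 < settings.overlap_s < settings.max_gen_s:
        raise ValueError("need 0 < LOFINITY_OVERLAP_S < LOFINITY_MAX_GEN_S")
    return settings
```

Failing loudly on an unknown value is deliberate in this sketch: a silently ignored setting is the same class of bug as the `SPACES_ZERO_GPU` string check described earlier.
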

### Project layout

| Path               | What's inside                                                                                                                                   |
| ------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| `app.py`           | Backend: the pipeline, the audio engines, the API                                                                                               |
| `ambience.py`      | Ambience beds + mixing                                                                                                                          |
| `frontend/`        | `index.html`, `main.js` (scene + camera), `world.js` (the 3D world), `ui.js` (modal + audio + collection), `garden.js` (mini-game), `style.css` |
| `assets/ambience/` | The looped beds + credits                                                                                                                       |
| `scripts/`         | Dev tools (fetch / generate ambience, make the OG image)                                                                                        |

## What I learned

**30 seconds is a wall.** MusicGen is trained on ~30s clips, so anything longer has to be stitched from continuations, and naive stitching slowly drifts into noise. The fix was understanding _why_ (each continuation was generating past the 30s window) and capping every shot, instead of fighting the symptoms. A very humbling "go read how the model actually works" moment.
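
The capping arithmetic is easy to check by hand. The sketch below uses one possible reading of the knobs, and that reading is an assumption: each continuation's total output, seed included, is `MAX_GEN_S`, so it contributes `MAX_GEN_S - OVERLAP_S` seconds of new audio. The function name is hypothetical, and the small amount of time consumed by each crossfade is ignored.

```python
WINDOW_S = 30.0  # MusicGen's approximate training window


def plan_shots(target_s, overlap_s=2.0, max_gen_s=28.0, window_s=WINDOW_S):
    """Return the seconds of NEW audio each shot contributes.

    The first shot may fill the whole window. Every later shot is seeded
    with `overlap_s` of existing audio and capped at `max_gen_s` in total,
    so seed plus new audio always stays inside the window.
    """
    if target_s <= 0:
        raise ValueError("target_s must be positive")
    if not 0 < overlap_s < max_gen_s <= window_s:
        raise ValueError("need 0 < overlap_s < max_gen_s <= window_s")

    shots = [min(target_s, window_s)]
    remaining = target_s - shots[0]
    new_per_continuation = max_gen_s - overlap_s
    while remaining > 1e-9:
        new_audio = min(remaining, new_per_continuation)
        shots.append(new_audio)
        remaining -= new_audio
    return shots


if __name__ == "__main__":
    for seconds in (30, 60, 90):
        print(seconds, plan_shots(seconds))
```

Under this reading a 90s tape is one full shot plus three continuations (30 + 26 + 26 + 8), and no single generation is ever asked for more than the window can hold.
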

**One model cannot do everything, so orchestrate.** A music model is completely deaf to texture words like "rain" or "vinyl crackle." So instead of one big model, a small LLM plans the recipe, MusicGen performs it, and a separate ambience layer dresses the set. Small models plus smart orchestration beat one giant model trying to do it all.

**Constraints make you creative.** ZeroGPU's forked worker cannot report progress, so the brewing bar became a smooth time-based estimate. And yes, an exact-string check on `SPACES_ZERO_GPU` (which is `'1'`, not `'true'`) silently ran everything on CPU for a while. 8 years in the industry and still getting the classic humbling. ✌️😅

## Credits & license

- **Models:** [MusicGen](https://huggingface.co/facebook/musicgen-medium) (Meta), [MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) (OpenBMB).
- **Ambience beds:** see `assets/ambience/CREDITS.md`. Lobby music: "Peaceful Cafe Jazz" by Alex Morgan (Pixabay, royalty-free).
- **License:** MIT.