---
title: LoFinity
emoji: 🌠
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 6.17.3
python_version: "3.12.12"
app_file: app.py
pinned: false
license: mit
short_description: A vending machine app that generates endless lofi beats
thumbnail: https://build-small-hackathon-lofinity.hf.space/static/og.png
tags:
  - thousand-token-wood
  - community-choice
  - off-the-grid
  - off-brand
  - tiny-titan
  - field-notes
  - best-demo
  - best-agent
  - bonus-quest-champion
  - judges-wildcard
  - minicpm
  - track:wood
  - sponsor:openbmb
  - achievement:offgrid
  - achievement:offbrand
  - achievement:fieldnotes
---

# LoFinity 🎧

_Chill beats, freshly vended: a vending machine that generates endless lofi, built for the [Build Small Hackathon](https://build-small-hackathon-field-guide.hf.space/)._

## 🏅 Badges I'm going for

LoFinity is my entry for the Build Small Hackathon. Here are the track and the badges I am submitting for:

- 🌳 **Thousand Token Wood** (the whimsical track) + **Community Choice**: LoFinity is pure cozy whimsy.
- 🔌 **Off the Grid**: no cloud APIs. Every model (MiniCPM5-1B + MusicGen) runs on the Space's own GPU, or locally. Nothing phones home.
- 🎨 **Off-Brand**: the UI is a fully custom Three.js world, miles past the default Gradio components.
- 🐣 **Tiny Titan**: every model I ship is ≤4B (MiniCPM5-1B ~1B + MusicGen-medium ~1.5B).
- 🧩 **MiniCPM sponsor prize**: OpenBMB's MiniCPM5-1B is the brain that plans every single song.
- 📓 **Field Notes**: a write-up of the build and what I learned (this README, plus a longer blog post).
- 🎬 **Best Demo**: the demo video and social post are linked right below.
- 🤖 **Best Agent**: the multi-model orchestration: a small LLM plans, an audio model performs, and an ambience layer dresses the set. It is more pipeline than autonomous agent, but the multi-step collaboration is real.
- 🏆 **Bonus Quest Champion**: stacking as many bonus criteria as I honestly can.
- 🃏 **Judges' Wildcard**: well... a 3D lofi vending machine is nothing if not a wildcard.

▶ **[Live demo](https://huggingface.co/spaces/build-small-hackathon/LoFinity)** · 🎬 Demo video: [YouTube](https://youtu.be/nrIU3Cwnijk) · 🐦 Social post: [dev.to](https://dev.to/eloigil/lofinity-chill-beats-freshly-vended-4ml2)

![LoFinity](https://build-small-hackathon-lofinity.hf.space/static/og.png)

**LoFinity is a vending machine for lofi.** You land in a cozy, low-poly, anime-ish little street, walk up to the machine, insert a coin, type a vibe (_"studying late in a snowy cabin"_), and out pops a cassette tape with a freshly generated song. Everything is chill and pleasing, without spiking your dopamine.

### The story behind it

I built this whole thing while on parental leave, with a toddler who never stops and a baby who is just figuring out the world. People assume parental leave is rest. It is not. It is beautiful, it is loud, and it is a little bit of pandemonium. LoFinity became my small escape: one hour here, twenty minutes there, always between nap times, building something that is _mine_, piece by piece.

The idea is over a year old. I _love_ lofi music, and not only because it sounds nice. I am neurodivergent, and focusing is not always easy for me. Those warm, repetitive, slightly imperfect beats are what finally lets my brain settle down and work, with a hit of 90s childhood nostalgia on top. So a machine that vends endless lofi felt almost personal, like building a tool for my own brain.

I had the _vision_ very clearly, but I was not comfortable with Three.js. Then Anthropic dropped **Fable 5** and I just HAD to try it. It took me from "I have this in my head" to a real, living 3D world. It worked beautifully... right until it got banned, but hey, shit happens. 🤷 I am grateful for those 3 days; they were enough to get me kickstarted.
## How it works

LoFinity is a **Gradio Server** app (`gradio.server.Server`) that serves a hand-built **Three.js** frontend and exposes a tiny generation API. Every tape is made by a short chain of small, open models, and on the live Space the whole chain runs on **ZeroGPU**.

```
your vibe
   │
   ▼
enrich ──► MiniCPM5-1B (ZeroGPU) or Ollama llama3.2:3b (local)
   │         → music_prompt + cassette title + ambience tag (strict JSON)
   ▼
render ──► MusicGen (medium on GPU / small on CPU)
   │         → 30s shots, stitched with overlap-seeded continuation for longer tapes
   ▼
dress  ──► ambience.py mixes a looped bed (rain / waves / crackle / …) under the music
   │
   ▼
inline base64 WAV ──► browser turns it into a Blob URL, collection stays client-side
```

### The generation pipeline

1. **You type a vibe** and pick a length (30 / 60 / 90s on GPU).
2. **A small LLM enriches it.** On the Space that is **MiniCPM5-1B** (OpenBMB, ~1B params); locally it is **Ollama** running `llama3.2:3b`. It returns strict JSON: a MusicGen `music_prompt` (genre + 2-3 vibe-matched instruments + mood + tempo), a cassette `title`, and an `ambience` tag. Thinking mode is off, the output is templated and few-shot-guided, and "lofi" is force-prefixed if the model drifts.
3. **MusicGen renders the music.** `musicgen-medium` (~1.5B) on GPU, `musicgen-small` (~300M) on CPU. A single shot is ~30s, which matches the model's training window.
4. **Longer tapes are stitched.** To go past 30s, the last `OVERLAP_S` (2s) of audio is fed back as a seed and the model continues. Each continuation is capped at `MAX_GEN_S` (28s) total so it never runs past the ~30s window (going past it is what turns the tail into noise). Chunks are RMS-matched (continuations drift quieter) and joined with a 0.4s equal-power crossfade.
5. **Ambience is mixed in.** A separate bed (rain, ocean, crickets, café murmur, fireplace, birdsong, wind, or procedural vinyl-crackle / tape-hiss) is looped and mixed gently under the music in `ambience.py`, because MusicGen ignores texture words in the prompt.
6. **The tape ships inline.** The WAV comes back as a base64 data URI, so no file is ever written to disk (nothing is cached or shared between visitors on the Space). The browser turns it into a Blob URL, and the collection lives client-side, per session.

### Running it all on ZeroGPU

This is the part I am most proud of: two open models, orchestrated together, both comfortably small, all on ZeroGPU.

- **One acquisition per vend.** Enrichment (MiniCPM) and music (MusicGen) both run inside a single `@spaces.GPU` call (`gpu_brew`), with a dynamic duration budget (`40 + 40 * chunks` seconds). A brew that overruns its budget gets killed mid-render, so the budget is generous.
- **Models load at import time.** They are placed on `cuda` at module load, which is the documented ZeroGPU pattern: a CUDA-emulation layer makes `.to('cuda')` work before a GPU is attached, and startup placement beats per-call transfers.
- **Detection is honest.** ZeroGPU is detected via the `spaces` library's own `Config.zero_gpu` flag, not by string-matching `SPACES_ZERO_GPU`. (That bit me: the runtime sets it to `'1'`, not `'true'`, so my exact-string check silently ran everything on CPU for a while.)
- **Progress is estimated.** The GPU worker is a separate process and cannot push real per-chunk progress, so `/api/progress` returns a smooth time-based estimate for the brewing bar.
- **Hardware-adaptive.** GPU: musicgen-medium + chunked tapes up to 90s. No GPU: musicgen-small + a single 30s shot (medium + chunking on CPU is too slow). The frontend reads `/api/config` and adapts the length slider.
- **Identical local code.** Locally, `spaces` is shimmed to a no-op decorator, so the exact same code runs on MPS / CPU untouched.
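The ZeroGPU notes above combine two ideas: a no-op stand-in for `spaces` when running locally, and a GPU budget of `40 + 40 * chunks` seconds requested once per vend. Here is a minimal sketch of how those could fit together. It assumes `spaces.GPU` accepts a callable `duration` that receives the same arguments as the decorated function (the dynamic-duration pattern described in the ZeroGPU docs); the signatures of `gpu_brew` and `brew_duration` below are hypothetical, not the ones in `app.py`.

```python
"""Sketch: one GPU acquisition per vend, with a local no-op fallback for `spaces`."""
try:
    import spaces  # provided on Hugging Face Spaces
except ImportError:
    class _NoOpSpaces:
        """Local stand-in: GPU() hands the function back unchanged, with or without arguments."""

        @staticmethod
        def GPU(func=None, **_kwargs):
            if func is None:      # used as @spaces.GPU(duration=...)
                return lambda f: f
            return func           # used as bare @spaces.GPU

    spaces = _NoOpSpaces()


def brew_duration(vibe: str, chunks: int) -> int:
    """GPU seconds to request: a fixed 40 s plus 40 s per MusicGen chunk (the README formula)."""
    return 40 + 40 * chunks


@spaces.GPU(duration=brew_duration)
def gpu_brew(vibe: str, chunks: int) -> str:
    # Hypothetical body: enrichment and every music chunk would run here,
    # inside the single GPU acquisition, so a vend only acquires the GPU once.
    return f"brewed {chunks} chunk(s) for {vibe!r}"
```

For example, a tape that needs three chunks would ask for `brew_duration(vibe, 3) == 160` seconds. Because the shim handles both the bare and the parameterized decorator form, the same module imports cleanly on a laptop.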
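Step 2 of the pipeline above says the LLM returns strict JSON and that "lofi" is force-prefixed when the model drifts. Small models still occasionally wrap JSON in chatter or invent values, so a defensive parser helps. This is an illustrative sketch, not LoFinity's code: the three field names come from the README, while the ambience tag spellings, the 40-character title cap, and falling back to the raw vibe are assumptions.

```python
import json
import re

# Assumed tag spellings, based on the beds the README lists; the real set may differ.
AMBIENCE_TAGS = {
    "rain", "ocean", "crickets", "cafe", "fireplace",
    "birdsong", "wind", "vinyl", "tape-hiss", "none",
}


def parse_enrichment(raw: str, vibe: str) -> dict:
    """Turn an LLM reply into a safe {music_prompt, title, ambience} recipe."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)  # tolerate chatter around the object
    try:
        data = json.loads(match.group(0)) if match else {}
    except json.JSONDecodeError:
        data = {}  # malformed JSON: fall back to the plain vibe below

    prompt = str(data.get("music_prompt") or vibe).strip()
    if not prompt.lower().startswith("lofi"):
        prompt = f"lofi {prompt}"  # force the genre back if the model drifted

    title = str(data.get("title") or vibe).strip()[:40]  # hypothetical cassette-label cap

    ambience = str(data.get("ambience") or "none").strip().lower()
    if ambience not in AMBIENCE_TAGS:
        ambience = "none"  # unknown tag: play the music without a bed

    return {"music_prompt": prompt, "title": title, "ambience": ambience}
```

The same function doubles as a crude non-LLM path: a reply with no JSON at all still yields a usable `lofi <vibe>` prompt.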
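Step 4 of the pipeline above (overlap seeding, RMS matching, a 0.4s equal-power crossfade) is the trickiest part, so here is a dependency-free sketch of the joining math. It assumes each continuation comes back starting with a re-render of the seed, so its first `overlap` samples line up in time with the tape's last `overlap` samples; that assumption and the function names are mine, not taken from `app.py`. If the 28s cap includes the 2s seed, each continuation adds at most about 26s of new audio.

```python
import math

SR = 32_000               # MusicGen's output sample rate
OVERLAP = 2 * SR          # README default OVERLAP_S = 2
FADE = round(0.4 * SR)    # README crossfade length: 0.4 s


def rms(x: list[float]) -> float:
    """Root-mean-square level of a mono signal (0.0 for an empty one)."""
    return math.sqrt(sum(s * s for s in x) / len(x)) if x else 0.0


def match_rms(chunk: list[float], reference: list[float], eps: float = 1e-9) -> list[float]:
    """Scale `chunk` so its RMS equals the reference's (continuations tend to drift quieter)."""
    gain = rms(reference) / max(rms(chunk), eps)
    return [s * gain for s in chunk]


def equal_power_crossfade(a: list[float], b: list[float], fade: int) -> list[float]:
    """Overlap the last `fade` samples of `a` with the first `fade` samples of `b`.

    cos/sin gains satisfy g_a**2 + g_b**2 == 1, which keeps loudness steady across
    the seam for uncorrelated material (a linear fade would dip in the middle).
    """
    fade = min(fade, len(a), len(b))
    head, tail = a[: len(a) - fade], a[len(a) - fade:]
    mixed = []
    for i in range(fade):
        t = (i + 0.5) / fade  # position across the fade, strictly between 0 and 1
        mixed.append(tail[i] * math.cos(t * math.pi / 2) + b[i] * math.sin(t * math.pi / 2))
    return head + mixed + b[fade:]


def append_continuation(tape: list[float], continuation: list[float],
                        overlap: int = OVERLAP, fade: int = FADE) -> list[float]:
    """Append a continuation whose first `overlap` samples re-render the tape's last `overlap`."""
    if not 0 < fade <= overlap:
        raise ValueError("need 0 < fade <= overlap")
    cont = match_rms(continuation, tape)
    # Drop the re-rendered seed except its last `fade` samples, which sit under
    # the tape's final `fade` samples and carry the crossfade.
    return equal_power_crossfade(tape, cont[overlap - fade:], fade)
```

Each append grows the tape by `len(continuation) - overlap` samples, so the seed is never heard twice. Matching RMS before the fade matters: otherwise the crossfade would smoothly glide into a quieter chunk instead of hiding the seam.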
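Step 5 of the pipeline above loops a bed under the music because MusicGen ignores texture words. Below is a minimal sketch of that loop-and-mix, plus a procedural vinyl crackle. The -18 dB bed level, the click rate, and the peak normalization are illustrative assumptions, not values from `ambience.py`.

```python
import random


def loop_to_length(bed: list[float], n: int) -> list[float]:
    """Repeat a short bed until it covers exactly n samples."""
    if not bed:
        return [0.0] * n
    reps = -(-n // len(bed))  # ceiling division
    return (bed * reps)[:n]


def vinyl_crackle(n: int, sample_rate: int, clicks_per_s: float = 8.0, seed: int = 0) -> list[float]:
    """Procedural crackle: sparse random clicks on silence (a seeded, hypothetical recipe)."""
    rng = random.Random(seed)
    p = clicks_per_s / sample_rate  # chance that any one sample is a click
    return [rng.uniform(-1.0, 1.0) if rng.random() < p else 0.0 for _ in range(n)]


def mix_ambience(music: list[float], bed: list[float], gain_db: float = -18.0) -> list[float]:
    """Lay the looped bed gently under the music, then peak-normalize if the sum would clip."""
    gain = 10 ** (gain_db / 20)  # dB to linear amplitude
    looped = loop_to_length(bed, len(music))
    mixed = [m + gain * a for m, a in zip(music, looped)]
    peak = max((abs(s) for s in mixed), default=0.0)
    return [s / peak for s in mixed] if peak > 1.0 else mixed
```

A hard loop like this can click at the seam when the bed's end and start do not match; a short crossfade at the loop point (as in the stitching sketch) is the usual fix.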
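Step 6 of the pipeline above returns the tape as a base64 WAV data URI instead of a file. With Python's standard library that takes only `wave`, `io`, and `base64`, all in memory, so nothing touches the disk. This is a sketch assuming mono float samples in [-1, 1] and 16-bit PCM output; the function name is hypothetical.

```python
import base64
import io
import struct
import wave


def wav_data_uri(samples: list[float], sample_rate: int = 32_000) -> str:
    """Encode mono float samples in [-1, 1] as a 16-bit PCM WAV data URI, entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit PCM
        w.setframerate(sample_rate)
        # Scale to int16 and clamp, so out-of-range samples clip instead of wrapping.
        ints = [max(-32768, min(32767, round(s * 32767))) for s in samples]
        w.writeframes(struct.pack(f"<{len(ints)}h", *ints))
    return "data:audio/wav;base64," + base64.b64encode(buf.getvalue()).decode("ascii")
```

The trade-off is size: base64 encodes every 3 bytes as 4 characters, so the payload grows by about a third in exchange for never writing to disk.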
### The frontend (all hand-built, no default Gradio components)

- A cozy, low-poly, anime-styled street scene in **Three.js**: the vending machine, a bench, a lamp post, layered mountains, a forest, a day/night toggle (persisted), and a little Game Boy on the sidewalk.
- A camera state machine drives the intro descent, the zoom into the machine, and the cassette flow.
- The cassette **collection** is a coverflow carousel with an equal-power crossfade playlist between tapes.
- The Game Boy runs a tiny, no-score **garden mini-game** (`garden.js`) to play while a tape brews.
- A café-jazz **lobby bed** plays when idle, and there is a global mute toggle.
- Perf: static geometry is merged (from 462 down to 207 draw calls), shadow maps are baked once, a frame governor runs 30fps idle / 60fps during transitions, and hover outlines use a screen-space edge-detect pass.

### API

| Endpoint | What it does |
| -------------------------------- | ---------------------------------------------------------------------------------------------- |
| `generate_song(prompt, seconds)` | Gradio API. Returns `{title, audio}` (audio is an inline WAV data URI). `concurrency_limit=1`. |
| `GET /api/progress` | Brewing progress for the bar (real per-chunk locally, time-based estimate on GPU). |
| `GET /api/config` | `{allowed_seconds}`, so the length slider adapts to the hardware. |
| `GET /` | The Three.js app. `/static` serves the frontend assets. |

## Tech stack

- **Python** 3.12.12 (ZeroGPU pins it; 3.12+ locally)
- **Gradio** 6.17.3 (`gradio.server.Server`, FastAPI / Starlette underneath)
- **transformers** + **torch** ≥2.8: MusicGen + MiniCPM5-1B
- **Three.js** (via CDN + importmap)
- **Ollama** (local enrichment only)
- **ZeroGPU** (NVIDIA) on the Space

## Run it locally

```bash
# 1. clone
git clone && cd LoFinity

# 2. environment (Python 3.12+). gradio comes from the Space SDK, so install it explicitly here.
uv venv --python 3.12
uv pip install gradio==6.17.3 -r requirements.txt

# 3. (recommended) local enrichment LLM via Ollama
ollama pull llama3.2:3b   # served at http://localhost:11434

# 4. run: the first vend downloads musicgen-small and takes a minute to warm up
.venv/bin/python app.py   # open http://localhost:7860
```

Locally (without ZeroGPU) it uses `musicgen-small` and 30s tapes. Without an Ollama daemon, enrichment falls back to a plain non-LLM path (blander titles and instruments), but everything still works.

**Quick UI work without the heavy model** (tones instead of MusicGen):

```bash
LOFINITY_ENGINE=stub .venv/bin/python app.py
```

### Environment knobs

| Variable | Default | What it does |
| ----------------------------- | --------------------------------- | ------------------------------------------------------------------- |
| `LOFINITY_ENGINE` | `musicgen` | `musicgen`, or `stub` for tones during UI dev |
| `LOFINITY_DEVICE` | auto | `cuda` / `mps` / `cpu` (auto: cuda on ZeroGPU, else mps, else cpu) |
| `LOFINITY_MUSICGEN` | auto | model id (auto: musicgen-medium on ZeroGPU, else musicgen-small) |
| `LOFINITY_DURATION` | `30` | default clip length in seconds |
| `LOFINITY_OVERLAP_S` | `2` | continuation seed length in seconds |
| `LOFINITY_MAX_GEN_S` | `28` | cap on a continuation's total output, to stay inside the 30s window |
| `LOFINITY_ENRICHER` | `openbmb/MiniCPM5-1B` | enrichment model id on ZeroGPU |
| `OLLAMA_URL` / `OLLAMA_MODEL` | `localhost:11434` / `llama3.2:3b` | local enrichment |

### Project layout

| Path | What's inside |
| ------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| `app.py` | Backend: the pipeline, the audio engines, the API |
| `ambience.py` | Ambience beds + mixing |
| `frontend/` | `index.html`, `main.js` (scene + camera), `world.js` (the 3D world), `ui.js` (modal + audio + collection), `garden.js` (mini-game), `style.css` |
| `assets/ambience/` | The looped beds + credits |
| `scripts/` | Dev tools (fetch / generate ambience, make the OG image) |

## What I learned

![One does not simply prompt-create a lofi song](https://build-small-hackathon-lofinity.hf.space/static/meme-lofi.png)

**30 seconds is a wall.** MusicGen is trained on ~30s clips, so anything longer has to be stitched from continuations, and naive stitching slowly drifts into noise. The fix was understanding _why_ (each continuation was generating past the 30s window) and capping every shot, instead of fighting the symptoms. A very humbling "go read how the model actually works" moment.

**One model cannot do everything, so orchestrate.** MusicGen is completely deaf to texture words like "rain" or "vinyl crackle." So instead of one big model, a small LLM plans the recipe, MusicGen performs it, and a separate ambience layer dresses the set. For a job like this, small models plus smart orchestration beat one giant model trying to do it all.

**Constraints make you creative.** ZeroGPU's forked worker cannot report per-chunk progress back, so the brewing bar became a smooth time-based estimate. And yes, an exact-string check on `SPACES_ZERO_GPU` (which is `'1'`, not `'true'`) silently ran everything on CPU for a while. 8 years in the industry and still getting the classic humbling. ✌️😅

## Credits & license

- **Models:** [MusicGen](https://huggingface.co/facebook/musicgen-medium) (Meta), [MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) (OpenBMB).
- **Ambience beds:** see `assets/ambience/CREDITS.md`. Lobby music: "Peaceful Cafe Jazz" by Alex Morgan (Pixabay, royalty-free).
- **License:** MIT.