	---
	title: LoFinity
	emoji: 🌍
	colorFrom: yellow
	colorTo: green
	sdk: gradio
	sdk_version: 6.17.3
	python_version: "3.12.12"
	app_file: app.py
	pinned: false
	license: mit
	short_description: A vending machine app that generates endless lofi beats
	thumbnail: https://build-small-hackathon-lofinity.hf.space/static/og.png
	tags:
	- thousand-token-wood
	- community-choice
	- off-the-grid
	- off-brand
	- tiny-titan
	- field-notes
	- best-demo
	- best-agent
	- bonus-quest-champion
	- judges-wildcard
	- minicpm
	- track:wood
	- sponsor:openbmb
	- achievement:offgrid
	- achievement:offbrand
	- achievement:fieldnotes
	---

	# LoFinity 🎧

	_Chill beats, freshly vended: a vending machine that generates endless lofi, built for the [Build Small Hackathon](https://build-small-hackathon-field-guide.hf.space/)._

	## 🏅 Badges I'm going for

	LoFinity is my entry for the Build Small Hackathon. Here is the track and the badges I am submitting for:

	- 🌳 Thousand Token Wood (the whimsical track) + Community Choice: LoFinity is pure cozy whimsy.
	- 🔌 Off the Grid: no cloud APIs. Every model (MiniCPM5-1B + MusicGen) runs on the Space's own GPU, or locally. Nothing phones home.
	- 🎨 Off-Brand: the UI is a fully custom Three.js world, miles past the default Gradio components.
	- 🐣 Tiny Titan: every model I ship is ≤4B (MiniCPM5-1B ~1B + MusicGen-medium ~1.5B).
	- 🧩 MiniCPM sponsor prize: OpenBMB's MiniCPM5-1B is the brain that plans every single song.
	- 📓 Field Notes: a write-up of the build and what I learned (this README, plus a longer blog post).
	- 🎬 Best Demo: once my demo video and social post are up (that is literally next on my list).
	- 🤖 Best Agent: the multi-model orchestration, with a small LLM planning, an audio model performing, and an ambience layer dressing the set. More pipeline than autonomous agent, but the multi-step collaboration is real.
	- 🏆 Bonus Quest Champion: stacking as many bonus criteria as I honestly can.
	- 🃏 Judges' Wildcard: well... a 3D lofi vending machine is nothing if not a wildcard.

	▶ [Live demo](https://huggingface.co/spaces/build-small-hackathon/LoFinity) · 🎬 Demo video: [YouTube](https://youtu.be/nrIU3Cwnijk) · 🐦 Social post: [dev.to](https://dev.to/eloigil/lofinity-chill-beats-freshly-vended-4ml2)

	![LoFinity](https://build-small-hackathon-lofinity.hf.space/static/og.png)

	LoFinity is a vending machine for lofi. You land in a cozy, low-poly, anime-ish little street, you walk up to the machine, you insert a coin, you type a vibe (_"studying late in a snowy cabin"_), and out pops a cassette tape with a freshly generated song. Everything chill and pleasing, without triggering your dopamine.

	### The story behind it

	I built this whole thing while on parental leave, with a toddler who never stops and a baby who is just figuring out the world. People assume parental leave is rest. It is not. It is beautiful, it is loud, and it is a little bit of pandemonium. LoFinity became my small escape: one hour here, twenty minutes there, always between nap times, building something that is _mine_, piece by piece.

	The idea is over a year old. I _love_ lofi music, and not only because it sounds nice. I am neurodivergent, and focusing is not always easy for me. Those warm, repetitive, slightly imperfect beats are the thing that finally lets my brain settle down and work, with a hit of 90s childhood nostalgia on top. So a machine that vends endless lofi felt almost personal, like building a tool for my own brain.

	I had the _vision_ very clearly, but I was not comfortable with Three.js. Then Anthropic dropped Fable 5 and I just HAD to try it. It took me from "I have this in my head" to a real, living 3D world. It worked beautifully... right until it got banned, but hey, shit happens. 🤷 I am grateful for the 3 days, enough to get me kickstarted.

	## How it works

	LoFinity is a Gradio Server app (`gradio.server.Server`) that serves a hand-built Three.js frontend and exposes a tiny generation API. Every tape is made by a short chain of small, open models, and on the live Space the whole chain runs on ZeroGPU.

	```
	your vibe
	│
	▼
	enrich ──► MiniCPM5-1B (ZeroGPU) or Ollama llama3.2:3b (local)
	│ → music_prompt + cassette title + ambience tag (strict JSON)
	▼
	render ──► MusicGen (medium on GPU / small on CPU)
	│ → 30s shots, stitched with overlap-seeded continuation for longer tapes
	▼
	dress ──► ambience.py mixes a looped bed (rain / waves / crackle / …) under the music
	│
	▼
	inline base64 WAV ──► browser turns it into a Blob URL, collection stays client-side
	```
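The enrich step in the diagram hands back strict JSON with three fields. Here is a minimal sketch of validating that output before it reaches MusicGen. The function name `parse_enrichment` and the exact "drift" test (no `lofi` anywhere in the prompt) are assumptions for illustration, not the code in `app.py`.

```python
import json

# The three fields the enricher is expected to return.
REQUIRED_KEYS = ("music_prompt", "title", "ambience")


def parse_enrichment(raw: str) -> dict:
    """Parse the enricher's JSON and force a 'lofi' prefix if the prompt drifted."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("enrichment must be a JSON object")

    missing = [
        key for key in REQUIRED_KEYS
        if not isinstance(data.get(key), str) or not data[key].strip()
    ]
    if missing:
        raise ValueError(f"enrichment is missing fields: {missing}")

    prompt = data["music_prompt"].strip()
    # Assumption: "drift" means the word "lofi" does not appear at all.
    if "lofi" not in prompt.lower():
        prompt = f"lofi, {prompt}"

    return {
        "music_prompt": prompt,
        "title": data["title"].strip(),
        "ambience": data["ambience"].strip(),
    }
```

Failing loudly on a missing field makes it easy for a caller to drop back to a plain non-LLM path instead of sending a half-formed prompt to the music model.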

	### The generation pipeline

	1. You type a vibe and pick a length (30 / 60 / 90s on GPU).
	2. A small LLM enriches it. On the Space that is MiniCPM5-1B (OpenBMB, ~1B params); locally it is Ollama running `llama3.2:3b`. It returns strict JSON: a MusicGen `music_prompt` (genre + 2-3 vibe-matched instruments + mood + tempo), a cassette `title`, and an `ambience` tag. Thinking mode is off, the output is templated and few-shot-guided, and "lofi" is force-prefixed if the model drifts.
	3. MusicGen renders the music. `musicgen-medium` (~1.5B) on GPU, `musicgen-small` (~300M) on CPU. A single shot is ~30s, which is its training window.
	4. Longer tapes are stitched. To go past 30s, the last `OVERLAP_S` (2s) of audio is fed back as a seed and the model continues. Each continuation is capped at `MAX_GEN_S` (28s) total so it never runs past the ~30s window (going past it is what turns the tail into noise). Chunks are RMS-matched (continuations drift quieter) and joined with a 0.4s equal-power crossfade.
	5. Ambience is mixed in. A separate bed (rain, ocean, crickets, café murmur, fireplace, birdsong, wind, or procedural vinyl-crackle / tape-hiss) is looped and mixed gently under the music in `ambience.py`, because MusicGen ignores texture words in the prompt.
	6. The tape ships inline. The WAV comes back as a base64 data URI, so no file is ever written to disk (nothing is cached or shared between visitors on the Space). The browser turns it into a Blob URL, and the collection lives client-side, per session.
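Step 4 is the fiddly one, so here is a rough sketch of the two audio operations it names: RMS-matching a continuation to the previous chunk, then joining them with an equal-power crossfade. It uses plain Python lists of mono float samples to stay dependency-free, and it assumes the seed overlap has already been trimmed from the continuation. The helper names are hypothetical and the real `app.py` may do this differently.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_rms(chunk, reference):
    """Scale `chunk` so its RMS level equals that of `reference`."""
    level = rms(chunk)
    if level == 0.0:
        return list(chunk)  # silence cannot be scaled up
    gain = rms(reference) / level
    return [s * gain for s in chunk]


def crossfade_join(a, b, fade_len):
    """Join `a` and `b`, overlapping `fade_len` samples with an equal-power fade."""
    fade_len = min(fade_len, len(a), len(b))
    if fade_len == 0:
        return list(a) + list(b)

    head, tail = a[:-fade_len], a[-fade_len:]
    mixed = []
    for i in range(fade_len):
        t = (i + 0.5) / fade_len
        fade_out = math.cos(t * math.pi / 2)  # 1 -> 0
        fade_in = math.sin(t * math.pi / 2)   # 0 -> 1; fade_out**2 + fade_in**2 == 1
        mixed.append(tail[i] * fade_out + b[i] * fade_in)
    return list(head) + mixed + list(b[fade_len:])
```

At MusicGen's 32 kHz output rate, a 0.4s fade would be `fade_len = 12800` samples. The joined tape is `len(a) + len(b) - fade_len` samples long, because the fade region is shared by both chunks.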

	### Running it all on ZeroGPU

	This is the part I am most proud of: two open models, orchestrated together, both comfortably small, all on ZeroGPU.

	- One acquisition per vend. Enrichment (MiniCPM) and music (MusicGen) both run inside a single `@spaces.GPU` call (`gpu_brew`), with a dynamic duration budget (`40 + 40 * chunks` seconds). A brew that overruns its budget gets killed mid-render, so the budget is generous.
	- Models load at import time. They are placed on `cuda` at module load, which is the documented ZeroGPU pattern: a CUDA-emulation layer makes `.to('cuda')` work before a GPU is attached, and startup placement beats per-call transfers.
	- Detection is honest. ZeroGPU is detected via the `spaces` library's own `Config.zero_gpu` flag, not by string-matching `SPACES_ZERO_GPU`. (That bit me: the runtime sets it to `'1'`, not `'true'`, so my exact-string check silently ran everything on CPU for a while.)
	- Progress is estimated. The GPU worker is a separate process and cannot push real per-chunk progress, so `/api/progress` returns a smooth time-based estimate for the brewing bar.
	- Hardware-adaptive. GPU: musicgen-medium + chunked tapes up to 90s. No GPU: musicgen-small + a single 30s shot (medium + chunking on CPU is too slow). The frontend reads `/api/config` and adapts the length slider.
	- Identical local code. Locally, `spaces` is shimmed to a no-op decorator, so the exact same code runs on MPS / CPU untouched.
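Two of the bullets above are easy to picture in code: the duration budget and the local no-op shim. This is a minimal sketch, not the Space's actual code. The `gpu` stand-in, `brew_budget_s`, and the body of `gpu_brew` are hypothetical, and how the real app wires the budget into `@spaces.GPU` is not shown in this README.

```python
def gpu(func=None, *, duration=None):
    """No-op stand-in for a GPU decorator: returns the function unchanged.

    Supports both `@gpu` and `@gpu(duration=...)`; `duration` is ignored here.
    """
    if callable(func):
        return func          # used as a bare decorator: @gpu
    return lambda f: f       # used with arguments: @gpu(duration=...)


def brew_budget_s(chunks: int) -> int:
    """Duration budget from the README: 40 + 40 * chunks seconds."""
    return 40 + 40 * chunks


# Hypothetical wiring: the shim ignores `duration`, so this runs anywhere.
@gpu(duration=brew_budget_s)
def gpu_brew(vibe: str, chunks: int) -> str:
    # Placeholder body: the real function would enrich the vibe and render audio.
    return f"brewed {chunks} chunk(s) for {vibe!r}"
```

With a shim like this in place when `spaces` is unavailable, the decorated function behaves like a plain function, which is what lets the same module run on MPS or CPU.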

	### The frontend (all hand-built, no default Gradio components)

	- A cozy, low-poly, anime-styled street scene in Three.js: the vending machine, a bench, a lamp post, layered mountains, a forest, a day/night toggle (persisted), and a little Game Boy on the sidewalk.
	- A camera state machine drives the intro descent, the zoom into the machine, and the cassette flow.
	- The cassette collection is a coverflow carousel with an equal-power crossfade playlist between tapes.
	- The Game Boy runs a tiny, no-score garden mini-game (`garden.js`) to play while a tape brews.
	- A café-jazz lobby bed plays when idle, plus a global mute toggle.
	- Perf: static geometry is merged (from 462 down to 207 draw calls), shadow maps are baked once, a frame governor runs 30fps idle / 60fps during transitions, and hover outlines use a screen-space edge-detect pass.

	### API

	| Endpoint | What it does |
	| --- | --- |
	| `generate_song(prompt, seconds)` | Gradio API. Returns `{title, audio}` (audio is an inline WAV data URI). `concurrency_limit=1`. |
	| `GET /api/progress` | Brewing progress for the bar (real per-chunk locally, time-based estimate on GPU). |
	| `GET /api/config` | `{allowed_seconds}`, so the length slider adapts to the hardware. |
	| `GET /` | The Three.js app. `/static` serves the frontend assets. |
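Since `generate_song` returns its audio as an inline WAV data URI, here is a small sketch of what producing and consuming such a value can look like with only the standard library. The function names, the 16-bit mono format, and the `audio/wav` MIME type are assumptions. The 32 kHz default matches MusicGen's output rate, but the real response may be encoded differently.

```python
import base64
import io
import struct
import wave


def wav_data_uri(samples, sample_rate=32000):
    """Encode mono floats in [-1, 1] as a 16-bit PCM WAV data URI, all in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(sample_rate)
        pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
        w.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    encoded = base64.b64encode(buf.getvalue()).decode("ascii")
    return f"data:audio/wav;base64,{encoded}"


def decode_data_uri(uri):
    """Return the raw bytes carried by a base64 data URI."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    return base64.b64decode(payload)
```

Nothing touches the disk on either side: the WAV is built in a `BytesIO` buffer, and a client can decode the payload straight back into bytes.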

	## Tech stack

	- Python 3.12.12 (ZeroGPU pins it; 3.12+ locally)
	- Gradio 6.17.3 (`gradio.server.Server`, FastAPI / Starlette underneath)
	- transformers + torch ≥2.8: MusicGen + MiniCPM5-1B
	- Three.js (via CDN + importmap)
	- Ollama (local enrichment only)
	- ZeroGPU (NVIDIA) on the Space

	## Run it locally

	```bash
	# 1. clone
	git clone <repo-url> && cd LoFinity

	# 2. environment (Python 3.12+). gradio is the Space SDK, so install it explicitly here.
	uv venv --python 3.12
	uv pip install gradio==6.17.3 -r requirements.txt

	# 3. (recommended) local enrichment LLM via Ollama
	ollama pull llama3.2:3b # served at http://localhost:11434

	# 4. run — the first vend downloads musicgen-small and takes a minute to warm up
	.venv/bin/python app.py
	# open http://localhost:7860
	```

	Locally there is no ZeroGPU, so it defaults to `musicgen-small` and 30s tapes (on MPS or CPU). Without an Ollama daemon, enrichment falls back to a plain non-LLM path (blander titles and instruments), but everything still works.

	Quick UI work without the heavy model (tones instead of MusicGen):

	```bash
	LOFINITY_ENGINE=stub .venv/bin/python app.py
	```

	### Environment knobs

	| Variable | Default | What it does |
	| --- | --- | --- |
	| `LOFINITY_ENGINE` | `musicgen` | `musicgen`, or `stub` for tones during UI dev |
	| `LOFINITY_DEVICE` | auto | `cuda` / `mps` / `cpu` (auto: cuda on ZeroGPU, else mps, else cpu) |
	| `LOFINITY_MUSICGEN` | auto | model id (auto: musicgen-medium on ZeroGPU, else musicgen-small) |
	| `LOFINITY_DURATION` | `30` | default clip length in seconds |
	| `LOFINITY_OVERLAP_S` | `2` | continuation seed length in seconds |
	| `LOFINITY_MAX_GEN_S` | `28` | cap on a continuation's total output, to stay inside the 30s window |
	| `LOFINITY_ENRICHER` | `openbmb/MiniCPM5-1B` | enrichment model id on ZeroGPU |
	| `OLLAMA_URL` / `OLLAMA_MODEL` | `localhost:11434` / `llama3.2:3b` | local enrichment |
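To make the table concrete, here is one way these knobs could be read into a typed config object. The `Knobs` dataclass and `load_knobs` are hypothetical names, and the numeric types are a guess. Only the variable names and defaults come from the table above, and `None` stands in for "auto".

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Knobs:
    engine: str
    device: Optional[str]    # None means "auto"
    musicgen: Optional[str]  # None means "auto"
    duration_s: int
    overlap_s: float
    max_gen_s: float
    enricher: str


def load_knobs(env: Mapping[str, str] = os.environ) -> Knobs:
    """Read the LOFINITY_* variables, falling back to the documented defaults."""
    return Knobs(
        engine=env.get("LOFINITY_ENGINE", "musicgen"),
        device=env.get("LOFINITY_DEVICE") or None,
        musicgen=env.get("LOFINITY_MUSICGEN") or None,
        duration_s=int(env.get("LOFINITY_DURATION", "30")),
        overlap_s=float(env.get("LOFINITY_OVERLAP_S", "2")),
        max_gen_s=float(env.get("LOFINITY_MAX_GEN_S", "28")),
        enricher=env.get("LOFINITY_ENRICHER", "openbmb/MiniCPM5-1B"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests, for example `load_knobs({"LOFINITY_ENGINE": "stub"})`.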

	### Project layout

	| Path | What's inside |
	| --- | --- |
	| `app.py` | Backend: the pipeline, the audio engines, the API |
	| `ambience.py` | Ambience beds + mixing |
	| `frontend/` | `index.html`, `main.js` (scene + camera), `world.js` (the 3D world), `ui.js` (modal + audio + collection), `garden.js` (mini-game), `style.css` |
	| `assets/ambience/` | The looped beds + credits |
	| `scripts/` | Dev tools (fetch / generate ambience, make the OG image) |

	## What I learned

	![One does not simply prompt-create a lofi song](https://build-small-hackathon-lofinity.hf.space/static/meme-lofi.png)

	30 seconds is a wall. MusicGen is trained on ~30s clips, so anything longer has to be stitched from continuations, and naive stitching slowly drifts into noise. The fix was understanding _why_ (each continuation was generating past the 30s window) and capping every shot, instead of fighting the symptoms. A very humbling "go read how the model actually works" moment.
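The capping arithmetic is simple enough to sketch. This version assumes the 28s cap on a continuation includes its 2s seed, so each continuation adds 26s of new audio after a 30s first shot, and it ignores the short crossfade. The README does not spell out how `app.py` counts this, so treat `plan_shots` as a hypothetical illustration rather than the app's real planner.

```python
import math


def plan_shots(target_s, first_shot_s=30.0, max_gen_s=28.0, overlap_s=2.0):
    """Number of MusicGen shots needed for a tape of `target_s` seconds.

    Assumption: each continuation yields (max_gen_s - overlap_s) seconds of
    new audio, because its capped output re-covers the seed.
    """
    if target_s <= first_shot_s:
        return 1
    new_per_continuation = max_gen_s - overlap_s
    if new_per_continuation <= 0:
        raise ValueError("max_gen_s must be larger than overlap_s")
    continuations = math.ceil((target_s - first_shot_s) / new_per_continuation)
    return 1 + continuations
```

Under those assumptions a 30s tape is one shot, a 60s tape is three, and a 90s tape is four. The point of the cap is that no single shot is ever asked to run past the window the model was trained on.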

	One model cannot do everything, so orchestrate. A music model is completely deaf to texture words like "rain" or "vinyl crackle." So instead of one big model, a small LLM plans the recipe, MusicGen performs it, and a separate ambience layer dresses the set. Small models plus smart orchestration beat one giant model trying to do it all.
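The "dress the set" step boils down to looping a short bed to the tape's length and mixing it in quietly. A dependency-free sketch on mono float lists follows. The function names and the `bed_gain=0.2` default are assumptions, and `ambience.py` may well do something more refined.

```python
def loop_to_length(bed, length):
    """Repeat `bed` until it covers `length` samples, then trim."""
    if not bed:
        return [0.0] * length  # no bed available: mix in silence
    repeats = -(-length // len(bed))  # ceiling division
    return (list(bed) * repeats)[:length]


def mix_under(music, bed, bed_gain=0.2):
    """Mix a looped ambience bed under the music, clamped to [-1, 1]."""
    looped = loop_to_length(bed, len(music))
    return [
        max(-1.0, min(1.0, m + bed_gain * b))
        for m, b in zip(music, looped)
    ]
```

Keeping the bed as a separate layer means the texture can be changed or muted without regenerating the music.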

	Constraints make you creative. ZeroGPU's forked worker cannot report progress, so the brewing bar became a smooth time-based estimate. And yes, an exact-string check on `SPACES_ZERO_GPU` (which is `'1'`, not `'true'`) silently ran everything on CPU for a while. 8 years in the industry and still getting the classic humbling. ✌️😅
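Both lessons fit in a few lines. The first helper shows why an exact-string check fails when a flag arrives as `'1'`. The fix described in this README is to read the `spaces` library's own flag, so the tolerant parser here is only an illustration. The second is one possible shape for a smooth time-based estimate. The easing curve and the 0.95 ceiling are assumptions, not what `/api/progress` actually computes.

```python
import math


def flag_is_on(value):
    """Tolerant truthiness for env-style flags: '1', 'true', 'yes', 'on' (any case)."""
    return (value or "").strip().lower() in {"1", "true", "yes", "on"}


def estimated_progress(elapsed_s, budget_s, ceiling=0.95):
    """Smooth time-based progress in [0, ceiling).

    Eases out exponentially, so the bar keeps moving but never claims to be
    finished before the real result arrives.
    """
    if budget_s <= 0:
        return 0.0
    t = max(0.0, elapsed_s) / budget_s
    return ceiling * (1.0 - math.exp(-3.0 * t))


# The bug in miniature: the runtime value is '1', so this comparison is False.
buggy_check = ("1" == "true")
```

A caller would set the bar to 100% only when the generation call actually returns, and use the estimate for everything before that.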

	## Credits & license

	- Models: [MusicGen](https://huggingface.co/facebook/musicgen-medium) (Meta), [MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) (OpenBMB).
	- Ambience beds: see `assets/ambience/CREDITS.md`. Lobby music: "Peaceful Cafe Jazz" by Alex Morgan (Pixabay, royalty-free).
	- License: MIT.