---
title: LoFinity
emoji: 🌍
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 6.17.3
python_version: "3.12.12"
app_file: app.py
pinned: false
license: mit
short_description: A vending machine app that generates endless lofi beats
thumbnail: https://build-small-hackathon-lofinity.hf.space/static/og.png
tags:
- thousand-token-wood
- community-choice
- off-the-grid
- off-brand
- tiny-titan
- field-notes
- best-demo
- best-agent
- bonus-quest-champion
- judges-wildcard
- minicpm
- track:wood
- sponsor:openbmb
- achievement:offgrid
- achievement:offbrand
- achievement:fieldnotes
---
# LoFinity 🎧
_Chill beats, freshly vended: a vending machine that generates endless lofi, built for the [Build Small Hackathon](https://build-small-hackathon-field-guide.hf.space/)._
## 🏅 Badges I'm going for
LoFinity is my entry for the Build Small Hackathon. Here is the track and the badges I am submitting for:
- 🌳 **Thousand Token Wood** (the whimsical track) + **Community Choice**: LoFinity is pure cozy whimsy.
- 🔌 **Off the Grid**: no cloud APIs. Every model (MiniCPM5-1B + MusicGen) runs on the Space's own GPU, or locally. Nothing phones home.
- 🎨 **Off-Brand**: the UI is a fully custom Three.js world, miles past the default Gradio components.
- 🐣 **Tiny Titan**: every model I ship is ≤4B (MiniCPM5-1B ~1B + MusicGen-medium ~1.5B).
- 🧩 **MiniCPM sponsor prize**: OpenBMB's MiniCPM5-1B is the brain that plans every single song.
- 📓 **Field Notes**: a write-up of the build and what I learned (this README, plus a longer blog post).
- 🎬 **Best Demo**: the demo video and social post are linked just below.
- 🤖 **Best Agent**: multi-model orchestration, with a small LLM planning, an audio model performing, and an ambience layer dressing the set. More pipeline than autonomous agent, but the multi-step collaboration is real.
- 🏆 **Bonus Quest Champion**: stacking as many bonus criteria as I honestly can.
- 🃏 **Judges' Wildcard**: well... a 3D lofi vending machine is nothing if not a wildcard.
**[Live demo](https://huggingface.co/spaces/build-small-hackathon/LoFinity)** · 🎬 Demo video: [YouTube](https://youtu.be/nrIU3Cwnijk) · 🐦 Social post: [dev.to](https://dev.to/eloigil/lofinity-chill-beats-freshly-vended-4ml2)
![LoFinity](https://build-small-hackathon-lofinity.hf.space/static/og.png)
**LoFinity is a vending machine for lofi.** You land in a cozy, low-poly, anime-ish little street, you walk up to the machine, you insert a coin, you type a vibe (_"studying late in a snowy cabin"_), and out pops a cassette tape with a freshly generated song. Everything chill and pleasing, without triggering your dopamine.
### The story behind it
I built this whole thing while on parental leave, with a toddler who never stops and a baby who is just figuring out the world. People assume parental leave is rest. It is not. It is beautiful, it is loud, and it is a little bit of pandemonium. LoFinity became my small escape: one hour here, twenty minutes there, always between nap times, building something that is _mine_, piece by piece.
The idea is over a year old. I _love_ lofi music, and not only because it sounds nice. I am neurodivergent, and focusing is not always easy for me. Those warm, repetitive, slightly imperfect beats are the thing that finally lets my brain settle down and work, with a hit of 90s childhood nostalgia on top. So a machine that vends endless lofi felt almost personal, like building a tool for my own brain.
I had the _vision_ very clearly, but I was not comfortable with Three.js. Then Anthropic dropped **Fable 5** and I just HAD to try it. It took me from "I have this in my head" to a real, living 3D world. It worked beautifully... right until it got banned, but hey, shit happens. 🤷 I am grateful for the 3 days, enough to get me kickstarted.
## How it works
LoFinity is a **Gradio Server** app (`gradio.server.Server`) that serves a hand-built **Three.js** frontend and exposes a tiny generation API. Every tape is made by a short chain of small, open models, and on the live Space the whole chain runs on **ZeroGPU**.
```
your vibe
enrich ──► MiniCPM5-1B (ZeroGPU) or Ollama llama3.2:3b (local)
│ → music_prompt + cassette title + ambience tag (strict JSON)
render ──► MusicGen (medium on GPU / small on CPU)
│ → 30s shots, stitched with overlap-seeded continuation for longer tapes
dress ──► ambience.py mixes a looped bed (rain / waves / crackle / …) under the music
inline base64 WAV ──► browser turns it into a Blob URL, collection stays client-side
```
### The generation pipeline
1. **You type a vibe** and pick a length (30 / 60 / 90s on GPU).
2. **A small LLM enriches it.** On the Space that is **MiniCPM5-1B** (OpenBMB, ~1B params); locally it is **Ollama** running `llama3.2:3b`. It returns strict JSON: a MusicGen `music_prompt` (genre + 2-3 vibe-matched instruments + mood + tempo), a cassette `title`, and an `ambience` tag. Thinking mode is off, the output is templated and few-shot-guided, and "lofi" is force-prefixed if the model drifts.
3. **MusicGen renders the music.** `musicgen-medium` (~1.5B) on GPU, `musicgen-small` (~300M) on CPU. A single shot is ~30s, which is its training window.
4. **Longer tapes are stitched.** To go past 30s, the last `OVERLAP_S` (2s) of audio is fed back as a seed and the model continues. Each continuation is capped at `MAX_GEN_S` (28s) total so it never runs past the ~30s window (going past it is what turns the tail into noise). Chunks are RMS-matched (continuations drift quieter) and joined with a 0.4s equal-power crossfade.
5. **Ambience is mixed in.** A separate bed (rain, ocean, crickets, café murmur, fireplace, birdsong, wind, or procedural vinyl-crackle / tape-hiss) is looped and mixed gently under the music in `ambience.py`, because MusicGen ignores texture words in the prompt.
6. **The tape ships inline.** The WAV comes back as a base64 data URI, so no file is ever written to disk (nothing is cached or shared between visitors on the Space). The browser turns it into a Blob URL, and the collection lives client-side, per session.
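
The level matching and crossfade from step 4 can be sketched as follows. This is a minimal, dependency-free illustration, not the code in `app.py`: the helper names (`rms`, `match_rms`, `equal_power_join`) are hypothetical, the previous chunk is assumed to be the loudness reference, and plain Python lists stand in for the arrays a real pipeline would use.

```python
import math

SAMPLE_RATE = 32_000                           # MusicGen outputs 32 kHz audio
CROSSFADE_SAMPLES = int(0.4 * SAMPLE_RATE)     # the 0.4s crossfade from step 4


def rms(samples):
    """Root-mean-square level of a mono chunk (0.0 for an empty chunk)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_rms(chunk, reference):
    """Scale `chunk` so its RMS matches `reference` (assumed: the previous chunk)."""
    level = rms(chunk)
    if level == 0.0:
        return list(chunk)  # silent chunk: nothing sensible to scale
    gain = rms(reference) / level
    return [s * gain for s in chunk]


def equal_power_join(a, b, fade_len):
    """Join a and b, overlapping the tail of a with the head of b."""
    fade_len = min(fade_len, len(a), len(b))
    if fade_len == 0:
        return list(a) + list(b)
    head, tail = a[:-fade_len], a[-fade_len:]
    mixed = []
    for i in range(fade_len):
        t = (i + 0.5) / fade_len               # position in the overlap, 0..1
        g_out = math.cos(t * math.pi / 2)      # g_out**2 + g_in**2 == 1
        g_in = math.sin(t * math.pi / 2)
        mixed.append(tail[i] * g_out + b[i] * g_in)
    return list(head) + mixed + list(b[fade_len:])
```

The sine/cosine gain pair keeps the summed power constant across the overlap, which is what distinguishes an equal-power crossfade from a linear one. Each join shortens the result by `fade_len` samples.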
### Running it all on ZeroGPU
This is the part I am most proud of: two open models, orchestrated together, both comfortably small, all on ZeroGPU.
- **One acquisition per vend.** Enrichment (MiniCPM) and music (MusicGen) both run inside a single `@spaces.GPU` call (`gpu_brew`), with a dynamic duration budget (`40 + 40 * chunks` seconds). A brew that overruns its budget gets killed mid-render, so the budget is generous.
- **Models load at import time.** They are placed on `cuda` at module load, which is the documented ZeroGPU pattern: a CUDA-emulation layer makes `.to('cuda')` work before a GPU is attached, and startup placement beats per-call transfers.
- **Detection is honest.** ZeroGPU is detected via the `spaces` library's own `Config.zero_gpu` flag, not by string-matching `SPACES_ZERO_GPU`. (That bit me: the runtime sets it to `'1'`, not `'true'`, so my exact-string check silently ran everything on CPU for a while.)
- **Progress is estimated.** The GPU worker is a separate process and cannot push real per-chunk progress, so `/api/progress` returns a smooth time-based estimate for the brewing bar.
- **Hardware-adaptive.** GPU: musicgen-medium + chunked tapes up to 90s. No GPU: musicgen-small + a single 30s shot (medium + chunking on CPU is too slow). The frontend reads `/api/config` and adapts the length slider.
- **Identical local code.** Locally, `spaces` is shimmed to a no-op decorator, so the exact same code runs on MPS / CPU untouched.
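
To make the budget and the estimated progress bar concrete, here is a small sketch. Only the `40 + 40 * chunks` formula comes from the text above; the function names, the exponential ease-out curve, and the 0.95 ceiling are my own assumptions about what a "smooth time-based estimate" could look like.

```python
import math


def gpu_budget_s(chunks: int) -> int:
    """Duration budget from the text above: 40 + 40 * chunks seconds."""
    if chunks < 1:
        raise ValueError("a tape needs at least one chunk")
    return 40 + 40 * chunks


def estimated_progress(elapsed_s: float, expected_s: float, ceiling: float = 0.95) -> float:
    """Smooth, monotonic estimate in [0, ceiling].

    Assumed curve: an exponential ease-out that moves quickly at first and then
    approaches the ceiling, so the bar never claims to be done on its own.
    """
    if expected_s <= 0 or elapsed_s <= 0:
        return 0.0
    return ceiling * (1.0 - math.exp(-3.0 * elapsed_s / expected_s))
```

With this shape, the bar keeps creeping forward even when a brew runs longer than expected, and the real "done" signal still has to come from the finished response.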
### The frontend (all hand-built, no default Gradio components)
- A cozy, low-poly, anime-styled street scene in **Three.js**: the vending machine, a bench, a lamp post, layered mountains, a forest, a day/night toggle (persisted), and a little Game Boy on the sidewalk.
- A camera state machine drives the intro descent, the zoom into the machine, and the cassette flow.
- The cassette **collection** is a coverflow carousel with an equal-power crossfade playlist between tapes.
- The Game Boy runs a tiny, no-score **garden mini-game** (`garden.js`) to play while a tape brews.
- A café-jazz **lobby bed** plays when idle, plus a global mute toggle.
- Perf: static geometry is merged (from 462 down to 207 draw calls), shadow maps are baked once, a frame governor runs 30fps idle / 60fps during transitions, and hover outlines use a screen-space edge-detect pass.
### API
| Endpoint | What it does |
| -------------------------------- | ---------------------------------------------------------------------------------------------- |
| `generate_song(prompt, seconds)` | Gradio API. Returns `{title, audio}` (audio is an inline WAV data URI). `concurrency_limit=1`. |
| `GET /api/progress` | Brewing progress for the bar (real per-chunk locally, time-based estimate on GPU). |
| `GET /api/config` | `{allowed_seconds}`, so the length slider adapts to the hardware. |
| `GET /` | The Three.js app. `/static` serves the frontend assets. |
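
Since `audio` comes back as an inline data URI, a non-browser client has to decode it before saving or inspecting the tape. The sketch below shows one way to do that with the standard library. The helper names are hypothetical, and the exact MIME type in the URI is not checked because the text above does not specify it.

```python
import base64
import io
import wave


def decode_wav_data_uri(uri: str) -> bytes:
    """Return the raw WAV bytes carried by a `data:...;base64,...` URI."""
    header, sep, payload = uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    return base64.b64decode(payload)


def wav_duration_s(wav_bytes: bytes) -> float:
    """Duration in seconds. The `wave` module reads PCM WAV files only."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Writing the decoded bytes to a `.wav` file is then a plain `open(path, "wb").write(...)` on the client side, which keeps the server itself file-free.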
## Tech stack
- **Python** 3.12.12 (ZeroGPU pins it; 3.12+ locally)
- **Gradio** 6.17.3 (`gradio.server.Server`, FastAPI / Starlette underneath)
- **transformers** + **torch** ≥2.8: MusicGen + MiniCPM5-1B
- **Three.js** (via CDN + importmap)
- **Ollama** (local enrichment only)
- **ZeroGPU** (NVIDIA) on the Space
## Run it locally
```bash
# 1. clone
git clone <repo-url> && cd LoFinity
# 2. environment (Python 3.12+). gradio is the Space SDK, so install it explicitly here.
uv venv --python 3.12
uv pip install gradio==6.17.3 -r requirements.txt
# 3. (recommended) local enrichment LLM via Ollama
ollama pull llama3.2:3b # served at http://localhost:11434
# 4. run — the first vend downloads musicgen-small and takes a minute to warm up
.venv/bin/python app.py
# open http://localhost:7860
```
Locally there is no ZeroGPU, so it runs `musicgen-small` on MPS or CPU and makes 30s tapes. Without an Ollama daemon, enrichment falls back to a plain non-LLM path (blander titles and instruments) but everything still works.
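
A rough sketch of how strict-JSON enrichment with a plain fallback can fit together. This is illustrative only, not the code in `app.py`: `parse_recipe`, `plain_recipe`, the fallback prompt wording, and the default ambience tag are all assumptions. Only the three field names and the "lofi" force-prefix come from the pipeline description.

```python
import json

FALLBACK_AMBIENCE = "vinyl-crackle"  # hypothetical default tag
FIELDS = ("music_prompt", "title", "ambience")


def plain_recipe(vibe: str) -> dict:
    """Non-LLM fallback: build a usable recipe straight from the vibe text."""
    vibe = vibe.strip()
    return {
        "music_prompt": f"lofi hip hop, mellow, slow tempo, {vibe}",
        "title": vibe[:40] or "Untitled Tape",
        "ambience": FALLBACK_AMBIENCE,
    }


def parse_recipe(raw: str, vibe: str) -> dict:
    """Validate the LLM's JSON reply; fall back to the plain path on any problem."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return plain_recipe(vibe)
    if not isinstance(data, dict):
        return plain_recipe(vibe)
    fields = {key: data.get(key) for key in FIELDS}
    if not all(isinstance(v, str) and v.strip() for v in fields.values()):
        return plain_recipe(vibe)
    recipe = {key: value.strip() for key, value in fields.items()}
    # Force-prefix "lofi" if the model drifted away from the genre.
    if "lofi" not in recipe["music_prompt"].lower():
        recipe["music_prompt"] = "lofi " + recipe["music_prompt"]
    return recipe
```

The same validation works whether the raw text came from MiniCPM or from Ollama, and a missing daemon can be handled by calling `plain_recipe` directly.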
**Quick UI work without the heavy model** (tones instead of MusicGen):
```bash
LOFINITY_ENGINE=stub .venv/bin/python app.py
```
### Environment knobs
| Variable | Default | What it does |
| ----------------------------- | --------------------------------- | ------------------------------------------------------------------- |
| `LOFINITY_ENGINE` | `musicgen` | `musicgen`, or `stub` for tones during UI dev |
| `LOFINITY_DEVICE` | auto | `cuda` / `mps` / `cpu` (auto: cuda on ZeroGPU, else mps, else cpu) |
| `LOFINITY_MUSICGEN` | auto | model id (auto: musicgen-medium on ZeroGPU, else musicgen-small) |
| `LOFINITY_DURATION` | `30` | default clip length in seconds |
| `LOFINITY_OVERLAP_S` | `2` | continuation seed length in seconds |
| `LOFINITY_MAX_GEN_S` | `28` | cap on a continuation's total output, to stay inside the 30s window |
| `LOFINITY_ENRICHER` | `openbmb/MiniCPM5-1B` | enrichment model id on ZeroGPU |
| `OLLAMA_URL` / `OLLAMA_MODEL` | `localhost:11434` / `llama3.2:3b` | local enrichment |
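
For reference, here is one way the knobs above could be read in Python. The variable names and defaults are taken from the table; the `Knobs` dataclass, the `load_knobs` helper, the use of `"auto"` as a sentinel string, and the validation rules are assumptions made for this sketch.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Knobs:
    engine: str
    device: str
    duration_s: int
    overlap_s: float
    max_gen_s: float
    enricher: str


def load_knobs(env=None) -> Knobs:
    """Read the LOFINITY_* variables, applying the defaults from the table."""
    env = os.environ if env is None else env
    knobs = Knobs(
        engine=env.get("LOFINITY_ENGINE", "musicgen"),
        device=env.get("LOFINITY_DEVICE", "auto"),
        duration_s=int(env.get("LOFINITY_DURATION", "30")),
        overlap_s=float(env.get("LOFINITY_OVERLAP_S", "2")),
        max_gen_s=float(env.get("LOFINITY_MAX_GEN_S", "28")),
        enricher=env.get("LOFINITY_ENRICHER", "openbmb/MiniCPM5-1B"),
    )
    if knobs.engine not in ("musicgen", "stub"):
        raise ValueError(f"unknown LOFINITY_ENGINE: {knobs.engine!r}")
    if not 0 <= knobs.overlap_s < knobs.max_gen_s:
        raise ValueError("the overlap seed must be shorter than the generation cap")
    return knobs
```

Passing a plain dict instead of `os.environ` makes the parsing easy to test, for example `load_knobs({"LOFINITY_ENGINE": "stub"})`.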
### Project layout
| Path | What's inside |
| ------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| `app.py` | Backend: the pipeline, the audio engines, the API |
| `ambience.py` | Ambience beds + mixing |
| `frontend/` | `index.html`, `main.js` (scene + camera), `world.js` (the 3D world), `ui.js` (modal + audio + collection), `garden.js` (mini-game), `style.css` |
| `assets/ambience/` | The looped beds + credits |
| `scripts/` | Dev tools (fetch / generate ambience, make the OG image) |
## What I learned
![One does not simply prompt-create a lofi song](https://build-small-hackathon-lofinity.hf.space/static/meme-lofi.png)
**30 seconds is a wall.** MusicGen is trained on ~30s clips, so anything longer has to be stitched from continuations, and naive stitching slowly drifts into noise. The fix was understanding _why_ (each continuation was generating past the 30s window) and capping every shot, instead of fighting the symptoms. A very humbling "go read how the model actually works" moment.
**One model cannot do everything, so orchestrate.** A music model is completely deaf to texture words like "rain" or "vinyl crackle." So instead of one big model, a small LLM plans the recipe, MusicGen performs it, and a separate ambience layer dresses the set. Small models plus smart orchestration beat one giant model trying to do it all.
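The "dress the set" step is simple enough to sketch in a few lines. This is an illustration of the idea, not the contents of `ambience.py`: the function names, the `bed_gain` value, and the peak-normalization guard are assumptions, and plain lists of float samples stand in for real audio arrays.

```python
def loop_to_length(bed, length):
    """Repeat the ambience bed until it covers `length` samples, then trim."""
    if not bed:
        return [0.0] * length
    repeats = -(-length // len(bed))  # ceiling division
    return (list(bed) * repeats)[:length]


def mix_under(music, bed, bed_gain=0.25):
    """Mix a looped bed quietly under the music (bed_gain is an assumed value)."""
    looped = loop_to_length(bed, len(music))
    mixed = [m + bed_gain * b for m, b in zip(music, looped)]
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak > 1.0:
        # Scale the whole mix down so it stays within [-1, 1] and does not clip.
        mixed = [s / peak for s in mixed]
    return mixed
```

Because the bed is added after generation, the texture is guaranteed to be present regardless of what the music model did with the prompt.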
**Constraints make you creative.** ZeroGPU's forked worker cannot report progress, so the brewing bar became a smooth time-based estimate. And yes, an exact-string check on `SPACES_ZERO_GPU` (which is `'1'`, not `'true'`) silently ran everything on CPU for a while. 8 years in the industry and still getting the classic humbling. ✌️😅
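The flag pitfall generalizes to any environment variable. The fix described in this README is to rely on the `spaces` library's own flag; the sketch below only shows a tolerant parse for cases where no such helper exists. `env_flag` and the `TRUTHY` set are hypothetical names.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, env=None) -> bool:
    """Treat the common truthy spellings as True, anything else as False."""
    env = os.environ if env is None else env
    return env.get(name, "").strip().lower() in TRUTHY
```

An exact comparison such as `os.environ.get("SPACES_ZERO_GPU") == "true"` evaluates to `False` when the runtime sets the value to `"1"`, which is exactly how the CPU fallback stayed silent.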
## Credits & license
- **Models:** [MusicGen](https://huggingface.co/facebook/musicgen-medium) (Meta), [MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) (OpenBMB).
- **Ambience beds:** see `assets/ambience/CREDITS.md`. Lobby music: "Peaceful Cafe Jazz" by Alex Morgan (Pixabay, royalty-free).
- **License:** MIT.