Lolaby — Llama 3.2 3B, fine-tuned to write singable lullabies
A small (3B) instruction-tuned model that writes original, personalised lullabies with chord markers and a tempo/meter header, designed for use with a downstream singing/synthesis pipeline.
This is the lyric-generation model used by Lolaby, a small, AI-powered app that turns a child's drawing into a personalised lullaby. Built for the Build Small Hackathon 2026 (Backyard AI track).
Write a lullaby for: Mia, age 3
Loves: a stuffed fox named after a color
Fears: the dark when the light goes out

[C] Mia, your fox keeps [Am] watch tonight,
[F] curled up close where it's [G] warm and bright,
[C] the dark outside is [Am] soft and small,
[F] and your fox can hear me [G] sing through it all...
Training data — built from scratch with anti-boilerplate gates
1,500 lullabies, distilled from Claude Haiku 4.5 with several hard gates that reject boilerplate at generation time:
| Gate | What it does |
|---|---|
| Concrete love steer | Each prompt samples a specific sub-topic from a pool (e.g. "a hot-air balloon drifting up", "the corner bakery at dawn") instead of vague categories — the teacher can't default to its favourite cliché. |
| N-gram dedup | Every generated lyric line is normalised and checked against all previously-accepted lines via 4-gram Jaccard overlap. Examples with > 0.4 overlap on too many lines are rejected. |
| Theme cap | No single love-category (animal, nature, comfort object, etc.) may exceed 16% of the dataset. Forces broad coverage. |
| Opener dedup | The first 4 words of each lullaby (with the name normalised) are tracked. Over-used opening shapes are rejected after 3 uses, which kills "<name> watches..." / "...eyes are growing soft". |
| Format gate | Every example must parse — tempo header, valid chord markers from the declared progression, sensible line count — or it's regenerated. |
| Safety re-check | Even though the teacher is told to stay wholesome, the synthesised LOVE, FEAR, and lyric body are each re-screened with a safety filter before acceptance. |
Result: 99.4% line uniqueness (11,925 unique lines out of 12,001 total lyric lines across 1,500 examples).
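The N-gram dedup gate from the table can be sketched roughly as below. This is an illustration, not the pipeline's actual code. It assumes word-level 4-grams (the card does not say whether the 4-grams are words or characters), and `max_hits` is a hypothetical stand-in for the unspecified "too many lines" limit.

```python
import re

CHORD = re.compile(r"\[[^\]]+\]")   # chord markers such as [C] or [Am]
PUNCT = re.compile(r"[^\w\s]")      # punctuation to drop after lowercasing


def normalise(line: str) -> str:
    """Lowercase, strip chord markers and punctuation, collapse whitespace."""
    line = CHORD.sub(" ", line).lower()
    line = PUNCT.sub(" ", line)
    return " ".join(line.split())


def shingles(line: str, n: int = 4) -> frozenset:
    """Word n-grams of a normalised line; shorter lines become one shingle."""
    words = normalise(line).split()
    if not words:
        return frozenset()
    if len(words) < n:
        return frozenset({tuple(words)})
    return frozenset(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard overlap |A ∩ B| / |A ∪ B|, defined as 0 for empty sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class LineDedup:
    """Rejects an example when too many of its lines resemble accepted lines."""

    def __init__(self, threshold: float = 0.4, max_hits: int = 1):
        self.threshold = threshold  # overlap level from the table
        self.max_hits = max_hits    # hypothetical: overlapping lines tolerated
        self.seen = []              # shingle sets of all accepted lines

    def accept(self, lyric_lines) -> bool:
        candidate = [shingles(line) for line in lyric_lines]
        hits = sum(
            1 for s in candidate
            if any(jaccard(s, old) > self.threshold for old in self.seen)
        )
        if hits > self.max_hits:
            return False
        self.seen.extend(candidate)
        return True
```

The comparison is quadratic in the number of accepted lines, which is acceptable at roughly 12,000 lines; a larger dataset would probably want an inverted index or MinHash instead.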
Diversity dimensions sampled per example
- 49 names across multiple naming traditions (Mia, Aiko, Mateo, Tariq, Saoirse, ...).
- 12 ages (1-7, weighted toward 2-5).
- 10 love categories × ~10-17 specific sub-topics each (≈ 120 concrete loves).
- 15 fear topics (and ~25% of examples have no fear, the natural "just comfort" case).
- 12 moods (cosy and content / restless but settling / clingy and needing reassurance / ...).
- 12 keys, 3 meters (6/8, 3/4, 4/4), 2-4 diatonic progressions per key.
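A per-example sampler over these dimensions might look like the sketch below. The pools are tiny stand-ins that reuse only strings quoted in this card. The age weights are hypothetical (the card says only "weighted toward 2-5"), and ages are simplified to whole years.

```python
import random

# Stand-in pools: the real ones are larger and are not published.
NAMES = ["Mia", "Aiko", "Mateo", "Tariq", "Saoirse"]
LOVES = [
    "a hot-air balloon drifting up",
    "the corner bakery at dawn",
    "a stuffed fox named after a color",
]
FEARS = ["the dark when the light goes out"]
MOODS = [
    "cosy and content",
    "restless but settling",
    "clingy and needing reassurance",
]
KEYS = ["C major"]
METERS = ["6/8", "3/4", "4/4"]

AGES = [1, 2, 3, 4, 5, 6, 7]
AGE_WEIGHTS = [1, 3, 3, 3, 3, 1, 1]  # hypothetical weighting toward 2-5
NO_FEAR_RATE = 0.25                  # ~25% of examples carry no fear


def sample_spec(rng: random.Random) -> dict:
    """Draw one lullaby spec; `fear` is None for the "just comfort" case."""
    return {
        "name": rng.choice(NAMES),
        "age": rng.choices(AGES, weights=AGE_WEIGHTS, k=1)[0],
        "love": rng.choice(LOVES),
        "fear": None if rng.random() < NO_FEAR_RATE else rng.choice(FEARS),
        "mood": rng.choice(MOODS),
        "key": rng.choice(KEYS),
        "meter": rng.choice(METERS),
    }


if __name__ == "__main__":
    print(sample_spec(random.Random(42)))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible without touching the global random state.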
The training-prompt format mirrors what the downstream app sends at inference, so there's no distribution shift between training and serving:
Write a lullaby for: <name>, age <n>
Loves: <concrete love>
Fears: <concrete fear> # omitted if no fear
Mood: <mood>
Key: <key>
Meter: <meter>
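A small helper that assembles this prompt, dropping the `Fears:` line when there is no fear, could look like the following. The function name and signature are illustrative and are not taken from the Lolaby app.

```python
from typing import Optional


def build_prompt(name: str, age: int, love: str, mood: str, key: str,
                 meter: str, fear: Optional[str] = None) -> str:
    """Assemble the user prompt in the field order shown above."""
    lines = [
        f"Write a lullaby for: {name}, age {age}",
        f"Loves: {love}",
    ]
    if fear:  # the Fears line is omitted entirely when there is no fear
        lines.append(f"Fears: {fear}")
    lines += [f"Mood: {mood}", f"Key: {key}", f"Meter: {meter}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("Mia", 3, "a stuffed fox named after a color",
                       "sleepy and comforted", "C major", "6/8",
                       fear="the dark when the light goes out"))
```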
Training setup
- Base model: `unsloth/Llama-3.2-3B-Instruct` (4-bit loaded via Unsloth)
- Method: LoRA via PEFT
- LoRA config: `r=16`, `alpha=32`, dropout 0, applied to all 7 attention + MLP projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
- Sequence length: 1024
- Optimiser: AdamW 8-bit, learning rate 2e-4, cosine schedule, 20 warmup steps, weight decay 0.01
- Batch: per-device 4, gradient accumulation 4 (effective batch 16)
- Epochs: 3, with `load_best_model_at_end` on eval loss
- Precision: bf16 where supported, fp16 otherwise
- Eval/save strategy: per-epoch; the best checkpoint by `eval_loss` was kept (epoch 2 in our run)
- Seed: 42

Trained on a single Colab T4 in ~45 minutes via the included `train_lullaby.ipynb` notebook.
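To make the schedule concrete, the sketch below works through the batch arithmetic and a linear-warmup cosine learning-rate curve using the hyperparameters listed above. It is a plain-Python approximation: the step count assumes the 1,425-example training split mentioned under Evaluation, and the exact number of optimiser steps in the notebook may differ slightly from this estimate.

```python
import math

BASE_LR = 2e-4
WARMUP_STEPS = 20
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM  # 16
EPOCHS = 3


def steps_per_epoch(n_examples: int, batch: int = EFFECTIVE_BATCH) -> int:
    """Approximate optimiser steps per epoch (last partial batch counted)."""
    return math.ceil(n_examples / batch)


def lr_at(step: int, total_steps: int, base_lr: float = BASE_LR,
          warmup: int = WARMUP_STEPS) -> float:
    """Linear warmup to base_lr, then cosine decay to zero at total_steps."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    progress = min(1.0, progress)  # hold at zero past the final step
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = EPOCHS * steps_per_epoch(1425)  # assumed training-set size
    for step in (0, 10, 20, total // 2, total):
        print(step, f"{lr_at(step, total):.2e}")
```

With 20 warmup steps out of a few hundred in total, the run spends most of its time on the cosine decay, so the peak rate of 2e-4 is reached early in the first epoch.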
Files in this repo
The repo ships in three forms so it's usable across runtimes:
- LoRA adapter (`adapter_model.safetensors`): smallest, requires the base model at inference.
- Merged FP16 weights: standalone HF transformers checkpoint.
- GGUF Q4_K_M: for `llama.cpp` and downstream tooling. The Lolaby app uses these on CPU.
How to use
With llama-cpp-python (CPU, what Lolaby itself uses)
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="build-small-hackathon/lolaby-llama-3b",
    filename="*Q4_K_M.gguf",
    n_ctx=2048,
    verbose=False,
)

prompt = (
    "Write a lullaby for: Mia, age 3\n"
    "Loves: a stuffed fox named after a color\n"
    "Fears: the dark when the light goes out\n"
    "Mood: sleepy and comforted\n"
    "Key: C major\n"
    "Meter: 6/8"
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system",
         "content": "You write personalized lullabies for small children, "
                    "with chord markers and a tempo/meter header so a guitar "
                    "accompaniment can be rendered. Output only the lullaby — "
                    "no preamble."},
        {"role": "user", "content": prompt},
    ],
    max_tokens=512,
    temperature=0.85,
)
print(out["choices"][0]["message"]["content"])
With transformers (GPU)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "build-small-hackathon/lolaby-llama-3b"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system",
     "content": "You write personalized lullabies for small children, "
                "with chord markers and a tempo/meter header so a guitar "
                "accompaniment can be rendered. Output only the lullaby — "
                "no preamble."},
    {"role": "user", "content":
        "Write a lullaby for: Mia, age 3\n"
        "Loves: a stuffed fox named after a color\n"
        "Mood: sleepy and comforted\n"
        "Key: C major\n"
        "Meter: 6/8"},
]

# return_dict=True yields input_ids plus attention_mask, so the call behaves
# the same whether the installed version defaults to a tensor or a dict.
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=512, do_sample=True,
                     temperature=0.85, top_p=0.95)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))
Output format
Every generation begins with a two-line header, then a blank line, then the
lullaby itself. Lyric lines are prefixed with chord markers in [brackets]
inline, drawn from the requested progression. Two 4-line verses by default.
Tempo: 54 bpm, 6/8
Progression: C - Am - F - G

[C] First line of [Am] first verse,
[F] continues here [G] in rhyme,
[C] third line of [Am] the verse,
[F] gentle close in [G] time...
[C] Second verse [Am] line one,
[F] line two [G] continues,
[C] third line of [Am] verse,
[F] final line [G] ending...
This format is consumed downstream by the Lolaby app's synth and TTS pipeline — the chord markers drive instrument rendering, the tempo header drives playback timing.
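A consumer of this format could parse and validate it along the following lines. This is a sketch under the format described above, not the Lolaby app's parser; the `Lullaby` dataclass and the `parse_lullaby` name are illustrative.

```python
import re
from dataclasses import dataclass

TEMPO_RE = re.compile(r"^Tempo:\s*(\d+)\s*bpm,\s*(\d+/\d+)$")
PROGRESSION_RE = re.compile(r"^Progression:\s*(.+)$")
CHORD_RE = re.compile(r"\[([^\]]+)\]")


@dataclass
class Lullaby:
    bpm: int
    meter: str
    progression: list  # chord names, in declared order
    lines: list        # per lyric line: list of (chord, text) segments


def parse_lullaby(text: str) -> Lullaby:
    """Parse the two-line header and chord-marked lyric lines."""
    rows = [row.strip() for row in text.strip().splitlines()]
    if len(rows) < 3:
        raise ValueError("expected a two-line header and at least one lyric line")
    tempo = TEMPO_RE.match(rows[0])
    prog = PROGRESSION_RE.match(rows[1])
    if tempo is None or prog is None:
        raise ValueError("missing or malformed header")
    progression = re.split(r"\s+-\s+", prog.group(1))
    allowed = set(progression)

    lines = []
    for row in rows[2:]:
        if not row:  # blank lines separate the header and verses
            continue
        # Splitting on a capturing group yields [prefix, chord, text, chord, text, ...]
        parts = CHORD_RE.split(row)
        if len(parts) < 3 or parts[0].strip():
            raise ValueError(f"line does not start with a chord marker: {row!r}")
        segments = [(parts[i], parts[i + 1].strip())
                    for i in range(1, len(parts), 2)]
        unknown = [chord for chord, _ in segments if chord not in allowed]
        if unknown:
            raise ValueError(f"chords outside the progression: {unknown}")
        lines.append(segments)
    return Lullaby(int(tempo.group(1)), tempo.group(2), progression, lines)
```

Raising on any chord outside the declared progression mirrors the format gate used when building the training data, so a malformed generation can be regenerated before it reaches the synth.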
Safety
Every training example was safety-screened during generation: the synthesised LOVE, FEAR, and lyric body were each passed through a content filter before acceptance, with anything that slipped through the teacher's own guidance rejected and regenerated. This is a deliberate choice for a model intended for children — the training distribution itself is wholesome by construction, not retrofitted at inference time.
The teacher model was also explicitly instructed to keep all invented content age-appropriate (no death, violence, weapons, horror, substances, or anything frightening beyond a gentle, easily-soothed childhood worry).
Evaluation
Trained on 1,425 examples / held out 75 (5%) for eval. The best checkpoint by held-out loss was kept (epoch 2 of 3).
Beyond eval loss, the model was assessed on a deliberately varied 10-spec sanity battery stressing diversity (names from different cultures, different love categories, edge-case loves the previous model failed on, fears that need actual soothing). Each generation was scored for:
- Coherence — lines parse as English, no garbled tokens.
- Faithfulness — the love and fear from the prompt appear in the lyric in recognisable form.
- Variety — no two generations from the battery share opening or refrain shapes.
- Format compliance — tempo header present, chord markers from the declared progression only.
The model passes all four on the held-out battery. The previous boilerplate-trained model passed only format compliance.
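The variety criterion lends itself to a mechanical check. The sketch below compares opening shapes (the first four words with the child's name normalised), in the same spirit as the opener-dedup gate used for the training data. The card does not describe how the battery was actually scored, so the function names and the `limit` parameter are assumptions.

```python
import re
from collections import Counter

CHORD = re.compile(r"\[[^\]]+\]")  # chord markers such as [C] or [Am]
PUNCT = re.compile(r"[^\w\s]")


def opener(first_line: str, name: str, n_words: int = 4) -> tuple:
    """First n words of a lyric line, with the child's name replaced."""
    text = PUNCT.sub(" ", CHORD.sub(" ", first_line).lower())
    words = ["<name>" if w == name.lower() else w for w in text.split()]
    return tuple(words[:n_words])


def repeated_openers(generations, limit: int = 1) -> dict:
    """Opening shapes used more than `limit` times.

    `generations` is an iterable of (name, first lyric line) pairs.
    """
    counts = Counter(opener(line, name) for name, line in generations)
    return {shape: n for shape, n in counts.items() if n > limit}


if __name__ == "__main__":
    battery = [
        ("Mia", "[C] Mia, your fox keeps [Am] watch tonight,"),
        ("Aiko", "[G] Aiko, your fox keeps [Em] still tonight,"),
        ("Mateo", "[D] Little boat on the [A] quiet sea,"),
    ]
    print(repeated_openers(battery))
```

With `limit=1` an empty result means no two generations share an opening; `limit=3` would correspond to the three-use threshold described for the training-time gate.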
Handling edge cases
When given a love or fear the model has never seen at training time, it generalises gracefully — finding the nearest comforting concept and weaving it in. A specific fictional character or unusual object will typically be rendered through a familiar lens (e.g. "your little friend", "a quiet companion") so the lullaby stays singable and warm rather than breaking or refusing. This is the right behaviour for a bedtime song: soft landing, not literal lookup.
Limitations
- English only. Training data is English. The model will attempt other languages but quality drops fast and chord markers may be inserted at awkward positions.
- Dataset provenance. Training data was generated by Claude Haiku 4.5 under Anthropic's API. The dataset is not distributed with this repo. Anthropic's usage policy restricts redistribution of Claude-generated outputs, so the dataset is kept private to stay compliant.
- Not a general LLM. This model is narrow-purpose. Asked off-topic questions, it will sometimes attempt to answer in lullaby form.
Intended use and out-of-scope use
Intended:
- Generating personalised bedtime lullabies for use in apps, devices, or printed/digital lyric sheets.
- A teaching example of dataset-quality-first fine-tuning.
Not intended:
- General-purpose conversation or instruction following (the base `Llama-3.2-3B-Instruct` is better suited for that).
- Content involving minors that goes beyond gentle bedtime themes.
- Anything safety-critical.
Citation
If you use this model or the dataset-building approach, a link back to the Lolaby Space or this model is enough. If you want a BibTeX entry:
@software{lolaby_2026,
author = {André Oliveira and Vasco Oliveira},
title = {Lolaby: a small-AI lullaby generator},
year = {2026},
url = {https://huggingface.co/build-small-hackathon/lolaby-llama-3b},
note = {Built for the Hugging Face Build Small Hackathon 2026.}
}
License
This model inherits the Llama 3.2 Community License from its base. You agree to that license by using these weights — see https://www.llama.com/llama3_2/license/.
Acknowledgements
- Meta for releasing Llama 3.2 3B Instruct as the base.
- Unsloth for the 4-bit + LoRA training stack that made this trainable on a free Colab T4.
- Anthropic for the Claude Haiku 4.5 API used as the teacher model during dataset distillation.
- Hugging Face & Gradio for hosting the Build Small Hackathon 2026 and shaping a thoughtful "small AI" prompt.