
Libraries

Transformers

How to use build-small-hackathon/lolaby-llama-3b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="build-small-hackathon/lolaby-llama-3b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("build-small-hackathon/lolaby-llama-3b")
model = AutoModelForCausalLM.from_pretrained("build-small-hackathon/lolaby-llama-3b", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use build-small-hackathon/lolaby-llama-3b with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="build-small-hackathon/lolaby-llama-3b",
	filename="lolaby-llama-3b-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use build-small-hackathon/lolaby-llama-3b with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf build-small-hackathon/lolaby-llama-3b:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf build-small-hackathon/lolaby-llama-3b:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf build-small-hackathon/lolaby-llama-3b:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf build-small-hackathon/lolaby-llama-3b:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf build-small-hackathon/lolaby-llama-3b:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf build-small-hackathon/lolaby-llama-3b:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf build-small-hackathon/lolaby-llama-3b:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf build-small-hackathon/lolaby-llama-3b:Q4_K_M

Use Docker

docker model run hf.co/build-small-hackathon/lolaby-llama-3b:Q4_K_M

vLLM

How to use build-small-hackathon/lolaby-llama-3b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "build-small-hackathon/lolaby-llama-3b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "build-small-hackathon/lolaby-llama-3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
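
The same call can be made from Python. This is a minimal sketch using only the standard library; `build_chat_request` and `chat` are hypothetical helper names, and the base URL assumes the vLLM server above is listening on port 8000.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_message: str):
    """Build (but do not send) a POST to an OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, user_message: str) -> str:
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires the server started above to be running):
# print(chat("http://localhost:8000",
#            "build-small-hackathon/lolaby-llama-3b",
#            "What is the capital of France?"))
```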

Use Docker

docker model run hf.co/build-small-hackathon/lolaby-llama-3b:Q4_K_M

SGLang

How to use build-small-hackathon/lolaby-llama-3b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "build-small-hackathon/lolaby-llama-3b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "build-small-hackathon/lolaby-llama-3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "build-small-hackathon/lolaby-llama-3b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "build-small-hackathon/lolaby-llama-3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use build-small-hackathon/lolaby-llama-3b with Ollama:
```
ollama run hf.co/build-small-hackathon/lolaby-llama-3b:Q4_K_M
```

Unsloth Studio

How to use build-small-hackathon/lolaby-llama-3b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for build-small-hackathon/lolaby-llama-3b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for build-small-hackathon/lolaby-llama-3b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for build-small-hackathon/lolaby-llama-3b to start chatting

Pi

How to use build-small-hackathon/lolaby-llama-3b with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf build-small-hackathon/lolaby-llama-3b:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "build-small-hackathon/lolaby-llama-3b:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use build-small-hackathon/lolaby-llama-3b with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf build-small-hackathon/lolaby-llama-3b:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default build-small-hackathon/lolaby-llama-3b:Q4_K_M

Run Hermes

hermes

OpenClaw

How to use build-small-hackathon/lolaby-llama-3b with OpenClaw:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf build-small-hackathon/lolaby-llama-3b:Q4_K_M

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "build-small-hackathon/lolaby-llama-3b:Q4_K_M" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner
How to use build-small-hackathon/lolaby-llama-3b with Docker Model Runner:
```
docker model run hf.co/build-small-hackathon/lolaby-llama-3b:Q4_K_M
```

Lemonade

How to use build-small-hackathon/lolaby-llama-3b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull build-small-hackathon/lolaby-llama-3b:Q4_K_M

Run and chat with the model

lemonade run user.lolaby-llama-3b-Q4_K_M

List all available models

lemonade list

Lolaby — Llama 3.2 3B, fine-tuned to write singable lullabies

A small (3B) instruction-tuned model that writes original, personalised lullabies with chord markers and a tempo/meter header, designed for use with a downstream singing/synthesis pipeline.

This is the lyric-generation model used by Lolaby, a small, AI-powered app that turns a child's drawing into a personalised lullaby. Built for the Build Small Hackathon 2026 (Backyard AI track).

Write a lullaby for: Mia, age 3
Loves: a stuffed fox named after a color
Fears: the dark when the light goes out

[C] Mia, your fox keeps [Am] watch tonight,
[F] curled up close where it's [G] warm and bright,
[C] the dark outside is [Am] soft and small,
[F] and your fox can hear me [G] sing through it all...

Training data — built from scratch with anti-boilerplate gates

1,500 lullabies, distilled from Claude Haiku 4.5 with several hard gates that reject boilerplate at generation time:

Gate	What it does
Concrete love steer	Each prompt samples a specific sub-topic from a pool (e.g. "a hot-air balloon drifting up", "the corner bakery at dawn") instead of vague categories — the teacher can't default to its favourite cliché.
N-gram dedup	Every generated lyric line is normalised and checked against all previously-accepted lines via 4-gram Jaccard overlap. Examples with > 0.4 overlap on too many lines are rejected.
Theme cap	No single love-category (animal, nature, comfort object, etc.) may exceed 16% of the dataset. Forces broad coverage.
Opener dedup	The first 4 words of each lullaby (with the name normalised) are tracked. An opening shape is rejected once it has been used 3 times, which eliminates stock openers such as "<name> watches..." and "...eyes are growing soft".
Format gate	Every example must parse — tempo header, valid chord markers from the declared progression, sensible line count — or it's regenerated.
Safety re-check	Even though the teacher is told to stay wholesome, the synthesised LOVE, FEAR, and lyric body are each re-screened with a safety filter before acceptance.
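
The N-gram dedup and opener dedup gates can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: it assumes word-level 4-grams (the table does not say whether n-grams are over words or characters), and `LineDedupGate`, `OpenerGate` and `max_dup_lines` are hypothetical names. The table only says "too many lines", so the line budget is a placeholder.

```python
import re
from collections import Counter


def normalise(line: str) -> str:
    """Drop [chord] markers and punctuation, lowercase, collapse whitespace."""
    line = re.sub(r"\[[^\]]*\]", " ", line)
    line = re.sub(r"[^\w\s']", " ", line.lower())
    return " ".join(line.split())


def four_grams(line: str) -> set:
    """Word-level 4-grams of a normalised line (assumption: words, not chars)."""
    words = normalise(line).split()
    return {tuple(words[i:i + 4]) for i in range(len(words) - 3)}


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a and b else 0.0


class LineDedupGate:
    """Reject a lullaby when too many of its lines resemble accepted lines."""

    def __init__(self, threshold: float = 0.4, max_dup_lines: int = 1):
        self.threshold = threshold          # 0.4 is the value given in the table
        self.max_dup_lines = max_dup_lines  # hypothetical stand-in for "too many"
        self.accepted = []                  # 4-gram sets of all accepted lines

    def accept(self, lines) -> bool:
        grams = [four_grams(line) for line in lines]
        dups = sum(
            any(jaccard(g, seen) > self.threshold for seen in self.accepted)
            for g in grams
        )
        if dups > self.max_dup_lines:
            return False
        self.accepted.extend(grams)
        return True


class OpenerGate:
    """Reject an opening shape once it has been used `max_uses` times."""

    def __init__(self, max_uses: int = 3):
        self.max_uses = max_uses
        self.counts = Counter()

    def accept(self, first_line: str, name: str) -> bool:
        words = normalise(first_line).split()[:4]
        shape = " ".join("<name>" if w == name.lower() else w for w in words)
        if self.counts[shape] >= self.max_uses:
            return False
        self.counts[shape] += 1
        return True
```

A linear scan over every accepted line is fine at this scale (about 12,000 lines); a larger dataset would likely want an inverted index over 4-grams instead.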

Result: 99.4% line uniqueness (11,925 unique lines out of 12,001 total lyric lines across 1,500 examples).
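
The uniqueness figure is the share of distinct lyric lines among all lyric lines. A minimal sketch, assuming lines are compared after lowercasing and whitespace normalisation (the exact normalisation is not stated here); `line_uniqueness` is a hypothetical helper.

```python
def line_uniqueness(lullabies):
    """Return (unique, total, percent) over all non-empty lyric lines."""
    lines = [
        " ".join(line.lower().split())
        for lullaby in lullabies
        for line in lullaby
        if line.strip()
    ]
    unique, total = len(set(lines)), len(lines)
    percent = round(100 * unique / total, 1) if total else 0.0
    return unique, total, percent


# Sanity check against the reported counts: 11,925 of 12,001 lines.
reported = round(100 * 11_925 / 12_001, 1)  # 99.4
```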

Diversity dimensions sampled per example

49 names across multiple naming traditions (Mia, Aiko, Mateo, Tariq, Saoirse, ...).
12 ages (1-7, weighted toward 2-5).
10 love categories × ~10-17 specific sub-topics each (≈ 120 concrete loves).
15 fear topics (and ~25% of examples have no fear, the natural "just comfort" case).
12 moods (cosy and content / restless but settling / clingy and needing reassurance / ...).
12 keys, 3 meters (6/8, 3/4, 4/4), 2-4 diatonic progressions per key.
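
Sampling one spec across these dimensions might look like the sketch below. The pools are small illustrative subsets taken from examples in this card, the age weights are hypothetical, and integer ages 1 to 7 are an assumption; only the roughly 25% no-fear rate and the three meters come from the list above.

```python
import random

# Illustrative subsets only; the real pools are larger (49 names, 15 fears, ...).
NAMES = ["Mia", "Aiko", "Mateo", "Tariq", "Saoirse"]
LOVES = ["a hot-air balloon drifting up", "the corner bakery at dawn"]
FEARS = ["the dark when the light goes out"]
MOODS = ["cosy and content", "restless but settling",
         "clingy and needing reassurance"]
KEYS = ["C major"]  # the card samples from 12 keys
METERS = ["6/8", "3/4", "4/4"]


def sample_spec(rng: random.Random) -> dict:
    """Draw one training spec; roughly 25% of specs carry no fear."""
    return {
        "name": rng.choice(NAMES),
        # Hypothetical weights that favour ages 2-5.
        "age": rng.choices(range(1, 8), weights=[1, 3, 3, 3, 3, 1, 1])[0],
        "love": rng.choice(LOVES),
        "fear": None if rng.random() < 0.25 else rng.choice(FEARS),
        "mood": rng.choice(MOODS),
        "key": rng.choice(KEYS),
        "meter": rng.choice(METERS),
    }


rng = random.Random(42)
specs = [sample_spec(rng) for _ in range(1000)]
```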

The training-prompt format mirrors what the downstream app sends at inference, so there's no distribution shift between training and serving:

Write a lullaby for: <name>, age <n>
Loves: <concrete love>
Fears: <concrete fear>      # omitted if no fear
Mood: <mood>
Key: <key>
Meter: <meter>
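
A small helper that renders this layout, including the omitted Fears line, could look like this. `build_prompt` is a hypothetical name, not a function shipped with the repo.

```python
def build_prompt(name, age, love, mood, key, meter, fear=None):
    """Render a spec in the training-prompt layout; skip Fears when absent."""
    lines = [f"Write a lullaby for: {name}, age {age}", f"Loves: {love}"]
    if fear:
        lines.append(f"Fears: {fear}")
    lines += [f"Mood: {mood}", f"Key: {key}", f"Meter: {meter}"]
    return "\n".join(lines)


prompt = build_prompt(
    name="Mia", age=3,
    love="a stuffed fox named after a color",
    fear="the dark when the light goes out",
    mood="sleepy and comforted", key="C major", meter="6/8",
)
```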

Training setup

Base model: unsloth/Llama-3.2-3B-Instruct (4-bit loaded via Unsloth)
Method: LoRA via PEFT
LoRA config: r=16, alpha=32, dropout 0, applied to all 7 attention + MLP projections (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
Sequence length: 1024
Optimiser: AdamW 8-bit, learning rate 2e-4, cosine schedule, 20 warmup steps, weight decay 0.01
Batch: per-device 4, gradient accumulation 4 (effective batch 16)
Epochs: 3, with load_best_model_at_end on eval loss
Precision: bf16 where supported, fp16 otherwise
Eval/save strategy: per-epoch; the best checkpoint by eval_loss was kept (epoch 2 in our run)
Seed: 42

Trained on a single Colab T4 in ~45 minutes via the included train_lullaby.ipynb notebook.
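
For readers reproducing the run outside the notebook, the settings above map onto keyword arguments roughly as follows. This is a sketch, not the notebook's code: the key names follow `peft.LoraConfig` and `transformers.TrainingArguments`, `task_type` is an assumption, and the notebook may set the same values through Unsloth's own wrappers. Precision flags (bf16/fp16) are left out because they depend on the GPU.

```python
# LoRA settings, usable as peft.LoraConfig(**lora_kwargs).
lora_kwargs = dict(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",  # assumption: not stated in the card
)

# Trainer settings, usable as TrainingArguments(output_dir=<path>, **train_kwargs).
train_kwargs = dict(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=20,
    weight_decay=0.01,
    optim="adamw_8bit",
    eval_strategy="epoch",  # named evaluation_strategy in older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    seed=42,
)

# Effective batch size on a single GPU: 4 * 4 = 16.
effective_batch = (train_kwargs["per_device_train_batch_size"]
                   * train_kwargs["gradient_accumulation_steps"])
```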

Files in this repo

The repo ships in three forms so it's usable across runtimes:

LoRA adapter (adapter_model.safetensors) — smallest, requires the base model at inference.
Merged FP16 weights — standalone HF transformers checkpoint.
GGUF Q4_K_M — for llama.cpp and downstream tooling. The Lolaby app uses these on CPU.

How to use

With `llama-cpp-python` (CPU, what Lolaby itself uses)

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="build-small-hackathon/lolaby-llama-3b",
    filename="*Q4_K_M.gguf",
    n_ctx=2048,
    verbose=False,
)

prompt = (
    "Write a lullaby for: Mia, age 3\n"
    "Loves: a stuffed fox named after a color\n"
    "Fears: the dark when the light goes out\n"
    "Mood: sleepy and comforted\n"
    "Key: C major\n"
    "Meter: 6/8"
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system",
         "content": "You write personalized lullabies for small children, "
                    "with chord markers and a tempo/meter header so a guitar "
                    "accompaniment can be rendered. Output only the lullaby — "
                    "no preamble."},
        {"role": "user", "content": prompt},
    ],
    max_tokens=512,
    temperature=0.85,
)
print(out["choices"][0]["message"]["content"])

With `transformers` (GPU)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "build-small-hackathon/lolaby-llama-3b"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system",
     "content": "You write personalized lullabies for small children, "
                "with chord markers and a tempo/meter header so a guitar "
                "accompaniment can be rendered. Output only the lullaby — "
                "no preamble."},
    {"role": "user", "content":
        "Write a lullaby for: Mia, age 3\n"
        "Loves: a stuffed fox named after a color\n"
        "Mood: sleepy and comforted\n"
        "Key: C major\n"
        "Meter: 6/8"},
]
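# return_dict=True yields input_ids plus attention_mask, whatever the
# transformers version's default return type is.
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 tokenize=True, return_dict=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=True,
                     temperature=0.85, top_p=0.95)
# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))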

Output format

Every generation begins with a two-line header, then a blank line, then the lullaby itself. Each lyric line carries inline chord markers in [brackets], drawn from the requested progression. The default is two 4-line verses.

Tempo: 54 bpm, 6/8
Progression: C - Am - F - G

[C] First line of [Am] first verse,
[F] continues here [G] in rhyme,
[C] third line of [Am] the verse,
[F] gentle close in [G] time...

[C] Second verse [Am] line one,
[F] line two [G] continues,
[C] third line of [Am] verse,
[F] final line [G] ending...

This format is consumed downstream by the Lolaby app's synth and TTS pipeline: the chord markers drive instrument rendering, and the tempo header drives playback timing.
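
A consumer of this format needs to split the header from the verses and check the chord markers. The parser below is a minimal sketch of that step, not the Lolaby app's actual code; `parse_lullaby` and its return shape are hypothetical.

```python
import re

HEADER_TEMPO = re.compile(r"^Tempo:\s*(\d+)\s*bpm,\s*(\d+/\d+)$")
HEADER_PROG = re.compile(r"^Progression:\s*(.+)$")
CHORD = re.compile(r"\[([^\]]+)\]")


def parse_lullaby(text: str) -> dict:
    """Parse the header and verses; raise ValueError on a format violation."""
    lines = text.strip().splitlines()
    if len(lines) < 4 or lines[2].strip():
        raise ValueError("expected two header lines, then a blank line")
    tempo = HEADER_TEMPO.match(lines[0].strip())
    prog = HEADER_PROG.match(lines[1].strip())
    if not tempo or not prog:
        raise ValueError("missing Tempo or Progression header")
    allowed = [c.strip() for c in re.split(r"\s+-\s+", prog.group(1))]

    verses, current = [], []
    for raw in lines[3:]:
        line = raw.strip()
        if not line:  # a blank line closes the current verse
            if current:
                verses.append(current)
                current = []
            continue
        chords = CHORD.findall(line)
        if not chords or any(c not in allowed for c in chords):
            raise ValueError(f"bad chord markers in line: {line!r}")
        # re.split with a capture group yields [pre, chord, text, chord, text...];
        # every second item from index 2 is the lyric fragment after a chord.
        fragments = [f.strip() for f in CHORD.split(line)[2::2]]
        current.append(list(zip(chords, fragments)))
    if current:
        verses.append(current)
    return {"bpm": int(tempo.group(1)), "meter": tempo.group(2),
            "progression": allowed, "verses": verses}
```

Each parsed line is a list of (chord, lyric fragment) pairs, which is one convenient shape for handing chords to an instrument renderer and text to TTS.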

Safety

Every training example was safety-screened during generation: the synthesised LOVE, FEAR, and lyric body were each passed through a content filter before acceptance, with anything that slipped through the teacher's own guidance rejected and regenerated. This is a deliberate choice for a model intended for children — the training distribution itself is wholesome by construction, not retrofitted at inference time.

The teacher model was also explicitly instructed to keep all invented content age-appropriate (no death, violence, weapons, horror, substances, or anything frightening beyond a gentle, easily-soothed childhood worry).

Evaluation

Trained on 1,425 examples / held out 75 (5%) for eval. The best checkpoint by held-out loss was kept (epoch 2 of 3).

Beyond eval loss, the model was assessed on a deliberately varied 10-spec sanity battery that stresses diversity (names from different cultures, different love categories, edge-case loves the previous model failed on, fears that need actual soothing). Each generation was scored for:

Coherence — lines parse as English, no garbled tokens.
Faithfulness — the love and fear from the prompt appear in the lyric in recognisable form.
Variety — no two generations from the battery share opening or refrain shapes.
Format compliance — tempo header present, chord markers from the declared progression only.

The model passes all four on the held-out battery. The previous boilerplate-trained model passed only format compliance.

Handling edge cases

When given a love or fear the model has never seen at training time, it generalises gracefully — finding the nearest comforting concept and weaving it in. A specific fictional character or unusual object will typically be rendered through a familiar lens (e.g. "your little friend", "a quiet companion") so the lullaby stays singable and warm rather than breaking or refusing. This is the right behaviour for a bedtime song: soft landing, not literal lookup.

Limitations

English only. Training data is English. The model will attempt other languages but quality drops fast and chord markers may be inserted at awkward positions.
Dataset provenance. Training data was generated by Claude Haiku 4.5 under Anthropic's API. The dataset is not distributed with this repo. Anthropic's usage policy restricts redistribution of Claude-generated outputs, so the dataset is kept private to stay compliant.
Not a general LLM. This model is narrow-purpose. Asked off-topic questions, it will sometimes attempt to answer in lullaby form.

Intended use and out-of-scope use

Intended:

Generating personalised bedtime lullabies for use in apps, devices, or printed/digital lyric sheets.
A teaching example of dataset-quality-first fine-tuning.

Not intended:

General-purpose conversation or instruction following (the base Llama-3.2-3B-Instruct is better suited for that).
Content involving minors that goes beyond gentle bedtime themes.
Anything safety-critical.

Citation

If you use this model or the dataset-building approach, a link back to the Lolaby Space or this model is enough. If you want a BibTeX entry:

@software{lolaby_2026,
  author  = {André Oliveira and Vasco Oliveira},
  title   = {Lolaby: a small-AI lullaby generator},
  year    = {2026},
  url     = {https://huggingface.co/build-small-hackathon/lolaby-llama-3b},
  note    = {Built for the Hugging Face Build Small Hackathon 2026.}
}

License

This model inherits the Llama 3.2 Community License from its base. You agree to that license by using these weights — see https://www.llama.com/llama3_2/license/.

Acknowledgements

Meta for releasing Llama 3.2 3B Instruct as the base.
Unsloth for the 4-bit + LoRA training stack that made this trainable on a free Colab T4.
Anthropic for the Claude Haiku 4.5 API used as the teacher model during dataset distillation.
Hugging Face & Gradio for hosting the Build Small Hackathon 2026 and shaping a thoughtful "small AI" prompt.