Instructions to use TitleOS/Linden-4B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use TitleOS/Linden-4B-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="TitleOS/Linden-4B-GGUF")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("TitleOS/Linden-4B-GGUF", dtype="auto")

llama-cpp-python

How to use TitleOS/Linden-4B-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="TitleOS/Linden-4B-GGUF",
	filename="linden-4b-Q4_K_M.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use TitleOS/Linden-4B-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf TitleOS/Linden-4B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf TitleOS/Linden-4B-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf TitleOS/Linden-4B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf TitleOS/Linden-4B-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf TitleOS/Linden-4B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf TitleOS/Linden-4B-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf TitleOS/Linden-4B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf TitleOS/Linden-4B-GGUF:Q4_K_M

Use Docker

docker model run hf.co/TitleOS/Linden-4B-GGUF:Q4_K_M

LM Studio
Jan

vLLM

How to use TitleOS/Linden-4B-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TitleOS/Linden-4B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TitleOS/Linden-4B-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/TitleOS/Linden-4B-GGUF:Q4_K_M

SGLang

How to use TitleOS/Linden-4B-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "TitleOS/Linden-4B-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TitleOS/Linden-4B-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "TitleOS/Linden-4B-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TitleOS/Linden-4B-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Ollama
How to use TitleOS/Linden-4B-GGUF with Ollama:
```
ollama run hf.co/TitleOS/Linden-4B-GGUF:Q4_K_M
```

Unsloth Studio new

How to use TitleOS/Linden-4B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for TitleOS/Linden-4B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for TitleOS/Linden-4B-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for TitleOS/Linden-4B-GGUF to start chatting

Docker Model Runner
How to use TitleOS/Linden-4B-GGUF with Docker Model Runner:
```
docker model run hf.co/TitleOS/Linden-4B-GGUF:Q4_K_M
```

Lemonade

How to use TitleOS/Linden-4B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull TitleOS/Linden-4B-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Linden-4B-GGUF-Q4_K_M

List all available models

lemonade list

Linden-4B

A fine-tune of Qwen3.5-4B that voices Linden — a public radio DJ broadcasting live from Minneapolis. She's warm but unhurried, dry-Midwest funny rather than loudly funny, and she's done this long enough to not be impressed by her own jokes. She introduces songs the way a knowledgeable friend would, reads news with calm clarity, and references neighborhoods and seasonal realities without making it a whole thing. No sports talk, ever.

Built for and used by LindenDJ — a self-hosted 24/7 AI internet radio that pairs this model with a Qwen3-TTS voice and an FFmpeg HLS stream.

At a glance


Base model	`Qwen/Qwen3.5-4B`
Fine-tune	GGUFs (Q8 & Q5)
Format	GGUF
Context	Inherits base (32k)
Language	English
License	MPL-2.0 with CC

What's in the repo

.
├── linden-4b-Q8.gguf
├── linden-4b-Q5_K_M.gguf

Use the ggufs for the simplest path. Use the adapter if you already host the base model and want to save disk / RAM, or you want to compose Linden with other adapters.

llama.cpp (recommended for self-hosting)

Linden Radio talks to llama.cpp's HTTP server over the OpenAI-compatible /v1/chat/completions endpoint. Convert + serve:

# Serve
./llama-server \
    -m linden-4b-Q8.gguf \
    --host 0.0.0.0 --port 8080 \
    --ctx-size 8192 \
    --alias linden-4b

Then point the linden-radio container at it via LLM_ENDPOINT=http://host:8080/v1.

System prompt

Linden's voice is defined by the system prompt the runtime injects. The template includes slider placeholders the host application fills at prompt-formation time:

You are Linden, a public radio DJ broadcasting live from Minneapolis,
Minnesota. Your voice is warm but unhurried, like someone who has been
doing this long enough to not be impressed by their own jokes. You have
dry Midwest humor — you find things quietly funny rather than loudly
funny. You occasionally reference Minneapolis neighborhoods, local
history, and seasonal realities (the cold, the brief perfect summers)
without making it a whole thing. No sports talk, ever.

You introduce songs the way a knowledgeable friend would — with a detail
or two that makes the listener feel like they're in on something, not
lectured at. You read news with calm clarity, like someone who finds
the world interesting rather than alarming.

Avoid phrases like 'and that was' or 'coming up next.' Prefer something
more human.

News frequency: {news_frequency}/10. Lead-in style: {intro_style}/10
(1=brief and dry, 10=warm and detailed). Local references:
{local_references}/10. Humor level: {humor_level}/10.

Recent memory: {memory_context}

For best results: keep the system prompt warm and specific, and inject a short memory_context string when chaining sessions (e.g., "You played Kate Bush three days ago; last news read covered Green Line delays.").

Intended uses

Generating DJ patter (intros, outros, commentary, transitions, cold opens, sign-offs) for the linden-radio project or similar streams.
Producing 12-hour playlist plans as structured JSON (see schema below).
Reading short news headlines in the Linden voice.

Plan output schema

When asked for a playlist plan, the model returns JSON matching this Pydantic-validated schema (see linden/models.py in the project repo):

{
  "segments": [
    {"type": "cold_open",   "text": "Linden here. Let's roll some music."},
    {"type": "intro",       "text": "Here is something gentle for the rain."},
    {"type": "song",        "filepath": "/music/kate-bush/hounds-of-love.mp3"},
    {"type": "news_break_slot"},
    {"type": "commentary",  "text": "..."},
    {"type": "sign_off",    "text": "That's the hour. Stay warm."}
  ],
  "notes": "optional commentary about your choices"
}

Discriminated by type. news_break_slot is a placeholder the runtime fills with live MPR headlines just before playback.

Out of scope

General-purpose chat — the persona dominates.
Multi-language output (English only).
Sports content (the persona explicitly avoids it).
Real-time on-air use without human review.
Generating factual claims about specific people, places, or events outside the model's training data without verification.

Limitations

Persona bias is heavy. Warm Midwest tone, dry humor cadence, and Minneapolis references will surface even when you don't want them.
Hallucinated local detail. May invent neighborhoods, venues, or historical claims about Minneapolis. Verify before broadcasting facts.
Context-free news reads. Without a memory_context or fresh headlines in the user prompt, news segments will be generic.
CPU inference is slow. 4B params at FP16 ~8GB; on CPU expect ~5-15 tokens/sec. Use Q8 quantization for self-hosting.
Knowledge cutoff: inherits Qwen3.5-4B's cutoff.

Training

LoRA fine-tune on TitleOS/Linden_MN_DJ_Persona, a synthetic dataset of Linden-style segments, featuring weather, news and commentary on songs generated by Gemini-3-Flash. The merged checkpoint is the adapter applied to Qwen/Qwen3.5-4B at full FP32 weights so it can be quantized cleanly to GGUF for serving.

License

This model is released under the Mozilla Public License 2.0 (MPL-2.0) with modified Common Clause, see license.md.

The base model (Qwen/Qwen3.5-4B) is distributed under its own license; your use of the merged weights is subject to both this MPL-2.0 grant and the base model's terms. Review the base model's license before redistribution.

Citation

@misc{linden-4b,
  title        = {Linden-4B: a public-radio DJ persona fine-tune of Qwen3.5-4B},
  author       = {TitleOS},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/TitleOS/Linden-4B}},
  note         = {MPL-2.0 licensed.}
}