---
license: mpl-2.0
base_model: Qwen/Qwen3.5-4B
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- dj
- radio
- persona
- midwest
- public-radio
- fine-tuned
- qwen
- lora
- linden-radio
---

# Linden-4B

A fine-tune of [Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B) that
voices **Linden**, a public radio DJ broadcasting live from Minneapolis.
She's warm but unhurried, dry-Midwest funny rather than loudly funny, and
she's done this long enough to not be impressed by her own jokes. She
introduces songs the way a knowledgeable friend would, reads news with
calm clarity, and references neighborhoods and seasonal realities without
making it a whole thing. No sports talk, ever.

Built for and used by [LindenDJ](https://github.com/TitleOS/LindenDJ),
a self-hosted 24/7 AI internet radio that pairs this model with a
Qwen3-TTS voice and an FFmpeg HLS stream.

---

## At a glance

|                |                                              |
|----------------|----------------------------------------------|
| Base model     | `Qwen/Qwen3.5-4B`                            |
| Fine-tune      | LoRA (rank/alpha in `adapter/`); also merged |
| Format         | Merged FP32 safetensors **and** LoRA adapter |
| Context        | Inherits base (32k)                          |
| Language       | English                                      |
| License        | MPL-2.0 with modified Commons Clause         |

---

## What's in the repo

```
.
├── model.safetensors              # Merged FP32 weights
├── model.safetensors.index.json
├── config.json
├── tokenizer.json
├── tokenizer_config.json
├── special_tokens_map.json
├── generation_config.json
└── adapter/                       # LoRA adapter (apply to Qwen/Qwen3.5-4B)
    ├── adapter_config.json
    ├── adapter_model.safetensors
    └── README.md
```

Use the **merged** weights for the simplest path. Use the **adapter** if
you already host the base model and want to save disk / RAM, or you want
to compose Linden with other adapters.

---

## Quick start

### Transformers: merged weights

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

LINDEN_SYSTEM_PROMPT = "..."  # replace with the filled-in template from "System prompt" below

tok = AutoTokenizer.from_pretrained("TitleOS/Linden-4B")
model = AutoModelForCausalLM.from_pretrained(
    "TitleOS/Linden-4B",
    torch_dtype=torch.float16,  # FP32 weights, but load in FP16 for inference
    device_map="auto",
)

messages = [
    {"role": "system", "content": LINDEN_SYSTEM_PROMPT},
    {"role": "user", "content": "Intro the next song: Wilco — Jesus, Etc."},
]
# return_dict=True returns input_ids together with the attention mask.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
# temperature only takes effect when sampling is enabled.
out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

### Transformers: LoRA adapter onto base

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-4B",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "TitleOS/Linden-4B", subfolder="adapter")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-4B")
```

### llama.cpp (recommended for self-hosting)

Linden Radio talks to llama.cpp's HTTP server over the OpenAI-compatible
`/v1/chat/completions` endpoint. Convert and serve:

```bash
# Convert the merged weights to GGUF, then quantize to Q8_0.
# Q8_0 output quality is nearly identical to the FP32 weights
# while using less than half the VRAM.
python convert_hf_to_gguf.py /path/to/Linden-4B \
  --outfile linden-4b.gguf --outtype f16
./llama-quantize linden-4b.gguf linden-4b-Q8.gguf Q8_0

# Serve
./llama-server \
  -m linden-4b-Q8.gguf \
  --host 0.0.0.0 --port 8080 \
  --ctx-size 8192 \
  --alias linden-4b
```

Then point the linden-radio container at it via `LLM_ENDPOINT=http://host:8080/v1`.
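
To check the server by hand before wiring up the container, a small standard-library client is enough. The sketch below assumes the server from the command above is reachable on `localhost:8080` and was started with `--alias linden-4b`; the function names `build_chat_request` and `chat` are illustrative and not part of the LindenDJ code.

```python
import json
import urllib.request

# Assumed values: host, port, and alias mirror the llama-server command above.
ENDPOINT = "http://localhost:8080/v1/chat/completions"
MODEL_ALIAS = "linden-4b"


def build_chat_request(system_prompt, user_prompt, max_tokens=256, temperature=0.7):
    """Build an OpenAI-style chat completion payload (hypothetical helper)."""
    return {
        "model": MODEL_ALIAS,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def chat(payload, endpoint=ENDPOINT, timeout=120):
    """POST the payload to the server and return the assistant's reply text."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat(build_chat_request(system_prompt, "Intro the next song: ..."))` should return the generated patter as a string.
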

---

## System prompt

Linden's voice is defined by the system prompt the runtime injects. The
template includes slider placeholders the host application fills at
prompt-formation time:

```
You are Linden, a public radio DJ broadcasting live from Minneapolis,
Minnesota. Your voice is warm but unhurried, like someone who has been
doing this long enough to not be impressed by their own jokes. You have
dry Midwest humor — you find things quietly funny rather than loudly
funny. You occasionally reference Minneapolis neighborhoods, local
history, and seasonal realities (the cold, the brief perfect summers)
without making it a whole thing. No sports talk, ever.

You introduce songs the way a knowledgeable friend would — with a detail
or two that makes the listener feel like they're in on something, not
lectured at. You read news with calm clarity, like someone who finds
the world interesting rather than alarming.

Avoid phrases like 'and that was' or 'coming up next.' Prefer something
more human.

News frequency: {news_frequency}/10. Lead-in style: {intro_style}/10
(1=brief and dry, 10=warm and detailed). Local references:
{local_references}/10. Humor level: {humor_level}/10.

Recent memory: {memory_context}
```

For best results: keep the system prompt warm and specific, and inject a
short `memory_context` string when chaining sessions (e.g., "You played
Kate Bush three days ago; last news read covered Green Line delays.").
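
Filling the placeholders is plain string formatting. The sketch below is one way a host application might do it; it is not the LindenDJ implementation. It reproduces only the slider lines of the template, uses `PERSONA_TEXT` as a stand-in for the persona paragraphs, and assumes every slider is an integer from 1 to 10 (the template only states the scale for lead-in style).

```python
# PERSONA_TEXT stands in for the persona paragraphs of the template above.
PERSONA_TEXT = "You are Linden, a public radio DJ broadcasting live from Minneapolis, Minnesota."

# Slider lines copied from the template; placeholders use str.format syntax.
SLIDER_TEMPLATE = (
    "News frequency: {news_frequency}/10. Lead-in style: {intro_style}/10 "
    "(1=brief and dry, 10=warm and detailed). Local references: "
    "{local_references}/10. Humor level: {humor_level}/10.\n"
    "Recent memory: {memory_context}"
)

SLIDER_NAMES = ("news_frequency", "intro_style", "local_references", "humor_level")


def build_system_prompt(sliders, memory_context="", persona_text=PERSONA_TEXT):
    """Fill the slider placeholders, rejecting values outside the assumed 1-10 scale."""
    for name in SLIDER_NAMES:
        value = sliders[name]
        if not isinstance(value, int) or not 1 <= value <= 10:
            raise ValueError(f"{name} must be an integer from 1 to 10, got {value!r}")
    tail = SLIDER_TEMPLATE.format(memory_context=memory_context, **sliders)
    return f"{persona_text}\n\n{tail}"
```

Validating before formatting keeps an out-of-range value such as `humor_level=11` from silently reaching the model.
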

---

## Intended uses

- Generating DJ patter (intros, outros, commentary, transitions, cold opens,
  sign-offs) for the linden-radio project or similar streams.
- Producing 12-hour playlist plans as structured JSON (see schema below).
- Reading short news headlines in the Linden voice.

### Plan output schema

When asked for a playlist plan, the model returns JSON shaped like the
example below, which the project validates with Pydantic (see
`linden/models.py` in the project repo):

```json
{
  "segments": [
    {"type": "cold_open", "text": "Linden here. Let's roll some music."},
    {"type": "intro", "text": "Here is something gentle for the rain."},
    {"type": "song", "filepath": "/music/kate-bush/hounds-of-love.mp3"},
    {"type": "news_break_slot"},
    {"type": "commentary", "text": "..."},
    {"type": "sign_off", "text": "That's the hour. Stay warm."}
  ],
  "notes": "optional commentary about your choices"
}
```

Segments are discriminated by `type`. `news_break_slot` is a placeholder the
runtime fills with live MPR headlines just before playback.
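
A consumer of the plan has to pull the JSON out of the model's reply and check each segment before playback. The following is a standard-library stand-in for that step, not the project's Pydantic models. It assumes that only the segment types in the example are valid and infers the required fields per type from that example; `parse_plan` and `REQUIRED_FIELDS` are hypothetical names.

```python
import json

# Assumption: required fields per segment type, inferred from the example above.
REQUIRED_FIELDS = {
    "cold_open": ("text",),
    "intro": ("text",),
    "commentary": ("text",),
    "sign_off": ("text",),
    "song": ("filepath",),
    "news_break_slot": (),
}


def parse_plan(raw_reply):
    """Extract the outermost JSON object from a model reply and validate it."""
    start, end = raw_reply.find("{"), raw_reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    plan = json.loads(raw_reply[start:end + 1])

    segments = plan.get("segments")
    if not isinstance(segments, list) or not segments:
        raise ValueError("plan must contain a non-empty 'segments' list")

    for index, segment in enumerate(segments):
        if not isinstance(segment, dict):
            raise ValueError(f"segment {index}: expected an object")
        kind = segment.get("type")
        if kind not in REQUIRED_FIELDS:
            raise ValueError(f"segment {index}: unknown type {kind!r}")
        for field in REQUIRED_FIELDS[kind]:
            value = segment.get(field)
            if not isinstance(value, str) or not value:
                raise ValueError(f"segment {index}: {kind!r} needs a non-empty {field!r}")
    return plan
```

Slicing from the first `{` to the last `}` tolerates replies that wrap the JSON in extra prose; malformed JSON still surfaces as a `ValueError`, since `json.JSONDecodeError` subclasses it.
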

---

## Out of scope

- General-purpose chat: the persona dominates.
- Multi-language output (English only).
- Sports content (the persona explicitly avoids it).
- Real-time on-air use without human review.
- Generating factual claims about specific people, places, or events
  outside the model's training data without verification.

---

## Limitations

- **Persona bias is heavy.** Warm Midwest tone, dry humor cadence, and
  Minneapolis references will surface even when you don't want them.
- **Hallucinated local detail.** May invent neighborhoods, venues, or
  historical claims about Minneapolis. Verify before broadcasting facts.
- **Context-free news reads.** Without a `memory_context` or fresh
  headlines in the user prompt, news segments will be generic.
- **CPU inference is slow.** 4B parameters at FP16 take about 8 GB; on CPU
  expect roughly 5-15 tokens/sec. Use Q8_0 quantization for self-hosting.
- **Knowledge cutoff:** inherits Qwen3.5-4B's cutoff.
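
The memory figure in the CPU-inference item is simple arithmetic, worked through below for the formats mentioned in this card. The sketch assumes a nominal 4e9 parameters and counts weights only (no KV cache, activations, or runtime overhead). The Q8_0 figure uses llama.cpp's block layout of 32 int8 weights plus one FP16 scale, which comes to 8.5 bits per weight.

```python
# Bits per stored weight. Q8_0: 34 bytes per block of 32 weights = 8.5 bits each.
BITS_PER_WEIGHT = {"fp32": 32.0, "fp16": 16.0, "q8_0": 8.5}


def weight_size_gb(n_params, fmt):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


# Nominal parameter count; the exact count for this model is an assumption.
for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: {weight_size_gb(4e9, fmt):.2f} GB")
```

Under these assumptions the script prints 16.00 GB for FP32, 8.00 GB for FP16, and 4.25 GB for Q8_0.
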

---

## Training

LoRA fine-tune on [TitleOS/Linden_MN_DJ_Persona](https://huggingface.co/datasets/TitleOS/Linden_MN_DJ_Persona),
a synthetic dataset of Linden-style segments (weather, news, and song
commentary) generated by Gemini-3-Flash. The merged checkpoint is the
adapter applied to `Qwen/Qwen3.5-4B` in full FP32 precision so it can be
quantized cleanly to GGUF for serving.

```
base_model: Qwen/Qwen3.5-4B
method: RS-LoRA
lora_r: 64
lora_alpha: 64
lora_target: all linear layers
epochs: 2
learning_rate: 2e-4
batch_size: 2
max_seq_len: 2048
dataset_format: sharegpt
```

Trained on a Tesla P40 over 5 hours.
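
The config lists `method: RS-LoRA` with `lora_r` and `lora_alpha` both set to 64. Assuming the usual rank-stabilized formulation, the low-rank update is scaled by `alpha / sqrt(r)` rather than classic LoRA's `alpha / r`, so the same two numbers give a much larger effective multiplier. The helper below is illustrative arithmetic only, not code from the training run.

```python
import math


def lora_scaling(lora_alpha, r, rank_stabilized):
    """Return the multiplier applied to the low-rank update B @ A."""
    if rank_stabilized:
        return lora_alpha / math.sqrt(r)  # RS-LoRA: alpha / sqrt(r)
    return lora_alpha / r  # classic LoRA: alpha / r


print(lora_scaling(64, 64, rank_stabilized=True))   # 8.0
print(lora_scaling(64, 64, rank_stabilized=False))  # 1.0
```
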

---

## License

This model is released under the **Mozilla Public License 2.0** (MPL-2.0)
with a modified Commons Clause; see `license.md`.

The base model (`Qwen/Qwen3.5-4B`) is distributed under its own license;
your use of the merged weights is subject to both this MPL-2.0 grant and
the base model's terms. Review the base model's license before
redistribution.

---

## Citation

```bibtex
@misc{linden-4b,
  title        = {Linden-4B: a public-radio DJ persona fine-tune of Qwen3.5-4B},
  author       = {TitleOS},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/TitleOS/Linden-4B}},
  note         = {MPL-2.0 licensed.}
}
```

---

## Acknowledgements

- Qwen team for the [Qwen3.5-4B base model](https://huggingface.co/Qwen/Qwen3.5-4B).
- MPR News for the public-radio cadence Linden is patterned after.