Steerling-8B Combined LoRA
A LoRA fine-tune of Steerling-8B -- a model that breaks from virtually every convention in modern large language models. This is not GPT. This is not Llama. This is something fundamentally different.
Why This Model Is Unlike Anything Else on HuggingFace
Nearly every language model you've used -- GPT-4, Llama, Mistral, Qwen, Phi -- works the same way: it predicts the next token, one at a time, left to right. That paradigm has dominated NLP since 2017. Steerling throws it out entirely.
1. Diffusion, Not Autoregression
Steerling is a causal diffusion language model. Instead of predicting tokens sequentially, it generates text through iterative denoising -- the same fundamental process that powers image generators like Stable Diffusion, but applied to language. Tokens are generated in parallel within 64-token blocks, refined over multiple diffusion steps until the output crystallizes. This means:
- Generation isn't strictly left-to-right -- the model can revise and refine tokens within a block simultaneously
- The training objective is masked diffusion (predict randomly masked tokens), not next-token prediction
- The model has a fundamentally different relationship with uncertainty -- it can express and resolve ambiguity across multiple positions at once
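The block-wise denoising loop described above can be sketched in toy Python. This is an illustration of the general idea (start a block fully masked, commit the most confident predictions each step, repeat until nothing is masked) -- not Steerling's actual sampler, and toy_predict here is a random stand-in for the real denoiser:

```python
import random

MASK = -1
BLOCK = 8          # toy block size (Steerling uses 64-token blocks)
STEPS = 4          # number of diffusion refinement steps

def toy_predict(tokens):
    """Stand-in for the denoiser: propose (token, confidence) per masked slot."""
    return {i: (random.randrange(100), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def denoise_block(steps=STEPS):
    tokens = [MASK] * BLOCK                     # start fully masked
    for step in range(steps):
        proposals = toy_predict(tokens)
        if not proposals:
            break
        # Commit the most confident fraction this step; the rest stay masked
        quota = max(1, len(proposals) // (steps - step))
        best = sorted(proposals, key=lambda i: -proposals[i][1])[:quota]
        for i in best:
            tokens[i] = proposals[i][0]
    # Final pass: fill any positions still masked
    for i, (tok, _) in toy_predict(tokens).items():
        tokens[i] = tok
    return tokens
```

Note that unlike autoregressive decoding, every masked position in the block is proposed in parallel at each step; only the commit schedule is sequential.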
2. Interpretable Concept Heads -- You Can See What It's Thinking
Standard transformers are black boxes. You feed in text, get text out, and the internal representations are inscrutable high-dimensional vectors. Steerling decomposes its internal representations into 33,732 known human-interpretable concepts and 101,196 discovered concepts. These aren't post-hoc explanations or probes bolted on after training -- they're a core part of the architecture:
- A known concept head maps hidden states onto a dictionary of pre-defined semantic dimensions
- An unknown concept head discovers additional structure the known dictionary doesn't capture
- The final output is composed from both, meaning every generation can be traced back through interpretable concept activations
3. Concept Steering -- Real-Time Control Over Meaning
This is the killer feature. Because concepts are explicit architectural components (not emergent properties you have to coax out with prompt engineering), you can directly amplify or suppress specific concept dimensions at inference time. Same prompt, same seed, completely different output:
Prompt: "The discovery of a new planet beyond Neptune"
No steering: "...has been announced by astronomers at the University of Hawaii. The object, named 2003 UB313, was discovered on March 27, 2004 using images taken with the Keck II telescope. The object is thought to be about the size of Pluto..."
Concept #100 amplified (+2.5): "...has made the solar system more interesting than ever. The new planet is called Pluto, and it's an icy planet that orbits our sun every 248 years. So, what do we know about Pluto? First off, it's actually pretty big - about 1,400 miles across! That means it could fit three Earths inside!"
Concept #100 suppressed (-2.5): "...is an exciting milestone in our exploration of the universe. By studying this distant world, scientists hope to deepen our understanding of planetary formation and evolution, as well as the potential for life outside of our solar system."
Concept #250 amplified (+2.5): "...can only be determined by the measurement of its radial velocity change relative to its star. An extremely precise and sensitive spectrograph was used for the first time at the High-Altitude Astrophysical Research Observatory on Mauna Kea, Hawaii."
No prompt hacking. No system messages. No "please explain like I'm five." You're reaching directly into the model's conceptual space and turning dials. This is what interpretable AI actually looks like.
How This Compares to Traditional LLMs
| | Traditional LLMs (GPT, Llama, etc.) | Steerling-8B |
|---|---|---|
| Generation | Autoregressive (one token at a time, left-to-right) | Diffusion (parallel denoising within 64-token blocks) |
| Training objective | Next-token prediction | Masked diffusion (predict masked tokens) |
| Interpretability | Black box; requires external probing tools | Built-in concept heads (33K known + 101K discovered concepts) |
| Controllability | Prompt engineering, system messages, RLHF | Direct concept steering with numerical weights |
| Architecture | Standard transformer decoder | Transformer + known/unknown concept heads + concept composition |
| Fine-tuning | Next-token cross-entropy on sequential text | Masked diffusion cross-entropy respecting block boundaries |
What This Adapter Does
This LoRA adapter fine-tunes the attention projections on a balanced mix of educational text (Cosmopedia, FineWeb-Edu) and instruction-following conversations (OpenHermes-2.5), improving generation quality while fully preserving the concept steering and interpretability capabilities.
The result: a model that produces coherent textbook explanations, step-by-step math, creative stories, and structured technical writing -- all steerable in real time through concept interventions.
Sample Outputs
Prompt: "Explain quantum entanglement to a curious teenager:"
Imagine you have two boxes full of toys. One box has red balls and the other has green blocks. Now, let's say there's a magical connection between these two boxes -- whenever we take out a red ball from Box 1, it tells us exactly how many green blocks are in Box 2! This special relationship is called 'entanglement.'
Prompt: "In machine learning, transformers are"
a type of neural network that has gained significant attention in recent years due to their ability to process and analyze sequential data with ease. At its core, a transformer is essentially an encoder-decoder architecture designed specifically for handling long-range dependencies within sequences. This makes them particularly well-suited for tasks such as natural language processing (NLP), speech recognition, and image generation.
Prompt: "Write a short story about a robot discovering music for the first time."
Once upon a time in the bustling city of Metropolis, there was a curious robot named Bolt. He had shiny silver body with sparkly blue eyes that glowed when he felt excited. One sunny afternoon, while browsing through some online ads, Bolt stumbled across an advertisement offering music lessons for robots! ...Suddenly, all sorts of vibrant colors began swirling together forming beautiful patterns in the air above them. These seemed like sound waves but instead they were actually light vibrations created by something called "music."
Prompt: "Dear future self, I am writing this letter in 2026 to remind you that"
life is too short to waste time regretting past decisions. Take the opportunity to grow and learn from your mistakes instead of dwelling on them. Trust me when I say it's never too late to change course or embrace new experiences.
Training Details
Hardware
| Component | Spec |
|---|---|
| GPU | NVIDIA GeForce RTX 5090 (32 GB VRAM) |
| CPU | Intel Core Ultra 9 285K (24 cores, 5.1 GHz) |
| RAM | 192 GB DDR5 |
| Storage | NVMe SSD |
Environment
- Container: nvidia/cuda:12.8.0-devel-ubuntu24.04
- Python: 3.13
- PyTorch: 2.8.0 (CUDA 12.8)
- steerling: 0.1.2
- peft: 0.18.1
Dataset
A balanced mix of 15,000 texts from three high-quality sources:
| Source | Samples | Description |
|---|---|---|
| Cosmopedia | 5,000 | Synthetic textbooks & blogs (1,250 each from openstax, stanford, stories, web_samples_v1) |
| FineWeb-Edu | 5,000 | Curated high-quality educational web text scored for educational value |
| OpenHermes-2.5 | 5,000 | Instruction-following conversations (formatted as User/Assistant turns) |
Total tokens: ~36M, tokenized into 40,667 training examples at sequence length 512 with stride 256 (50% overlap).
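The overlapping-window tokenization can be reproduced with a simple sliding-window chunker. A minimal sketch, assuming one flat token stream per document (seq_len=512 and stride=256 match the values above):

```python
def chunk_tokens(token_ids, seq_len=512, stride=256):
    """Split a token stream into overlapping fixed-length training windows.

    With stride = seq_len // 2, consecutive windows overlap by 50%,
    matching the 512/256 setup described above.
    """
    chunks = []
    for start in range(0, len(token_ids), stride):
        window = token_ids[start:start + seq_len]
        if len(window) < seq_len:   # drop the ragged tail
            break
        chunks.append(window)
    return chunks
```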
LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Target modules | c_attn, c_proj (attention projections) |
| Dropout | 0.05 |
| Trainable parameters | 19,398,656 (0.23% of 8.4B) |
| Adapter size | ~74 MB |
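For reference, the LoRA update itself is simple: each targeted weight W is augmented with a low-rank delta scaled by alpha/r, i.e. y = x @ (W + (alpha/r) * A @ B). A minimal NumPy sketch with hypothetical tiny dimensions (the real run uses r=16, alpha=32 on c_attn / c_proj):

```python
import numpy as np

d_in, d_out, r, alpha = 6, 4, 2, 4   # toy sizes, not Steerling's real dims

rng = np.random.default_rng(0)
W = rng.normal(size=(d_in, d_out))   # frozen base weight
A = rng.normal(size=(d_in, r))       # trainable down-projection
B = np.zeros((r, d_out))             # trainable up-projection (zero init)

def lora_forward(x):
    # Base path plus scaled low-rank delta
    return x @ W + (alpha / r) * (x @ A @ B)

x = rng.normal(size=(3, d_in))
# With B zero-initialized, the adapter is a no-op before any training
assert np.allclose(lora_forward(x), x @ W)
```

Only A and B are trained (here 20 values vs. 24 frozen in W), which is why the adapter weighs in at ~74 MB against a ~16.8 GB base model.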
Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 2 |
| Batch size | 1 |
| Gradient accumulation | 8 (effective batch = 8) |
| Learning rate | 2e-4 (cosine schedule, 100 warmup steps) |
| Weight decay | 0.01 |
| Optimizer | AdamW (betas=0.9, 0.95) |
| Max gradient norm | 1.0 |
| Precision | bfloat16 |
| Mask ratio | 0.15 |
| Diffusion block size | 64 |
| Attention mode | SDPA fallback (flex_attention backward OOMs on consumer GPUs) |
Training Objective
Steerling uses a masked diffusion objective -- a fundamentally different training signal from standard LLMs:
- Masking respects 64-token block boundaries: whole blocks are selected for masking together (overall mask ratio 0.15), reflecting the model's block-causal attention structure
- Within selected blocks, individual tokens have an 85% chance of being masked
- The model predicts the original tokens at masked positions via cross-entropy loss
- This preserves the model's non-autoregressive diffusion generation capability -- the same mechanism used at inference time
This is analogous to how BERT was trained (masked language modeling), but adapted for a generative diffusion architecture with block-causal structure. Standard next-token prediction would destroy the model's ability to generate via denoising.
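A toy version of the block-respecting mask construction (illustrative only; Steerling's actual sampler may differ in detail): select whole 64-token blocks, then mask individual tokens within selected blocks at an 85% rate.

```python
import random

def build_mask(n_tokens, block_size=64, block_rate=0.15, token_rate=0.85):
    """Return a boolean mask; True = position is masked for the diffusion loss.

    Whole blocks are selected together, mirroring the model's
    block-causal attention structure.
    """
    mask = [False] * n_tokens
    n_blocks = (n_tokens + block_size - 1) // block_size
    for b in range(n_blocks):
        if random.random() < block_rate:          # select whole blocks
            lo = b * block_size
            hi = min(lo + block_size, n_tokens)
            for i in range(lo, hi):
                mask[i] = random.random() < token_rate
    return mask
```

The cross-entropy loss is then computed only at the True positions, against the original (unmasked) tokens.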
Results
| Metric | Epoch 1 | Epoch 2 (best) |
|---|---|---|
| Loss | 4.4761 | 4.4516 |
| Perplexity | 87.9 | 85.8 |
| Wall time | 111 min | 111 min |
- Total training time: ~3.9 hours
- Throughput: 401 tok/s steady
- Peak VRAM: 17.3 GB (of 32 GB available)
- Total optimizer steps: 10,166
Usage
Installation
pip install steerling peft torch
Loading the Adapter
import torch
from peft import PeftModel
from steerling import SteerlingGenerator
from steerling.inference.causal_diffusion import GenerationConfig
# Load base model
generator = SteerlingGenerator.from_pretrained(
"guidelabs/steerling-8b", device="cuda", dtype=torch.bfloat16
)
# Patch config for peft compatibility (Steerling uses pydantic, peft expects dict)
if hasattr(generator.model, "config") and not hasattr(generator.model.config, "get"):
generator.model.config = generator.model.config.model_dump()
# Apply LoRA adapter
generator.model = PeftModel.from_pretrained(
generator.model, "cahlen/steerling-8b-combined-lora"
)
generator.model.eval()
Generation
config = GenerationConfig(
max_new_tokens=200,
temperature=0.8,
top_p=0.9,
repetition_penalty=1.2,
)
output = generator.generate("The theory of relativity explains", config=config)
print(output)
Concept Steering
This is what sets Steerling apart. You can directly manipulate the model's internal concept activations:
# Amplify concept #100 -- shifts toward educational/casual register
config = GenerationConfig(
max_new_tokens=200,
temperature=0.8,
top_p=0.9,
repetition_penalty=1.2,
steer_known={100: 2.5},
)
output = generator.generate("Artificial intelligence will transform society by", config=config)
# Amplify concept #250 -- shifts toward technical/instrumentation language
config = GenerationConfig(
max_new_tokens=200,
temperature=0.8,
top_p=0.9,
repetition_penalty=1.2,
steer_known={250: 2.5},
)
output = generator.generate("Artificial intelligence will transform society by", config=config)
# Suppress a concept with negative weight
config = GenerationConfig(
max_new_tokens=200,
temperature=0.8,
top_p=0.9,
repetition_penalty=1.2,
steer_known={100: -2.5},
)
output = generator.generate("Artificial intelligence will transform society by", config=config)
# Combine multiple concepts simultaneously
config = GenerationConfig(
max_new_tokens=200,
temperature=0.8,
top_p=0.9,
repetition_penalty=1.2,
steer_known={100: 2.0, 250: 2.0},
)
output = generator.generate("Artificial intelligence will transform society by", config=config)
Steering tips:
- Use weights in the range [-3.0, 3.0] for best results
- Weights above 5.0 can cause degeneration (repetition, off-topic drift)
- The 33,732 known concepts are indexed numerically -- exploring what each one controls is part of the research value
- Negative weights suppress a concept; positive weights amplify it
Embeddings
The model also produces 4096-dimensional composed embeddings that reflect concept-decomposed representations:
embedding = generator.get_embeddings(
"Neural networks learn through backpropagation.",
pooling="mean",
embedding_type="composed",
)
# Shape: (4096,)
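These composed embeddings can be compared like any dense vectors, e.g. with cosine similarity. A pure-Python sketch using toy stand-in vectors (in practice the inputs would come from get_embeddings above):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy 4-dim stand-ins for the 4096-dim composed embeddings
e1 = [0.2, 0.1, 0.9, 0.3]
e2 = [0.25, 0.05, 0.85, 0.35]
similarity = cosine_similarity(e1, e2)
```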
Limitations
- Concept indices are opaque: The 33,732 known concepts are identified by index, not human-readable names. Mapping indices to semantics requires experimentation.
- Aggressive steering degenerates: Steering weights above ~5.0 can cause repetitive or nonsensical output.
- Sequence length 512: Trained at 512 tokens; longer contexts may show degraded quality.
- Non-autoregressive quirks: As a diffusion model, outputs can occasionally have a different "feel" from autoregressive models -- sometimes more holistic and less locally coherent.
- Early ecosystem: Steerling is a new architecture with limited community tooling compared to the Llama/GPT ecosystem.
Technical Notes for Practitioners
- Why LoRA instead of full fine-tuning? The model is 8.4B parameters in bfloat16 (~16.8 GB). Full fine-tuning requires storing gradients for every parameter, pushing memory to ~33.6 GB -- exceeding even the RTX 5090's 32 GB. LoRA freezes the base and trains 19.4M adapter parameters (0.23%), fitting comfortably in 17.3 GB.
- Why SDPA instead of flex_attention? Steerling's block-causal attention uses PyTorch's flex_attention with custom block masks. While efficient for inference, flex_attention's backward pass materializes large intermediate tensors that OOM on consumer GPUs. We force the SDPA fallback, which uses Flash Attention kernels with a much more memory-efficient backward pass.
- Why masked diffusion instead of next-token prediction? Steerling generates via iterative denoising, not left-to-right prediction. Training with next-token cross-entropy would be architecturally incompatible. The masked diffusion objective matches the model's native inference mechanism -- mask, predict, refine.
- The peft task_type: null warning on HuggingFace is expected. Steerling's architecture doesn't map to any of peft's predefined task types (CAUSAL_LM, FEATURE_EXTRACTION, etc.). Setting task_type to any string value causes loading errors because peft wraps the model in a task-specific class that passes incompatible kwargs to Steerling's forward method.
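The memory arithmetic in the first bullet above is easy to check. A back-of-envelope sketch using the figures from this card (actual usage also includes activations and optimizer state, which this deliberately ignores):

```python
def gb(n_bytes):
    # The card's figures are decimal gigabytes
    return n_bytes / 1e9

params_full = 8.4e9          # base model parameters
bf16_bytes = 2               # bytes per parameter in bfloat16

weights = params_full * bf16_bytes        # ~16.8 GB of frozen weights
weights_plus_grads = 2 * weights          # ~33.6 GB, over the 32 GB VRAM budget

params_lora = 19.4e6                      # trainable adapter parameters
trainable_fraction = params_lora / params_full   # ~0.23%
```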
Citation
If you use this adapter, please cite the base model:
@misc{steerling8b,
title={Steerling-8B: An Interpretable Causal Diffusion Language Model},
author={Guide Labs},
year={2025},
url={https://huggingface.co/guidelabs/steerling-8b}
}
License
This adapter inherits the license of the base model. See guidelabs/steerling-8b for details.