Steerling-8B Combined LoRA
A LoRA fine-tune of Steerling-8B -- a model that breaks from virtually every convention in modern large language models. This is not GPT. This is not Llama. This is something fundamentally different.
Why This Model Is Unlike Anything Else on HuggingFace
Nearly every language model you've used -- GPT-4, Llama, Mistral, Qwen, Phi -- works the same way: it predicts the next token, one at a time, left to right. That paradigm has dominated NLP since 2017. Steerling throws it out entirely.
1. Diffusion, Not Autoregression
Steerling is a causal diffusion language model. Instead of predicting tokens sequentially, it generates text through iterative denoising -- the same fundamental process that powers image generators like Stable Diffusion, but applied to language. Tokens are generated in parallel within 64-token blocks, refined over multiple diffusion steps until the output crystallizes. This means:
- Generation isn't strictly left-to-right -- the model can revise and refine tokens within a block simultaneously
- The training objective is masked diffusion (predict randomly masked tokens), not next-token prediction
- The model has a fundamentally different relationship with uncertainty -- it can express and resolve ambiguity across multiple positions at once
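The block-wise denoising loop described above can be sketched in toy Python. This is an illustration of the general idea (start a block fully masked, commit the most confident predictions each step, repeat until nothing is masked) -- not Steerling's actual sampler, and toy_predict here is a random stand-in for the real denoiser:

```python
import random

MASK = -1
BLOCK = 8          # toy block size (Steerling uses 64-token blocks)
STEPS = 4          # number of diffusion refinement steps

def toy_predict(tokens):
    """Stand-in for the denoiser: propose (token, confidence) per masked slot."""
    return {i: (random.randrange(100), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def denoise_block(steps=STEPS):
    tokens = [MASK] * BLOCK                     # start fully masked
    for step in range(steps):
        proposals = toy_predict(tokens)
        if not proposals:
            break
        # Commit the most confident fraction this step; the rest stay masked
        quota = max(1, len(proposals) // (steps - step))
        best = sorted(proposals, key=lambda i: -proposals[i][1])[:quota]
        for i in best:
            tokens[i] = proposals[i][0]
    # Final pass: fill any positions still masked
    for i, (tok, _) in toy_predict(tokens).items():
        tokens[i] = tok
    return tokens
```

Note that unlike autoregressive decoding, every masked position in the block is proposed in parallel at each step; only the commit schedule is sequential.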
2. Interpretable Concept Heads -- You Can See What It's Thinking
Standard transformers are black boxes. You feed in text, get text out, and the internal representations are inscrutable high-dimensional vectors. Steerling decomposes its internal representations into 33,732 known human-interpretable concepts and 101,196 discovered concepts. These aren't post-hoc explanations or probes bolted on after training -- they're a core part of the architecture:
- A known concept head maps hidden states onto a dictionary of pre-defined semantic dimensions
- An unknown concept head discovers additional structure the known dictionary doesn't capture
- The final output is composed from both, meaning every generation can be traced back through interpretable concept activations
3. Concept Steering -- Real-Time Control Over Meaning
This is the killer feature. Because concepts are explicit architectural components (not emergent properties you have to coax out with prompt engineering), you can directly amplify or suppress specific concept dimensions at inference time. Same prompt, same seed, completely different output:
Prompt: "The discovery of a new planet beyond Neptune"
No steering: "...has been announced by astronomers at the University of Hawaii. The object, named 2003 UB313, was discovered on March 27, 2004 using images taken with the Keck II telescope. The object is thought to be about the size of Pluto..."
Concept #100 amplified (+2.5): "...has made the solar system more interesting than ever. The new planet is called Pluto, and it's an icy planet that orbits our sun every 248 years. So, what do we know about Pluto? First off, it's actually pretty big - about 1,400 miles across! That means it could fit three Earths inside!"
Concept #100 suppressed (-2.5): "...is an exciting milestone in our exploration of the universe. By studying this distant world, scientists hope to deepen our understanding of planetary formation and evolution, as well as the potential for life outside of our solar system."
Concept #250 amplified (+2.5): "...can only be determined by the measurement of its radial velocity change relative to its star. An extremely precise and sensitive spectrograph was used for the first time at the High-Altitude Astrophysical Research Observatory on Mauna Kea, Hawaii."
No prompt hacking. No system messages. No "please explain like I'm five." You're reaching directly into the model's conceptual space and turning dials. This is what interpretable AI actually looks like.
How This Compares to Traditional LLMs
| | Traditional LLMs (GPT, Llama, etc.) | Steerling-8B |
|---|---|---|
| Generation | Autoregressive (one token at a time, left-to-right) | Diffusion (parallel denoising within 64-token blocks) |
| Training objective | Next-token prediction | Masked diffusion (predict masked tokens) |
| Interpretability | Black box; requires external probing tools | Built-in concept heads (33K known + 101K discovered concepts) |
| Controllability | Prompt engineering, system messages, RLHF | Direct concept steering with numerical weights |
| Architecture | Standard transformer decoder | Transformer + known/unknown concept heads + concept composition |
| Fine-tuning | Next-token cross-entropy on sequential text | Masked diffusion cross-entropy respecting block boundaries |
What This Adapter Does
This LoRA adapter fine-tunes the attention projections on a balanced mix of educational text (Cosmopedia, FineWeb-Edu) and instruction-following conversations (OpenHermes-2.5), improving generation quality while fully preserving the concept steering and interpretability capabilities.
The result: a model that produces coherent textbook explanations, step-by-step math, creative stories, and structured technical writing -- all steerable in real time through concept interventions.
Sample Outputs
Prompt: "Explain quantum entanglement to a curious teenager:"
Imagine you have two boxes full of toys. One box has red balls and the other has green blocks. Now, let's say there's a magical connection between these two boxes -- whenever we take out a red ball from Box 1, it tells us exactly how many green blocks are in Box 2! This special relationship is called 'entanglement.'
Prompt: "In machine learning, transformers are"
a type of neural network that has gained significant attention in recent years due to their ability to process and analyze sequential data with ease. At its core, a transformer is essentially an encoder-decoder architecture designed specifically for handling long-range dependencies within sequences. This makes them particularly well-suited for tasks such as natural language processing (NLP), speech recognition, and image generation.
Prompt: "Write a short story about a robot discovering music for the first time."
Once upon a time in the bustling city of Metropolis, there was a curious robot named Bolt. He had shiny silver body with sparkly blue eyes that glowed when he felt excited. One sunny afternoon, while browsing through some online ads, Bolt stumbled across an advertisement offering music lessons for robots! ...Suddenly, all sorts of vibrant colors began swirling together forming beautiful patterns in the air above them. These seemed like sound waves but instead they were actually light vibrations created by something called "music."
Prompt: "Dear future self, I am writing this letter in 2026 to remind you that"
life is too short to waste time regretting past decisions. Take the opportunity to grow and learn from your mistakes instead of dwelling on them. Trust me when I say it's never too late to change course or embrace new experiences.
Training Details
Hardware
| Component | Spec |
|---|---|
| GPU | NVIDIA GeForce RTX 5090 (32 GB VRAM) |
| CPU | Intel Core Ultra 9 285K (24 cores, 5.1 GHz) |
| RAM | 192 GB DDR5 |
| Storage | NVMe SSD |
Environment
- Container: nvidia/cuda:12.8.0-devel-ubuntu24.04
- Python: 3.13
- PyTorch: 2.8.0 (CUDA 12.8)
- steerling: 0.1.2
- peft: 0.18.1
Dataset
A balanced mix of 15,000 texts from three high-quality sources:
| Source | Samples | Description |
|---|---|---|
| Cosmopedia | 5,000 | Synthetic textbooks & blogs (1,250 each from openstax, stanford, stories, web_samples_v1) |
| FineWeb-Edu | 5,000 | Curated high-quality educational web text scored for educational value |
| OpenHermes-2.5 | 5,000 | Instruction-following conversations (formatted as User/Assistant turns) |
Total tokens: ~36M, tokenized into 40,667 training examples at sequence length 512 with stride 256 (50% overlap).
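The overlapping-window tokenization can be reproduced with a simple sliding-window chunker. A minimal sketch, assuming one flat token stream per document (seq_len=512 and stride=256 match the values above):

```python
def chunk_tokens(token_ids, seq_len=512, stride=256):
    """Split a token stream into overlapping fixed-length training windows.

    With stride = seq_len // 2, consecutive windows overlap by 50%,
    matching the 512/256 setup described above.
    """
    chunks = []
    for start in range(0, len(token_ids), stride):
        window = token_ids[start:start + seq_len]
        if len(window) < seq_len:   # drop the ragged tail
            break
        chunks.append(window)
    return chunks
```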
LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Target modules | c_attn, c_proj (attention projections) |
| Dropout | 0.05 |
| Trainable parameters | 19,398,656 (0.23% of 8.4B) |
| Adapter size | ~74 MB |
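For reference, the LoRA update itself is simple: each targeted weight W is augmented with a low-rank delta scaled by alpha/r, i.e. y = x @ (W + (alpha/r) * A @ B). A minimal NumPy sketch with hypothetical tiny dimensions (the real run uses r=16, alpha=32 on c_attn / c_proj):

```python
import numpy as np

d_in, d_out, r, alpha = 6, 4, 2, 4   # toy sizes, not Steerling's real dims

rng = np.random.default_rng(0)
W = rng.normal(size=(d_in, d_out))   # frozen base weight
A = rng.normal(size=(d_in, r))       # trainable down-projection
B = np.zeros((r, d_out))             # trainable up-projection (zero init)

def lora_forward(x):
    # Base path plus scaled low-rank delta
    return x @ W + (alpha / r) * (x @ A @ B)

x = rng.normal(size=(3, d_in))
# With B zero-initialized, the adapter is a no-op before any training
assert np.allclose(lora_forward(x), x @ W)
```

Only A and B are trained (here 20 values vs. 24 frozen in W), which is why the adapter weighs in at ~74 MB against a ~16.8 GB base model.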
Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 2 |
| Batch size | 1 |
| Gradient accumulation | 8 (effective batch = 8) |
| Learning rate | 2e-4 (cosine schedule, 100 warmup steps) |
| Weight decay | 0.01 |
| Optimizer | AdamW (betas=0.9, 0.95) |
| Max gradient norm | 1.0 |
| Precision | bfloat16 |
| Mask ratio | 0.15 |
| Diffusion block size | 64 |
| Attention mode | SDPA fallback (flex_attention backward OOMs on consumer GPUs) |
Training Objective
Steerling uses a masked diffusion objective -- a fundamentally different training signal from standard LLMs:
- Masking respects 64-token block boundaries: whole blocks are selected for masking together (overall mask ratio 0.15), reflecting the model's block-causal attention structure
- Within selected blocks, individual tokens have an 85% chance of being masked
- The model predicts the original tokens at masked positions via cross-entropy loss
- This preserves the model's non-autoregressive diffusion generation capability -- the same mechanism used at inference time
This is analogous to how BERT was trained (masked language modeling), but adapted for a generative diffusion architecture with block-causal structure. Standard next-token prediction would destroy the model's ability to generate via denoising.
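A toy version of the block-respecting mask construction (illustrative only; Steerling's actual sampler may differ in detail): select whole 64-token blocks, then mask individual tokens within selected blocks at an 85% rate.

```python
import random

def build_mask(n_tokens, block_size=64, block_rate=0.15, token_rate=0.85):
    """Return a boolean mask; True = position is masked for the diffusion loss.

    Whole blocks are selected together, mirroring the model's
    block-causal attention structure.
    """
    mask = [False] * n_tokens
    n_blocks = (n_tokens + block_size - 1) // block_size
    for b in range(n_blocks):
        if random.random() < block_rate:          # select whole blocks
            lo = b * block_size
            hi = min(lo + block_size, n_tokens)
            for i in range(lo, hi):
                mask[i] = random.random() < token_rate
    return mask
```

The cross-entropy loss is then computed only at the True positions, against the original (unmasked) tokens.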
Results
| Metric | Epoch 1 | Epoch 2 (best) |
|---|---|---|
| Loss | 4.4761 | 4.4516 |
| Perplexity | 87.9 | 85.8 |
| Wall time | 111 min | 111 min |
- Total training time: ~3.9 hours
- Throughput: 401 tok/s steady
- Peak VRAM: 17.3 GB (of 32 GB available)
- Total optimizer steps: 10,166
Usage
Installation
pip install steerling peft torch
Loading the Adapter
import torch
from peft import PeftModel
from steerling import SteerlingGenerator
from steerling.inference.causal_diffusion import GenerationConfig
# Load base model
generator = SteerlingGenerator.from_pretrained(
"guidelabs/steerling-8b", device="cuda", dtype=torch.bfloat16
)
# Patch config for peft compatibility (Steerling uses pydantic, peft expects dict)
if hasattr(generator.model, "config") and not hasattr(generator.model.config, "get"):
generator.model.config = generator.model.config.model_dump()
# Apply LoRA adapter
generator.model = PeftModel.from_pretrained(
generator.model, "cahlen/steerling-8b-combined-lora"
)
generator.model.eval()
Generation
config = GenerationConfig(
max_new_tokens=200,
temperature=0.8,
top_p=0.9,
repetition_penalty=1.2,
)
output = generator.generate("The theory of relativity explains", config=config)
print(output)
Concept Steering
This is what sets Steerling apart. You can directly manipulate the model's internal concept activations:
# Amplify concept #100 -- shifts toward educational/casual register
config = GenerationConfig(
max_new_tokens=200,
temperature=0.8,
top_p=0.9,
repetition_penalty=1.2,
steer_known={100: 2.5},
)
output = generator.generate("Artificial intelligence will transform society by", config=config)
# Amplify concept #250 -- shifts toward technical/instrumentation language
config = GenerationConfig(
max_new_tokens=200,
temperature=0.8,
top_p=0.9,
repetition_penalty=1.2,
steer_known={250: 2.5},
)
output = generator.generate("Artificial intelligence will transform society by", config=config)
# Suppress a concept with negative weight
config = GenerationConfig(
max_new_tokens=200,
temperature=0.8,
top_p=0.9,
repetition_penalty=1.2,
steer_known={100: -2.5},
)
output = generator.generate("Artificial intelligence will transform society by", config=config)
# Combine multiple concepts simultaneously
config = GenerationConfig(
max_new_tokens=200,
temperature=0.8,
top_p=0.9,
repetition_penalty=1.2,
steer_known={100: 2.0, 250: 2.0},
)
output = generator.generate("Artificial intelligence will transform society by", config=config)
Steering tips:
- Use weights in the range [-3.0, 3.0] for best results
- Weights above 5.0 can cause degeneration (repetition, off-topic drift)
- The 33,732 known concepts are indexed numerically -- exploring what each one controls is part of the research value
- Negative weights suppress a concept; positive weights amplify it
Embeddings
The model also produces 4096-dimensional composed embeddings that reflect concept-decomposed representations:
embedding = generator.get_embeddings(
"Neural networks learn through backpropagation.",
pooling="mean",
embedding_type="composed",
)
# Shape: (4096,)
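These composed embeddings can be compared like any dense vectors, e.g. with cosine similarity. A pure-Python sketch using toy stand-in vectors (in practice the inputs would come from get_embeddings above):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy 4-dim stand-ins for the 4096-dim composed embeddings
e1 = [0.2, 0.1, 0.9, 0.3]
e2 = [0.25, 0.05, 0.85, 0.35]
similarity = cosine_similarity(e1, e2)
```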
Limitations
- Concept indices are opaque: The 33,732 known concepts are identified by index, not human-readable names. Mapping indices to semantics requires experimentation.
- Aggressive steering degenerates: Steering weights above ~5.0 can cause repetitive or nonsensical output.
- Sequence length 512: Trained at 512 tokens; longer contexts may show degraded quality.
- Non-autoregressive quirks: As a diffusion model, outputs can occasionally have a different "feel" from autoregressive models -- sometimes more holistic and less locally coherent.
- Early ecosystem: Steerling is a new architecture with limited community tooling compared to the Llama/GPT ecosystem.
Technical Notes for Practitioners
- Why LoRA instead of full fine-tuning? The model is 8.4B parameters in bfloat16 (~16.8 GB). Full fine-tuning requires storing gradients for every parameter, pushing memory to ~33.6 GB -- exceeding even the RTX 5090's 32 GB. LoRA freezes the base and trains 19.4M adapter parameters (0.23%), fitting comfortably in 17.3 GB.
- Why SDPA instead of flex_attention? Steerling's block-causal attention uses PyTorch's flex_attention with custom block masks. While efficient for inference, flex_attention's backward pass materializes large intermediate tensors that OOM on consumer GPUs. We force the SDPA fallback, which uses Flash Attention kernels with a much more memory-efficient backward pass.
- Why masked diffusion instead of next-token prediction? Steerling generates via iterative denoising, not left-to-right prediction. Training with next-token cross-entropy would be architecturally incompatible. The masked diffusion objective matches the model's native inference mechanism -- mask, predict, refine.
- The peft task_type: null warning on HuggingFace is expected. Steerling's architecture doesn't map to any of peft's predefined task types (CAUSAL_LM, FEATURE_EXTRACTION, etc.). Setting task_type to any string value causes loading errors because peft wraps the model in a task-specific class that passes incompatible kwargs to Steerling's forward method.
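The memory arithmetic in the first bullet above is easy to check. A back-of-envelope sketch using the figures from this card (actual usage also includes activations and optimizer state, which this deliberately ignores):

```python
def gb(n_bytes):
    # The card's figures are decimal gigabytes
    return n_bytes / 1e9

params_full = 8.4e9          # base model parameters
bf16_bytes = 2               # bytes per parameter in bfloat16

weights = params_full * bf16_bytes        # ~16.8 GB of frozen weights
weights_plus_grads = 2 * weights          # ~33.6 GB, over the 32 GB VRAM budget

params_lora = 19.4e6                      # trainable adapter parameters
trainable_fraction = params_lora / params_full   # ~0.23%
```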
Citation
If you use this adapter, please cite the base model:
@misc{steerling8b,
title={Steerling-8B: An Interpretable Causal Diffusion Language Model},
author={Guide Labs},
year={2025},
url={https://huggingface.co/guidelabs/steerling-8b}
}
License
This adapter inherits the license of the base model. See guidelabs/steerling-8b for details.