# Steerling-8B

An interpretable causal diffusion language model with concept steering.

Steerling-8B is an 8-billion-parameter language model that combines masked diffusion with interpretable concept decomposition. Unlike standard autoregressive LLMs, Steerling generates text by iteratively unmasking tokens in order of confidence, and it decomposes its internal representations into human-interpretable concepts that can be inspected and steered.
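The confidence-ordered unmasking loop can be sketched as follows. This is a generic masked-diffusion decoding sketch under our own assumptions, not Steerling's actual implementation: the model stub, mask id, and unmasking schedule are all toy stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
MASK = -1                      # toy mask token id (assumption)
seq_len, vocab, steps = 8, 50, 4

def toy_model(tokens):
    """Stand-in for the real denoiser: returns random logits per position."""
    return rng.standard_normal((len(tokens), vocab))

tokens = np.full(seq_len, MASK)
for step in range(steps):
    logits = toy_model(tokens)
    probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
    conf = probs.max(-1)               # model confidence at each position
    conf[tokens != MASK] = -np.inf     # already-decoded positions stay fixed
    remaining = int((tokens == MASK).sum())
    n_unmask = int(np.ceil(remaining / (steps - step)))  # linear schedule
    # Unmask the positions the model is most confident about.
    for pos in np.argsort(conf)[::-1][:n_unmask]:
        tokens[pos] = probs[pos].argmax()

assert (tokens != MASK).all()  # every position is decoded after the last step
```

Each iteration commits only the highest-confidence positions and feeds the partially unmasked sequence back to the model, so later predictions can condition on earlier ones in both directions.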

## Quick Start

```bash
pip install steerling
```

```python
from steerling import SteerlingGenerator, GenerationConfig

generator = SteerlingGenerator.from_pretrained("guidelabs/steerling-8b")

text = generator.generate(
    "The key to understanding neural networks is",
    GenerationConfig(max_new_tokens=100, seed=42),
)
print(text)
```

## Model Details

| Property | Value |
|---|---|
| Parameters | 8.4B |
| Architecture | CausalDiffusionLM + iGuide |
| Context Length | 4,096 |
| Vocabulary | 100,281 (cl100k_base + specials) |
| Known Concepts | 33,732 |
| Unknown Concepts | 101,196 |
| GQA | 32 heads, 4 KV heads |
| Diffusion Block Size | 64 |
| Precision | bfloat16 |
| VRAM Required | ~18 GB |
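The VRAM figure follows from the parameter count and precision: 8.4B parameters at 2 bytes each (bfloat16) is about 16.8 GB of raw weights, and the headroom up to ~18 GB covers activations and the KV cache. A quick back-of-the-envelope check:

```python
params = 8.4e9           # parameter count from the table above
bytes_per_param = 2      # bfloat16 = 16 bits = 2 bytes
weights_gb = params * bytes_per_param / 1e9
print(f"{weights_gb:.1f} GB")  # 16.8 GB of weights, before activations/KV cache
```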

## Architecture

Steerling uses block-causal attention: bidirectional within a block, causal across blocks. The interpretable concept heads decompose each transformer hidden state into:

hidden → known_features + unknown_features + epsilon = composed → logits

- **known_features**: weighted sum of the top-k learned concept embeddings (interpretable; maps to human-understandable features)
- **unknown_features**: residual captured by a factorized unknown head (101,196 concepts, rank 256)
- **epsilon**: small correction term for reconstruction fidelity
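The three-way split can be illustrated with a toy decomposition. Everything below is an illustrative assumption, not the actual iGuide heads: the sizes are tiny, the known head scores concepts with a plain dot product, and the unknown head is modeled as a least-squares fit in a low-rank subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_known, rank, k = 64, 128, 16, 8  # toy sizes, not the 8B config

# Hypothetical known-concept dictionary: one embedding row per concept.
known_emb = rng.standard_normal((n_known, d_model))
# Hypothetical factorized unknown head: a rank-16 subspace of the hidden space.
V = rng.standard_normal((rank, d_model))

def decompose(hidden):
    """Return (known, unknown, epsilon) with known + unknown + epsilon == hidden."""
    # known_features: weighted sum of the top-k scoring concept embeddings.
    scores = known_emb @ hidden
    topk = np.argsort(scores)[-k:]
    weights = np.zeros(n_known)
    weights[topk] = scores[topk]
    known = weights @ known_emb

    # unknown_features: least-squares fit of the residual in the low-rank subspace.
    residual = hidden - known
    coeffs, *_ = np.linalg.lstsq(V.T, residual, rcond=None)
    unknown = V.T @ coeffs

    # epsilon: whatever reconstruction error remains.
    epsilon = residual - unknown
    return known, unknown, epsilon

hidden = rng.standard_normal(d_model)
known, unknown, epsilon = decompose(hidden)
assert np.allclose(known + unknown + epsilon, hidden)  # exact reconstruction
```

Because epsilon is defined as the leftover residual, the three parts always recompose the hidden state exactly; interpretability comes from the sparse, named `known` term, while `unknown` absorbs structure the concept dictionary cannot explain.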

## Training Data

| Dataset | License | Stage |
|---|---|---|
| Nemotron-CC-HQ (real + synthetic) | NVIDIA Data Agreement | Pretraining |
| Dolmino Mix (math) | ODC-By v1.0 | Midtraining |

The Nemotron-CC dataset includes synthetic data generated by third-party models (Qwen, DeepSeek). Users should review the applicable license terms for their intended use case.

## GPU Requirements

| Setup | Works? |
|---|---|
| A100 80GB | ✅ |
| A100 40GB | ✅ |
| A6000 48GB | ✅ |
| RTX 4090 24GB | ✅ |
| RTX 3090 24GB | ✅ |
| 16GB or less | ❌ |

## License

The Steerling source code and model weights are released under the Apache License 2.0.

The model weights are provided for research and evaluation purposes. The weights were trained on datasets with varying license terms, including Nemotron-CC-HQ and Dolmino Mix. Some training data includes synthetic content generated by third-party models with their own license terms. We are currently reviewing the implications of these upstream licenses for downstream use of the model weights. Please check back for updates on the weight licensing terms.

For questions about commercial use of the model weights, contact us at info@guidelabs.ai.
