license: apache-2.0
library_name: steerling
tags:
- causal-diffusion
- interpretability
- concept-steering
- masked-diffusion
- block-causal
language:
- en
pipeline_tag: text-generation
Steerling-8B:
An interpretable causal diffusion language model with concept steering.
Steerling-8B is an 8-billion-parameter language model that combines masked diffusion with interpretable concept decomposition. Unlike standard autoregressive LLMs, which emit tokens strictly left to right, Steerling generates text by iteratively unmasking tokens in order of confidence. It also decomposes its internal representations into human-interpretable concepts that can be inspected and steered.
Quick Start
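```bash
pip install steerling
```

```python
from steerling import SteerlingGenerator, GenerationConfig

# Load the pretrained weights from the Hugging Face Hub.
generator = SteerlingGenerator.from_pretrained("guidelabs/steerling-8b")

# Generate up to 100 new tokens from the prompt; the seed makes the run repeatable.
text = generator.generate(
    "The key to understanding neural networks is",
    GenerationConfig(max_new_tokens=100, seed=42),
)
print(text)
```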
Model Details
| Property | Value |
|---|---|
| Parameters | 8.4B |
| Architecture | CausalDiffusionLM + iGuide |
| Context Length | 4,096 |
| Vocabulary | 100,281 (cl100k_base + specials) |
| Known Concepts | 33,732 |
| Unknown Concepts | 101,196 |
| GQA | 32 heads, 4 KV heads |
| Diff Block Size | 64 |
| Precision | bfloat16 |
| VRAM Required | ~18GB |
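The block size in the table is easier to picture alongside the attention pattern it governs. The sketch below builds a block-causal mask in plain Python: a position may attend to every position in its own block and in all earlier blocks. It assumes the "Diff Block Size" row is the block size used by the block-causal attention described under Architecture. `block_causal_mask` is a hypothetical helper written for illustration and is not part of the `steerling` package.

```python
def block_causal_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """Return mask[q][k], True when query position q may attend to key position k."""
    if seq_len <= 0 or block_size <= 0:
        raise ValueError("seq_len and block_size must be positive")
    # A position sees every key whose block index is <= its own block index:
    # its whole block (bidirectional) plus all earlier blocks (causal).
    return [
        [k // block_size <= q // block_size for k in range(seq_len)]
        for q in range(seq_len)
    ]


# Toy example: 6 positions in blocks of 3 (the table lists a block size of 64).
toy = block_causal_mask(seq_len=6, block_size=3)
for row in toy:
    print("".join("x" if allowed else "." for allowed in row))

# Values taken from the table above (assumed to describe the attention blocks).
CONTEXT_LENGTH = 4096
DIFF_BLOCK_SIZE = 64
num_blocks = CONTEXT_LENGTH // DIFF_BLOCK_SIZE
print("blocks in a full context:", num_blocks)
```

In the toy output, the first three rows read `xxx...` and the last three read `xxxxxx`: positions in the first block see each other but nothing later, while positions in the second block see everything. Under the stated assumption, a full 4,096-token context spans 64 blocks of 64 tokens.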
Architecture
Steerling uses block-causal attention: bidirectional within a block and causal across blocks. The interpretable concept heads decompose transformer hidden states into:
hidden → known_features + unknown_features + epsilon = composed → logits
- known_features: Weighted sum of the top-k learned concept embeddings (the interpretable part, which maps to human-understandable features)
- unknown_features: Residual captured by a factorized unknown head (101,196 concepts, rank 256)
- epsilon: Small correction for reconstruction fidelity
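The decomposition above can be sketched with toy dimensions in plain Python. This is a minimal illustration and not the model's implementation. Several details are assumptions: concept scores are taken as given inputs, the factorized unknown head is modeled as two small matrices (`unk_a` and `unk_b`) multiplied in sequence, and epsilon is defined as the exact residual so that `composed` reproduces `hidden`. All function and variable names are hypothetical.

```python
import random


def top_k(scores, k):
    """Indices of the k largest scores."""
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]


def weighted_sum(weights, vectors, dim):
    """Compute sum_i weights[i] * vectors[i] for vectors of length dim."""
    out = [0.0] * dim
    for w, vec in zip(weights, vectors):
        for j in range(dim):
            out[j] += w * vec[j]
    return out


def decompose(hidden, known_scores, known_emb, unknown_scores, unk_a, unk_b, k):
    dim = len(hidden)
    # known_features: weighted sum of the top-k known concept embeddings.
    keep = top_k(known_scores, k)
    known = weighted_sum([known_scores[i] for i in keep],
                         [known_emb[i] for i in keep], dim)
    # unknown_features (assumed form): scores -> rank-sized vector -> dim,
    # so a full (n_unknown x dim) embedding table is never materialized.
    rank = len(unk_b)
    low_rank = weighted_sum(unknown_scores, unk_a, rank)
    unknown = weighted_sum(low_rank, unk_b, dim)
    # epsilon (assumed form): the exact residual left after both heads.
    epsilon = [h - a - b for h, a, b in zip(hidden, known, unknown)]
    composed = [a + b + e for a, b, e in zip(known, unknown, epsilon)]
    return keep, known, unknown, epsilon, composed


# Toy sizes; the model card lists 33,732 known and 101,196 unknown concepts, rank 256.
DIM, N_KNOWN, N_UNKNOWN, RANK, K = 8, 12, 20, 3, 4
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


hidden = rand_matrix(1, DIM)[0]
known_scores = rand_matrix(1, N_KNOWN)[0]
unknown_scores = rand_matrix(1, N_UNKNOWN)[0]
keep, known, unknown, epsilon, composed = decompose(
    hidden, known_scores, rand_matrix(N_KNOWN, DIM),
    unknown_scores, rand_matrix(N_UNKNOWN, RANK), rand_matrix(RANK, DIM), K,
)
print("active known concepts:", keep)
```

Only the `keep` indices contribute to `known`, which is what makes that term inspectable: each active index names one concept and its weight. Steering would then amount to changing those weights before composing, though how Steerling exposes that is not shown here.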
Training Data
| Dataset | License | Stage |
|---|---|---|
| Nemotron-CC-HQ (real + synthetic) | NVIDIA Data Agreement | Pretraining |
| Dolmino Mix (math) | ODC-By v1.0 | Midtraining |
The Nemotron-CC dataset includes synthetic data generated by third-party models (Qwen, DeepSeek). Users should review the applicable license terms for their intended use case.
GPU Requirements
| Setup | Works? |
|---|---|
| A100 80GB | ✅ |
| A100 40GB | ✅ |
| A6000 48GB | ✅ |
| RTX 4090 24GB | ✅ |
| RTX 3090 24GB | ✅ |
| 16GB or less | ❌ |
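The table is consistent with a back-of-the-envelope estimate from the figures in Model Details. The sketch below computes the memory needed for the weights alone, assuming every one of the 8.4B parameters is stored in bfloat16 (2 bytes each), and compares card sizes against the ~18GB requirement. `fits` is a hypothetical helper, and the gap between the weight size and 18GB is presumed to cover activations and other runtime buffers.

```python
PARAMS = 8.4e9           # "Parameters" row in Model Details
BYTES_PER_PARAM = 2      # bfloat16 uses 16 bits per value
REQUIRED_GB = 18.0       # "VRAM Required" row in Model Details

weight_bytes = PARAMS * BYTES_PER_PARAM
weight_gb = weight_bytes / 1e9       # decimal gigabytes
weight_gib = weight_bytes / 2**30    # binary gibibytes

print(f"weights alone: {weight_gb:.1f} GB ({weight_gib:.1f} GiB)")


def fits(card_gb: float, required_gb: float = REQUIRED_GB) -> bool:
    """True when the card's memory meets the stated requirement."""
    return card_gb >= required_gb


for name, card_gb in [("A100 80GB", 80), ("RTX 4090 24GB", 24), ("16GB card", 16)]:
    print(f"{name}: {'ok' if fits(card_gb) else 'too small'}")
```

The weights alone come to about 16.8 GB, so a 16GB card cannot hold them before any activations are allocated, while every 24GB or larger card in the table clears the ~18GB requirement.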
License
The Steerling source code and model weights are released under the Apache License 2.0.
The model weights are provided for research and evaluation purposes. The weights were trained on datasets with varying license terms, including Nemotron-CC-HQ and Dolmino Mix. Some training data includes synthetic content generated by third-party models with their own license terms. We are currently reviewing the implications of these upstream licenses for downstream use of the model weights. Please check back for updates on the weight licensing terms.
For questions about commercial use of the model weights, contact us at info@guidelabs.ai.