---
license: apache-2.0
library_name: steerling
tags:
- causal-diffusion
- interpretability
- concept-steering
- masked-diffusion
- block-causal
language:
- en
pipeline_tag: text-generation
---

# Steerling-8B

**An interpretable causal diffusion language model with concept steering.**

Steerling-8B is an 8-billion-parameter language model that combines masked diffusion with interpretable concept decomposition. Unlike standard autoregressive LLMs, Steerling generates text by iteratively unmasking tokens in order of confidence, and it decomposes its internal representations into human-interpretable concepts that can be inspected and steered.

## Quick Start

```bash
pip install steerling
```

```python
from steerling import SteerlingGenerator, GenerationConfig

generator = SteerlingGenerator.from_pretrained("guidelabs/steerling-8b")

text = generator.generate(
    "The key to understanding neural networks is",
    GenerationConfig(max_new_tokens=100, seed=42),
)
print(text)
```

Concept IDs and descriptions are available in `concepts/complete_concept_info.csv`.
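
Under the hood, generation iteratively fills masked positions in order of confidence. The loop below is a toy sketch of that idea only: `unmask_by_confidence`, `toy_score`, and the `<mask>` sentinel are made-up stand-ins, not the library's decoder.

```python
# Toy sketch of confidence-ordered unmasking (illustrative only; the real
# decoder scores positions with transformer logits, not these toy scores).
MASK = "<mask>"

def unmask_by_confidence(tokens, score_fn, steps):
    """Fill masked positions a few at a time, highest-confidence first."""
    tokens = list(tokens)
    per_step = max(1, sum(t == MASK for t in tokens) // steps)
    while MASK in tokens:
        # score_fn proposes (token, confidence) for every masked position
        proposals = {i: score_fn(tokens, i)
                     for i, t in enumerate(tokens) if t == MASK}
        # commit only the most confident predictions this step
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in best[:per_step]:
            tokens[i] = proposals[i][0]
    return tokens

# Deterministic stand-in "model": confidence decays with distance from the
# nearest already-filled token.
def toy_score(tokens, i):
    dist = min(abs(i - j) for j, t in enumerate(tokens) if t != MASK)
    return f"tok{i}", 1.0 / dist

print(unmask_by_confidence(["The", MASK, MASK, MASK, "end"], toy_score, steps=3))
# → ['The', 'tok1', 'tok2', 'tok3', 'end']
```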

## Concept Attribution

Inspect which concepts contribute to model predictions:

```python
import torch

input_ids = torch.tensor(
    [generator.tokenizer.encode("Machine learning predicts protein structures")],
    device=generator.device,
)

logits, outputs = generator.model(input_ids, use_teacher_forcing=False, minimal_output=False)
print(f"Top-k known concepts: {outputs.known_topk_indices.shape}")
print(f"Known features norm: {outputs.known_features.norm(dim=-1).mean():.3f}")
```

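The indices in `outputs.known_topk_indices` can be mapped back to human-readable descriptions via `concepts/complete_concept_info.csv`. The snippet below sketches that lookup against an inline miniature of such a file; the column names `concept_id` and `description` are assumptions, so check the real CSV's header before adapting it.

```python
import csv
import io

# Hypothetical miniature of concepts/complete_concept_info.csv; the actual
# column names in the shipped file may differ (these are assumptions).
sample = """concept_id,description
0,protein structure
1,machine learning
2,weather
"""
id_to_desc = {int(row["concept_id"]): row["description"]
              for row in csv.DictReader(io.StringIO(sample))}

# In practice these indices would come from outputs.known_topk_indices.
topk_indices = [1, 0]
print([id_to_desc[i] for i in topk_indices])
# → ['machine learning', 'protein structure']
```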

## Model Details

| Property | Value |
|---|---|
| Parameters | 8.4B |
| Architecture | CausalDiffusionLM + iGuide |
| Context Length | 4,096 |
| Vocabulary | 100,281 (cl100k_base + specials) |
| Known Concepts | 33,732 |
| Unknown Concepts | 101,196 |
| GQA | 32 heads, 4 KV heads |
| Diff Block Size | 64 |
| Precision | bfloat16 |
| VRAM Required | ~18GB |
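
The diff block size in the table determines the attention pattern: tokens attend bidirectionally within their own block and causally to all earlier blocks. A minimal sketch of such a mask (`block_causal_mask` is illustrative, not part of the steerling API; a tiny block size is used so the result is easy to print, whereas Steerling uses 64):

```python
# Block-causal mask: position i may attend to position j iff j's block does
# not come after i's block. Within a block this is fully bidirectional.
def block_causal_mask(seq_len, block_size):
    return [[(j // block_size) <= (i // block_size) for j in range(seq_len)]
            for i in range(seq_len)]

mask = block_causal_mask(seq_len=6, block_size=2)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
# Rows come out in pairs: 110000, 110000, 111100, 111100, 111111, 111111.
```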

## Architecture

Steerling uses block-causal attention: bidirectional within a block, causal across blocks. The interpretable concept heads decompose transformer hidden states into:

```
hidden → known_features + unknown_features + epsilon = composed → logits
```

- **known_features**: Weighted sum of the top-k learned concept embeddings (interpretable; maps to human-understandable features)
- **unknown_features**: Residual captured by a factorized unknown head (101,196 concepts, rank 256)
- **epsilon**: Small correction term for reconstruction fidelity
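
The decomposition above can be sketched numerically: approximate a hidden state as a weighted sum of its k most active concept embeddings and treat the remainder as the unknown/epsilon terms. Everything here is a toy stand-in; the shapes, the top-k rule, and the least-squares weighting replace learned modules.

```python
import numpy as np

# Toy sketch of hidden → known_features + (unknown_features + epsilon).
rng = np.random.default_rng(0)
d, n_concepts, k = 16, 32, 4

hidden = rng.normal(size=d)                     # one transformer hidden state
concepts = rng.normal(size=(n_concepts, d))     # stand-in concept dictionary

activations = concepts @ hidden                 # how strongly each concept fires
topk = np.argsort(np.abs(activations))[-k:]     # keep the k strongest concepts

# Fit weights for the selected concepts (stand-in for the learned known head).
weights, *_ = np.linalg.lstsq(concepts[topk].T, hidden, rcond=None)
known_features = concepts[topk].T @ weights     # weighted sum of top-k concepts
residual = hidden - known_features              # plays the role of unknown + epsilon

print("relative residual:", np.linalg.norm(residual) / np.linalg.norm(hidden))
```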

## Training Data

| Dataset | License | Stage |
|---|---|---|
| [Nemotron-CC-HQ](https://huggingface.co/datasets/nvidia/Nemotron-CC) (real + synthetic) | NVIDIA Data Agreement | Pretraining |
| [Dolmino Mix](https://huggingface.co/datasets/allenai/dolmino-mix-1124) (math) | ODC-By v1.0 | Midtraining |

The Nemotron-CC dataset includes synthetic data generated by third-party models (Qwen, DeepSeek). Users should review the applicable license terms for their intended use case.

## GPU Requirements

| Setup | Works? |
|---|---|
| A100 80GB | ✅ |
| A100 40GB | ✅ |
| A6000 48GB | ✅ |
| RTX 4090 24GB | ✅ |
| RTX 3090 24GB | ✅ |
| 16GB or less | ❌ |
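
The ~18GB VRAM figure is consistent with a back-of-envelope check: 8.4B parameters in bfloat16 take 2 bytes each, with the remaining headroom going to activations and the KV cache.

```python
# Back-of-envelope check of the ~18GB requirement (weights only).
params = 8.4e9
weight_gb = params * 2 / 1e9   # bfloat16 = 2 bytes per parameter
print(f"weights alone: {weight_gb:.1f} GB")  # weights alone: 16.8 GB
```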

## License

The Steerling source code and model weights are released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

See [Training Data](#training-data) for upstream dataset license information.