---
license: apache-2.0
library_name: steerling
tags:
- causal-diffusion
- interpretability
- concept-steering
- masked-diffusion
- block-causal
language:
- en
pipeline_tag: text-generation
---

# Steerling-8B

**An interpretable causal diffusion language model with concept steering.**

Steerling-8B is an 8-billion-parameter language model that combines masked diffusion with interpretable concept decomposition. Unlike standard autoregressive LLMs, Steerling generates text by iteratively unmasking tokens in order of confidence, and it decomposes its internal representations into human-interpretable concepts that can be inspected and steered.

## Quick Start

```bash
pip install steerling
```

```python
from steerling import SteerlingGenerator, GenerationConfig

generator = SteerlingGenerator.from_pretrained("guidelabs/steerling-8b")

text = generator.generate(
    "The key to understanding neural networks is",
    GenerationConfig(max_new_tokens=100, seed=42),
)
print(text)
```

Concept IDs and descriptions are available in `concepts/complete_concept_info.csv`.
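
Under the hood, generation iteratively fills masked positions in order of confidence. The loop below is a toy sketch of that idea only: `unmask_by_confidence`, `toy_score`, and the `<mask>` sentinel are made-up stand-ins, not the library's decoder.

```python
# Toy sketch of confidence-ordered unmasking (illustrative only; the real
# decoder scores positions with transformer logits, not these toy scores).
MASK = "<mask>"

def unmask_by_confidence(tokens, score_fn, steps):
    """Fill masked positions a few at a time, highest-confidence first."""
    tokens = list(tokens)
    per_step = max(1, sum(t == MASK for t in tokens) // steps)
    while MASK in tokens:
        # score_fn proposes (token, confidence) for every masked position
        proposals = {i: score_fn(tokens, i)
                     for i, t in enumerate(tokens) if t == MASK}
        # commit only the most confident predictions this step
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in best[:per_step]:
            tokens[i] = proposals[i][0]
    return tokens

# Deterministic stand-in "model": confidence decays with distance from the
# nearest already-filled token.
def toy_score(tokens, i):
    dist = min(abs(i - j) for j, t in enumerate(tokens) if t != MASK)
    return f"tok{i}", 1.0 / dist

print(unmask_by_confidence(["The", MASK, MASK, MASK, "end"], toy_score, steps=3))
# → ['The', 'tok1', 'tok2', 'tok3', 'end']
```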

## Concept Attribution

Inspect which concepts contribute to model predictions:

```python
import torch

input_ids = torch.tensor(
    [generator.tokenizer.encode("Machine learning predicts protein structures")],
    device=generator.device,
)

logits, outputs = generator.model(input_ids, use_teacher_forcing=False, minimal_output=False)
print(f"Top-k known concepts: {outputs.known_topk_indices.shape}")
print(f"Known features norm: {outputs.known_features.norm(dim=-1).mean():.3f}")
```

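The indices in `outputs.known_topk_indices` can be mapped back to human-readable descriptions via `concepts/complete_concept_info.csv`. The snippet below sketches that lookup against an inline miniature of such a file; the column names `concept_id` and `description` are assumptions, so check the real CSV's header before adapting it.

```python
import csv
import io

# Hypothetical miniature of concepts/complete_concept_info.csv; the actual
# column names in the shipped file may differ (these are assumptions).
sample = """concept_id,description
0,protein structure
1,machine learning
2,weather
"""
id_to_desc = {int(row["concept_id"]): row["description"]
              for row in csv.DictReader(io.StringIO(sample))}

# In practice these indices would come from outputs.known_topk_indices.
topk_indices = [1, 0]
print([id_to_desc[i] for i in topk_indices])
# → ['machine learning', 'protein structure']
```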

## Model Details

| Property | Value |
|---|---|
| Parameters | 8.4B |
| Architecture | CausalDiffusionLM + iGuide |
| Context Length | 4,096 |
| Vocabulary | 100,281 (cl100k_base + specials) |
| Known Concepts | 33,732 |
| Unknown Concepts | 101,196 |
| GQA | 32 heads, 4 KV heads |
| Diff Block Size | 64 |
| Precision | bfloat16 |
| VRAM Required | ~18GB |
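
The diff block size in the table determines the attention pattern: tokens attend bidirectionally within their own block and causally to all earlier blocks. A minimal sketch of such a mask (`block_causal_mask` is illustrative, not part of the steerling API; a tiny block size is used so the result is easy to print, whereas Steerling uses 64):

```python
# Block-causal mask: position i may attend to position j iff j's block does
# not come after i's block. Within a block this is fully bidirectional.
def block_causal_mask(seq_len, block_size):
    return [[(j // block_size) <= (i // block_size) for j in range(seq_len)]
            for i in range(seq_len)]

mask = block_causal_mask(seq_len=6, block_size=2)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
# Rows come out in pairs: 110000, 110000, 111100, 111100, 111111, 111111.
```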

## Architecture

Steerling uses block-causal attention: bidirectional within a block, causal across blocks. The interpretable concept heads decompose transformer hidden states into:

```
hidden → known_features + unknown_features + epsilon = composed → logits
```

- **known_features**: Weighted sum of the top-k learned concept embeddings (interpretable; maps to human-understandable features)
- **unknown_features**: Residual captured by a factorized unknown head (101,196 concepts, rank 256)
- **epsilon**: Small correction term for reconstruction fidelity
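
The decomposition above can be sketched numerically: approximate a hidden state as a weighted sum of its k most active concept embeddings and treat the remainder as the unknown/epsilon terms. Everything here is a toy stand-in; the shapes, the top-k rule, and the least-squares weighting replace learned modules.

```python
import numpy as np

# Toy sketch of hidden → known_features + (unknown_features + epsilon).
rng = np.random.default_rng(0)
d, n_concepts, k = 16, 32, 4

hidden = rng.normal(size=d)                     # one transformer hidden state
concepts = rng.normal(size=(n_concepts, d))     # stand-in concept dictionary

activations = concepts @ hidden                 # how strongly each concept fires
topk = np.argsort(np.abs(activations))[-k:]     # keep the k strongest concepts

# Fit weights for the selected concepts (stand-in for the learned known head).
weights, *_ = np.linalg.lstsq(concepts[topk].T, hidden, rcond=None)
known_features = concepts[topk].T @ weights     # weighted sum of top-k concepts
residual = hidden - known_features              # plays the role of unknown + epsilon

print("relative residual:", np.linalg.norm(residual) / np.linalg.norm(hidden))
```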

## Training Data

| Dataset | License | Stage |
|---|---|---|
| [Nemotron-CC-HQ](https://huggingface.co/datasets/nvidia/Nemotron-CC) (real + synthetic) | NVIDIA Data Agreement | Pretraining |
| [Dolmino Mix](https://huggingface.co/datasets/allenai/dolmino-mix-1124) (math) | ODC-By v1.0 | Midtraining |

The Nemotron-CC dataset includes synthetic data generated by third-party models (Qwen, DeepSeek). Users should review the applicable license terms for their intended use case.

## GPU Requirements

| Setup | Works? |
|---|---|
| A100 80GB | ✅ |
| A100 40GB | ✅ |
| A6000 48GB | ✅ |
| RTX 4090 24GB | ✅ |
| RTX 3090 24GB | ✅ |
| 16GB or less | ❌ |
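
The ~18GB VRAM figure is consistent with a back-of-envelope check: 8.4B parameters in bfloat16 take 2 bytes each, with the remaining headroom going to activations and the KV cache.

```python
# Back-of-envelope check of the ~18GB requirement (weights only).
params = 8.4e9
weight_gb = params * 2 / 1e9   # bfloat16 = 2 bytes per parameter
print(f"weights alone: {weight_gb:.1f} GB")  # weights alone: 16.8 GB
```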

## License

The Steerling source code and model weights are released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

See [Training Data](#training-data) for upstream dataset license information.