Update README.md

README.md (CHANGED)
@@ -36,7 +36,7 @@ This isn't a constraint that fights the model. It's structure the model uses.
 ## Architecture

 ```
-Input → Token Embedding (48K vocab,
+Input → Token Embedding (48K vocab, custom tokenizer)
        │
        ▼
 ┌──────────────────────────────────────────────────┐
@@ -171,13 +171,32 @@ Loss variance halved across training (σ: 0.291 → 0.142), indicating the mixture
 }
 ```

+### Tokenizer
+
+Custom 48K vocabulary tokenizer with structured generation tokens built in:
+
+```json
+{
+  "backend": "tokenizers",
+  "model_max_length": 2048,
+  "bos_token": "<|bos|>",
+  "eos_token": "<|eos|>",
+  "pad_token": "<|pad|>",
+  "unk_token": "<|unk|>",
+  "extra_special_tokens": [
+    "<|system|>", "<|user|>", "<|assistant|>",
+    "<|think|>", "<|reasoning|>"
+  ]
+}
+```
+
 ## Usage

 ```python
 from transformers import AutoTokenizer
 from MoA import MoAMetricLM, MoAMetricConfig

-tokenizer = AutoTokenizer.from_pretrained("
+tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/DiscoverLM-70M")
 model = MoAMetricLM.from_pretrained("reaperdoesntknow/DiscoverLM-70M")

 inputs = tokenizer("The triangle inequality guarantees that", return_tensors="pt")
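As a minimal illustration of the chat markup these special tokens define, the role markers can be used to segment a prompt string into turns. This helper is hypothetical and purely illustrative (the model's tokenizer treats the markers as atomic special tokens; no such function exists in the repository):

```python
import re

# Role-boundary markers from the tokenizer's extra_special_tokens.
ROLE_TOKENS = ("<|system|>", "<|user|>", "<|assistant|>")

def split_turns(prompt: str):
    """Split a chat-formatted prompt into (role, text) pairs.

    Illustrative only: segments on the role markers and strips the
    <| |> delimiters to recover the role name.
    """
    pattern = "(" + "|".join(re.escape(t) for t in ROLE_TOKENS) + ")"
    turns, role = [], None
    for piece in re.split(pattern, prompt):
        if piece in ROLE_TOKENS:
            role = piece.strip("<|>")
        elif piece and role is not None:
            turns.append((role, piece))
    return turns
```

For example, `split_turns("<|system|>Be brief.<|user|>Hi<|assistant|>Hello")` yields `[("system", "Be brief."), ("user", "Hi"), ("assistant", "Hello")]`.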
@@ -185,6 +204,28 @@ outputs = model.generate(**inputs, max_new_tokens=128)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```

+### Chat Format
+
+The tokenizer includes built-in special tokens for structured generation:
+
+| Token | Role |
+|---|---|
+| `<\|system\|>` | System prompt boundary |
+| `<\|user\|>` | User turn boundary |
+| `<\|assistant\|>` | Assistant turn boundary |
+| `<\|think\|>` | Internal reasoning start |
+| `<\|reasoning\|>` | Reasoning chain marker |
+| `<\|bos\|>` | Beginning of sequence |
+| `<\|eos\|>` | End of sequence |
+| `<\|pad\|>` | Padding |
+
+```python
+# Chat-style prompting
+prompt = "<|system|>You are DiscoverLM, a small language model with metric attention.<|user|>What is the triangle inequality?<|assistant|><|think|><|reasoning|>"
+inputs = tokenizer(prompt, return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=256)
+```
+
 ## Mathematical Foundation

 The metric attention mechanism is grounded in the Discrepancy Calculus (DISC), a measure-theoretic framework for singularity analysis developed by the author. The triangle inequality regularizer enforces that the learned attention geometry satisfies d(a,c) ≤ d(a,b) + d(b,c) across sampled triples, ensuring the distance function used for attention scoring is a proper metric, not merely a similarity function.
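The triple-sampling regularizer described in the Mathematical Foundation section can be sketched as a hinge penalty on a pairwise distance matrix. This is an illustrative reconstruction under assumed tensor layout, not the repository's actual training code:

```python
import torch

def triangle_penalty(dist: torch.Tensor, n_triples: int = 256) -> torch.Tensor:
    """Hinge penalty on sampled triples, positive where d(a,c) > d(a,b) + d(b,c).

    dist: (N, N) matrix of pairwise attention distances (assumed layout).
    Returns (approximately) zero when the distances already form a metric.
    """
    n = dist.shape[0]
    # Sample n_triples random (a, b, c) index triples.
    a, b, c = torch.randint(0, n, (3, n_triples))
    violation = dist[a, c] - (dist[a, b] + dist[b, c])
    # relu keeps only triples that violate the triangle inequality.
    return torch.relu(violation).mean()
```

Added to the training loss, a term like this drives violations toward zero; for distances that are already a true metric (e.g. Euclidean distances from `torch.cdist`), the penalty is zero up to floating-point error.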
@@ -211,10 +252,10 @@ This architecture derives from research in metric-native neural computation:
 ## Citation

 ```bibtex
-@misc{
-  author = {
+@misc{CILLC2026discoverLM,
+  author = {Convergent Intelligence LLC: Research Division},
   title = {DiscoverLM-70M: Metric-Attention Mixture of Attentions with Triangle Inequality Enforcement},
-  year = {
+  year = {2026},
   publisher = {HuggingFace},
   url = {https://huggingface.co/reaperdoesntknow/DiscoverLM-70M}
 }
@@ -222,5 +263,6 @@ This architecture derives from research in metric-native neural computation:

 ## Author

-
+Roy Colca Jr. - [Convergent Intelligence LLC](https://convergentintel.com)
+
 HuggingFace: [reaperdoesntknow](https://huggingface.co/reaperdoesntknow)