reaperdoesntknow committed on
Commit
b6d5c2d
·
verified ·
1 Parent(s): ab29104

Update README.md

Files changed (1)
  1. README.md +48 -6
README.md CHANGED
@@ -36,7 +36,7 @@ This isn't a constraint that fights the model. It's structure the model uses.
 ## Architecture
 
 ```
-Input → Token Embedding (48K vocab, Qwen3)
 │
 ▼
 ┌────────────────────────────────────────────────┐
@@ -171,13 +171,32 @@ Loss variance halved across training (σ: 0.291 → 0.142), indicating the mixtu
 }
 ```
 
 ## Usage
 
 ```python
 from transformers import AutoTokenizer
 from MoA import MoAMetricLM, MoAMetricConfig
 
-tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
 model = MoAMetricLM.from_pretrained("reaperdoesntknow/DiscoverLM-70M")
 
 inputs = tokenizer("The triangle inequality guarantees that", return_tensors="pt")
@@ -185,6 +204,28 @@ outputs = model.generate(**inputs, max_new_tokens=128)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
 ## Mathematical Foundation
 
 The metric attention mechanism is grounded in the Discrepancy Calculus (DISC), a measure-theoretic framework for singularity analysis developed by the author. The triangle inequality regularizer enforces that the learned attention geometry satisfies d(a,c) ≤ d(a,b) + d(b,c) across sampled triples, ensuring the distance function used for attention scoring is a proper metric — not merely a similarity function.
@@ -211,10 +252,10 @@ This architecture derives from research in metric-native neural computation:
 ## Citation
 
 ```bibtex
-@misc{colca2025discoverLM,
-  author = {Colca, Roy},
   title = {DiscoverLM-70M: Metric-Attention Mixture of Attentions with Triangle Inequality Enforcement},
-  year = {2025},
   publisher = {HuggingFace},
   url = {https://huggingface.co/reaperdoesntknow/DiscoverLM-70M}
 }
@@ -222,5 +263,6 @@ This architecture derives from research in metric-native neural computation:
 
 ## Author
 
-Convergent Intelligence LLC: Research Division — [Convergent Intelligence LLC](https://convergentintel.com)
 
 HuggingFace: [reaperdoesntknow](https://huggingface.co/reaperdoesntknow)
 ## Architecture
 
 ```
+Input → Token Embedding (48K vocab, custom tokenizer)
 │
 ▼
 ┌────────────────────────────────────────────────┐
 
 }
 ```
 
+### Tokenizer
+
+Custom 48K vocabulary tokenizer with structured generation tokens built in:
+
+```json
+{
+  "backend": "tokenizers",
+  "model_max_length": 2048,
+  "bos_token": "<|bos|>",
+  "eos_token": "<|eos|>",
+  "pad_token": "<|pad|>",
+  "unk_token": "<|unk|>",
+  "extra_special_tokens": [
+    "<|system|>", "<|user|>", "<|assistant|>",
+    "<|think|>", "<|reasoning|>"
+  ]
+}
+```
+
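As a quick sanity check, the config fragment above can be parsed to list every special token the model expects to see. This is a minimal sketch in plain Python; embedding the fragment as a string here stands in for reading `tokenizer_config.json` from the repo, which is an assumption about the file layout:

```python
import json

# The tokenizer config fragment from the README (normally loaded from
# tokenizer_config.json in the model repo - an assumed file name).
config = json.loads("""
{
  "backend": "tokenizers",
  "model_max_length": 2048,
  "bos_token": "<|bos|>",
  "eos_token": "<|eos|>",
  "pad_token": "<|pad|>",
  "unk_token": "<|unk|>",
  "extra_special_tokens": [
    "<|system|>", "<|user|>", "<|assistant|>",
    "<|think|>", "<|reasoning|>"
  ]
}
""")

# Collect every special token: the four core tokens plus the extras.
special_tokens = [config[k] for k in ("bos_token", "eos_token", "pad_token", "unk_token")]
special_tokens += config["extra_special_tokens"]
print(special_tokens)  # 9 tokens in total
```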
 ## Usage
 
 ```python
 from transformers import AutoTokenizer
 from MoA import MoAMetricLM, MoAMetricConfig
 
+tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/DiscoverLM-70M")
 model = MoAMetricLM.from_pretrained("reaperdoesntknow/DiscoverLM-70M")
 
 inputs = tokenizer("The triangle inequality guarantees that", return_tensors="pt")
 outputs = model.generate(**inputs, max_new_tokens=128)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
+### Chat Format
+
+The tokenizer includes built-in special tokens for structured generation:
+
+| Token | Role |
+|---|---|
+| `<\|system\|>` | System prompt boundary |
+| `<\|user\|>` | User turn boundary |
+| `<\|assistant\|>` | Assistant turn boundary |
+| `<\|think\|>` | Internal reasoning start |
+| `<\|reasoning\|>` | Reasoning chain marker |
+| `<\|bos\|>` | Beginning of sequence |
+| `<\|eos\|>` | End of sequence |
+| `<\|pad\|>` | Padding |
+
+```python
+# Chat-style prompting
+prompt = "<|system|>You are DiscoverLM, a small language model with metric attention.<|user|>What is the triangle inequality?<|assistant|><|think|><|reasoning|>"
+inputs = tokenizer(prompt, return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=256)
+```
+
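The turn markers above compose into one flat prompt string, so a tiny helper keeps their ordering consistent with the example. The function name `format_chat` is hypothetical, not part of the repo:

```python
# Hypothetical helper (not part of the model repo): assemble a
# DiscoverLM-style chat prompt from the special tokens documented above.
def format_chat(system: str, user: str, think: bool = True) -> str:
    prompt = f"<|system|>{system}<|user|>{user}<|assistant|>"
    if think:
        # Open the reasoning trace before the answer, as in the example above.
        prompt += "<|think|><|reasoning|>"
    return prompt

prompt = format_chat(
    "You are DiscoverLM, a small language model with metric attention.",
    "What is the triangle inequality?",
)
print(prompt)
```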
 ## Mathematical Foundation
 
 The metric attention mechanism is grounded in the Discrepancy Calculus (DISC), a measure-theoretic framework for singularity analysis developed by the author. The triangle inequality regularizer enforces that the learned attention geometry satisfies d(a,c) ≤ d(a,b) + d(b,c) across sampled triples, ensuring the distance function used for attention scoring is a proper metric — not merely a similarity function.
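To make the constraint concrete, here is a minimal NumPy sketch of a hinge penalty on sampled triples. It illustrates the property d(a,c) ≤ d(a,b) + d(b,c) and is not the repo's actual regularizer; the softmax-of-negative-distances scoring at the end is likewise an assumption about the scoring form:

```python
import numpy as np

def triangle_violation(d: np.ndarray, triples: np.ndarray) -> float:
    """Mean hinge penalty max(0, d(a,c) - d(a,b) - d(b,c)) over sampled triples.

    d: (n, n) pairwise distance matrix; triples: (m, 3) index array.
    Returns 0.0 exactly when every sampled triple satisfies the
    triangle inequality, i.e. when d behaves like a proper metric.
    """
    a, b, c = triples[:, 0], triples[:, 1], triples[:, 2]
    gap = d[a, c] - (d[a, b] + d[b, c])
    return float(np.maximum(gap, 0.0).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))                            # toy token embeddings
d = np.linalg.norm(x[:, None] - x[None, :], axis=-1)   # Euclidean distances: a true metric
triples = rng.integers(0, 8, size=(32, 3))
penalty = triangle_violation(d, triples)               # 0.0: Euclidean never violates

# Assumed scoring form: smaller distance -> larger attention weight.
attn = np.exp(-d) / np.exp(-d).sum(axis=-1, keepdims=True)
```

During training, a penalty like this would be added to the language-modeling loss so that gradient descent pushes the learned distance toward metric behavior.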
 
 ## Citation
 
 ```bibtex
+@misc{CILLC2026discoverLM,
+  author = {Convergent Intelligence LLC: Research Division},
   title = {DiscoverLM-70M: Metric-Attention Mixture of Attentions with Triangle Inequality Enforcement},
+  year = {2026},
   publisher = {HuggingFace},
   url = {https://huggingface.co/reaperdoesntknow/DiscoverLM-70M}
 }
 
 ## Author
 
+Roy Colca Jr. — [Convergent Intelligence LLC](https://convergentintel.com)
+
 HuggingFace: [reaperdoesntknow](https://huggingface.co/reaperdoesntknow)