Update README.md

README.md (CHANGED)
@@ -36,7 +36,7 @@ This isn't a constraint that fights the model. It's structure the model uses.
 ## Architecture

 ```
-Input → Token Embedding (48K vocab,
+Input → Token Embedding (48K vocab, custom tokenizer)
        │
        ▼
 ┌──────────────────────────────────────────────────┐
@@ -171,13 +171,32 @@ Loss variance halved across training (σ: 0.291 → 0.142), indicating the mixture
 }
 ```

+### Tokenizer
+
+Custom 48K vocabulary tokenizer with structured generation tokens built in:
+
+```json
+{
+  "backend": "tokenizers",
+  "model_max_length": 2048,
+  "bos_token": "<|bos|>",
+  "eos_token": "<|eos|>",
+  "pad_token": "<|pad|>",
+  "unk_token": "<|unk|>",
+  "extra_special_tokens": [
+    "<|system|>", "<|user|>", "<|assistant|>",
+    "<|think|>", "<|reasoning|>"
+  ]
+}
+```
+
 ## Usage

 ```python
 from transformers import AutoTokenizer
 from MoA import MoAMetricLM, MoAMetricConfig

-tokenizer = AutoTokenizer.from_pretrained("
+tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/DiscoverLM-70M")
 model = MoAMetricLM.from_pretrained("reaperdoesntknow/DiscoverLM-70M")

 inputs = tokenizer("The triangle inequality guarantees that", return_tensors="pt")
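As a minimal illustration of the chat markup these special tokens define, the role markers can be used to segment a prompt string into turns. This helper is hypothetical and purely illustrative (the model's tokenizer treats the markers as atomic special tokens; no such function exists in the repository):

```python
import re

# Role-boundary markers from the tokenizer's extra_special_tokens.
ROLE_TOKENS = ("<|system|>", "<|user|>", "<|assistant|>")

def split_turns(prompt: str):
    """Split a chat-formatted prompt into (role, text) pairs.

    Illustrative only: segments on the role markers and strips the
    <| |> delimiters to recover the role name.
    """
    pattern = "(" + "|".join(re.escape(t) for t in ROLE_TOKENS) + ")"
    turns, role = [], None
    for piece in re.split(pattern, prompt):
        if piece in ROLE_TOKENS:
            role = piece.strip("<|>")
        elif piece and role is not None:
            turns.append((role, piece))
    return turns
```

For example, `split_turns("<|system|>Be brief.<|user|>Hi<|assistant|>Hello")` yields `[("system", "Be brief."), ("user", "Hi"), ("assistant", "Hello")]`.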
@@ -185,6 +204,28 @@ outputs = model.generate(**inputs, max_new_tokens=128)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```

+### Chat Format
+
+The tokenizer includes built-in special tokens for structured generation:
+
+| Token | Role |
+|---|---|
+| `<\|system\|>` | System prompt boundary |
+| `<\|user\|>` | User turn boundary |
+| `<\|assistant\|>` | Assistant turn boundary |
+| `<\|think\|>` | Internal reasoning start |
+| `<\|reasoning\|>` | Reasoning chain marker |
+| `<\|bos\|>` | Beginning of sequence |
+| `<\|eos\|>` | End of sequence |
+| `<\|pad\|>` | Padding |
+
+```python
+# Chat-style prompting
+prompt = "<|system|>You are DiscoverLM, a small language model with metric attention.<|user|>What is the triangle inequality?<|assistant|><|think|><|reasoning|>"
+inputs = tokenizer(prompt, return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=256)
+```
+
 ## Mathematical Foundation

 The metric attention mechanism is grounded in the Discrepancy Calculus (DISC), a measure-theoretic framework for singularity analysis developed by the author. The triangle inequality regularizer enforces that the learned attention geometry satisfies d(a,c) ≤ d(a,b) + d(b,c) across sampled triples, ensuring the distance function used for attention scoring is a proper metric, not merely a similarity function.
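The triple-sampling regularizer described in the Mathematical Foundation section can be sketched as a hinge penalty on a pairwise distance matrix. This is an illustrative reconstruction under assumed tensor layout, not the repository's actual training code:

```python
import torch

def triangle_penalty(dist: torch.Tensor, n_triples: int = 256) -> torch.Tensor:
    """Hinge penalty on sampled triples, positive where d(a,c) > d(a,b) + d(b,c).

    dist: (N, N) matrix of pairwise attention distances (assumed layout).
    Returns (approximately) zero when the distances already form a metric.
    """
    n = dist.shape[0]
    # Sample n_triples random (a, b, c) index triples.
    a, b, c = torch.randint(0, n, (3, n_triples))
    violation = dist[a, c] - (dist[a, b] + dist[b, c])
    # relu keeps only triples that violate the triangle inequality.
    return torch.relu(violation).mean()
```

Added to the training loss, a term like this drives violations toward zero; for distances that are already a true metric (e.g. Euclidean distances from `torch.cdist`), the penalty is zero up to floating-point error.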
@@ -211,10 +252,10 @@ This architecture derives from research in metric-native neural computation:
 ## Citation

 ```bibtex
-@misc{
-  author = {
+@misc{CILLC2026discoverLM,
+  author = {Convergent Intelligence LLC: Research Division},
   title = {DiscoverLM-70M: Metric-Attention Mixture of Attentions with Triangle Inequality Enforcement},
-  year = {
+  year = {2026},
   publisher = {HuggingFace},
   url = {https://huggingface.co/reaperdoesntknow/DiscoverLM-70M}
 }
@@ -222,5 +263,6 @@ This architecture derives from research in metric-native neural computation:

 ## Author

-
+Roy Colca Jr. - [Convergent Intelligence LLC](https://convergentintel.com)
+
 HuggingFace: [reaperdoesntknow](https://huggingface.co/reaperdoesntknow)