---
license: cc-by-nc-4.0
language:
- en
- fr
- code
tags:
- complexity
- token-routed-mlp
- flash-attention
- causal-lm
library_name: transformers
pipeline_tag: text-generation
---
# Complexity Base
A Llama-style transformer with architectural improvements for efficiency and performance.
## Architecture: Llama + Improvements
Complexity builds on the Llama architecture with three key enhancements:
| Component | Llama | Complexity |
|-----------|-------|------------|
| **MLP** | Dense FFN | **Token-Routed MLP** (4 experts, 1 active) |
| **Attention** | Standard | **Flash Attention** via SDPA |
| **Normalization** | RMSNorm only | RMSNorm + **QK Normalization** |
### Token-Routed MLP
Unlike standard MoE, which routes based on hidden states through a learned router, Token-Routed MLP routes based on the **token ID**:
```python
expert_idx = token_id % num_experts # Deterministic routing
output = experts[expert_idx](hidden_states)
```
**Benefits:**
- No router network overhead
- Deterministic, reproducible routing
- 4x total MLP capacity at the compute cost of one expert (only 1 of 4 experts active per token)
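The routing rule above can be sketched as a minimal module in plain NumPy. The shapes, the ReLU FFN, and the expert weights here are illustrative assumptions, not the model's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN, INTERMEDIATE, NUM_EXPERTS = 8, 16, 4

# One independent two-layer FFN per expert (no shared weights, no router network).
experts = [
    (rng.normal(size=(HIDDEN, INTERMEDIATE)) * 0.02,
     rng.normal(size=(INTERMEDIATE, HIDDEN)) * 0.02)
    for _ in range(NUM_EXPERTS)
]

def token_routed_mlp(token_ids, hidden_states):
    """Route each token to experts[token_id % NUM_EXPERTS] -- fully deterministic."""
    out = np.empty_like(hidden_states)
    for i, (tid, h) in enumerate(zip(token_ids, hidden_states)):
        w_up, w_down = experts[tid % NUM_EXPERTS]
        out[i] = np.maximum(h @ w_up, 0.0) @ w_down  # simple ReLU FFN for brevity
    return out

token_ids = np.array([5, 9, 2, 5])
h = rng.normal(size=(4, HIDDEN))
y = token_routed_mlp(token_ids, h)

# Tokens 0 and 3 share token ID 5, so they are routed to the same expert.
assert (token_ids[0] % NUM_EXPERTS) == (token_ids[3] % NUM_EXPERTS)
```

Because the mapping from token ID to expert is fixed, the same input always takes the same path, with no load-balancing loss or router to train.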
### QK Normalization
Stabilizes attention at scale by normalizing Q and K before computing attention scores:
```python
q = self.q_norm(q)  # RMSNorm over the head dimension
k = self.k_norm(k)
attn = (q @ k.T) / sqrt(d)
```
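A self-contained NumPy sketch of the idea: RMS-normalize each query and key vector before the dot product, which bounds the attention logits no matter how large the raw activations grow. The learnable RMSNorm gain is omitted here for brevity:

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # Scale each row to unit RMS, as RMSNorm does (learned gain omitted).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def qk_norm_scores(q, k):
    """Attention logits with QK normalization applied before the dot product."""
    d = q.shape[-1]
    q, k = rms_norm(q), rms_norm(k)
    return (q @ k.T) / np.sqrt(d)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 64)) * 50.0  # deliberately large-magnitude activations
k = rng.normal(size=(4, 64)) * 50.0
scores = qk_norm_scores(q, k)
```

After normalization each vector has norm `sqrt(d)`, so by Cauchy-Schwarz every logit is at most `sqrt(d)` in magnitude, regardless of the input scale, which is what keeps attention stable as training progresses.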
## Model Details
- **Parameters**: ~100M
- **Hidden size**: 768
- **Layers**: 12
- **Attention heads**: 12 (KV heads: 4)
- **Experts**: 4 (1 active per token)
- **Vocabulary**: 100K tokens
- **Context**: 2048 tokens
- **Training steps**: 10,000
## Installation
```bash
pip install complexity-model pyllm-inference
```
## Usage
### With PyLLM
```bash
pyllm serve Pacific-Prime/complexity-tiny
```
### Python API
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Pacific-Prime/complexity")
model = AutoModelForCausalLM.from_pretrained(
    "Pacific-Prime/complexity",
    trust_remote_code=True,
)
inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```
## Comparison with Llama
```
Llama:       embed -> [Attn + FFN] x L            -> output
Complexity:  embed -> [Attn + TokenRoutedMLP] x L -> output
                       ↑ QK Norm    ↑ 4 experts (1 active)
```
Same *active* parameter count per token as a dense Llama of the same width, but:
- **4x more total MLP parameters** (distributed across experts, one active per token)
- **Faster training** (QK norm stabilizes gradients)
- **Better scaling** (sparse activation adds capacity without adding per-token compute)
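The "same active, 4x total" trade-off above can be checked with quick arithmetic. The intermediate size here is an illustrative assumption, not the model's published dimension:

```python
HIDDEN, INTERMEDIATE, NUM_EXPERTS = 768, 2048, 4

dense_ffn  = 2 * HIDDEN * INTERMEDIATE   # up + down projections of one FFN
total_mlp  = NUM_EXPERTS * dense_ffn     # parameters stored across all experts
active_mlp = dense_ffn                   # parameters touched per token (1 expert)

print(total_mlp // active_mlp)  # -> 4
```

Per-token FLOPs thus match the dense baseline while stored MLP capacity grows linearly with the expert count.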
## License
Apache 2.0
## Links
- [GitHub](https://github.com/Complexity-ML/complexity-framework)
- [PyPI](https://pypi.org/project/complexity-framework/)
## Citation
```bibtex
@misc{complexity,
title={Complexity: Token-Routed MLP Transformer},
author={Pacific Prime},
year={2025},
url={https://huggingface.co/Pacific-Prime/complexity}
}
```