---
license: cc-by-nc-4.0
language:
- en
- fr
- code
tags:
- complexity
- token-routed-mlp
- flash-attention
- causal-lm
library_name: transformers
pipeline_tag: text-generation
---

# Complexity Base

A Llama-style transformer with architectural improvements for efficiency and performance.

## Architecture: Llama + Improvements

Complexity builds on the Llama architecture with three key enhancements:

| Component | Llama | Complexity |
|-----------|-------|------------|
| **MLP** | Dense FFN | **Token-Routed MLP** (4 experts, 1 active) |
| **Attention** | Standard | **Flash Attention** via SDPA |
| **Normalization** | RMSNorm only | RMSNorm + **QK Normalization** |

### Token-Routed MLP

Unlike standard MoE, which routes tokens through a learned router over hidden states, the Token-Routed MLP selects an expert deterministically from the **token ID**:

```python
expert_idx = token_id % num_experts  # Deterministic routing
output = experts[expert_idx](hidden_states)
```

**Benefits:**
- No router network overhead
- Deterministic, reproducible routing
- 4x total MLP capacity at dense-FFN compute cost (only 1 of 4 experts runs per token)
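The routing rule above can be sketched end to end in a few lines. This is an illustrative toy, not the actual Complexity implementation: the "experts" here are stand-in functions rather than learned FFN blocks.

```python
# Minimal sketch of deterministic token-ID routing (illustrative only).
NUM_EXPERTS = 4

def route(token_id: int) -> int:
    """Pick an expert purely from the token ID -- no learned router."""
    return token_id % NUM_EXPERTS

def token_routed_mlp(token_ids, hidden_states, experts):
    """Apply exactly one expert per token, chosen by route()."""
    return [experts[route(t)](h) for t, h in zip(token_ids, hidden_states)]

# Toy experts: each scales its input by a different constant.
experts = [lambda h, s=s: h * s for s in (1.0, 2.0, 3.0, 4.0)]

token_ids = [0, 1, 5, 6]            # 5 % 4 == 1, 6 % 4 == 2
hidden = [1.0, 1.0, 1.0, 1.0]
print(token_routed_mlp(token_ids, hidden, experts))  # [1.0, 2.0, 2.0, 3.0]
```

Because the mapping is a pure function of the token ID, the same token always hits the same expert across runs and devices, which is what makes the routing reproducible.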

### QK Normalization

Stabilizes attention at scale by normalizing Q and K before computing attention scores:

```python
q = self.q_norm(q)  # RMSNorm over the head dimension
k = self.k_norm(k)
attn = (q @ k.T) / sqrt(d)  # normalized Q/K keep these logits bounded
```
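A runnable numpy sketch of the same idea, under stated assumptions: RMSNorm over the head dimension with a small epsilon, a single head, and no causal mask. The card does not specify the exact norm placement or hyperparameters, so treat this as illustrative only.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    """RMS-normalize along the last (head) dimension."""
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

def qk_norm_attention(q, k, v):
    """Scaled dot-product attention with Q and K RMS-normalized first.

    Shapes: (seq, d_head). Single head, no mask -- a sketch, not the
    model's exact code.
    """
    d = q.shape[-1]
    q, k = rms_norm(q), rms_norm(k)
    scores = q @ k.T / np.sqrt(d)
    # Numerically stable softmax over keys.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

After normalization every row of `q` and `k` has unit RMS, so the attention logits cannot blow up as activations grow during training; that bound is the source of the stability claim.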

## Model Details

- **Parameters**: ~100M
- **Hidden size**: 768
- **Layers**: 12
- **Attention heads**: 12 (KV heads: 4)
- **Experts**: 4 (1 active per token)
- **Vocabulary**: 100K tokens
- **Context**: 2048 tokens
- **Training steps**: 10,000

## Installation

```bash
pip install complexity-model pyllm-inference
```

## Usage

### With PyLLM

```bash
pyllm serve Pacific-Prime/complexity-tiny
```

### Python API

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Pacific-Prime/complexity")
model = AutoModelForCausalLM.from_pretrained(
    "Pacific-Prime/complexity",
    trust_remote_code=True
)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```

## Comparison with Llama

```
Llama:      embed -> [Attn + FFN] x L -> output
Complexity: embed -> [Attn + TokenRoutedMLP] x L -> output
                      ↑ QK Norm    ↑ 4 experts (1 active)
```

Same *active* parameter count per token as a dense Llama of this size, but:
- **4x more total MLP parameters** (distributed across experts)
- **Faster training** (QK norm stabilizes gradients)
- **Better scaling** (sparse activation)
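To make the "same active, 4x total" claim concrete, here is the rough MLP parameter arithmetic. The intermediate (FFN) size is not stated in this card, so `2048` below is a hypothetical placeholder; only the 4:1 ratio matters.

```python
# Rough MLP parameter arithmetic (intermediate size is a placeholder).
hidden, intermediate, num_experts, layers = 768, 2048, 4, 12

# SwiGLU-style FFN: gate + up + down projections (Llama convention).
per_expert = 3 * hidden * intermediate
total_mlp = layers * num_experts * per_expert   # parameters stored
active_mlp = layers * 1 * per_expert            # parameters used per token

print(total_mlp // active_mlp)  # 4
```

Storage grows with the number of experts, but per-token FLOPs stay at the single-expert (dense FFN) level, which is the sparse-activation benefit listed above.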

## License

CC BY-NC 4.0 (per the `license` field in the metadata above)

## Links

- [GitHub](https://github.com/Complexity-ML/complexity-framework)
- [PyPI](https://pypi.org/project/complexity-framework/)

## Citation

```bibtex
@misc{complexity,
  title={Complexity: Token-Routed MLP Transformer},
  author={Pacific Prime},
  year={2025},
  url={https://huggingface.co/Pacific-Prime/complexity}
}
```