---
license: cc-by-nc-4.0
language:
- en
- fr
- code
tags:
- complexity
- token-routed-mlp
- flash-attention
- causal-lm
library_name: transformers
pipeline_tag: text-generation
---

# Complexity Base

A Llama-style transformer with architectural improvements for efficiency and performance.

## Architecture: Llama + Improvements

Complexity builds on the Llama architecture with three key enhancements:

| Component | Llama | Complexity |
|-----------|-------|------------|
| **MLP** | Dense FFN | **Token-Routed MLP** (4 experts, 1 active) |
| **Attention** | Standard | **Flash Attention** via SDPA |
| **Normalization** | RMSNorm only | RMSNorm + **QK Normalization** |

### Token-Routed MLP

Unlike MoE, which routes based on hidden states through a learned router, the Token-Routed MLP routes based on **token ID**:

```python
expert_idx = token_id % num_experts  # deterministic routing
output = experts[expert_idx](hidden_states)
```

**Benefits:**
- No router network overhead
- Deterministic, reproducible routing
- 4x parameter efficiency (only 1 of 4 experts is active per token)

### QK Normalization

Stabilizes attention at scale by normalizing Q and K before computing attention scores:

```python
q = self.q_norm(q)
k = self.k_norm(k)
attn = (q @ k.T) / sqrt(d)
```

## Model Details

- **Parameters**: ~100M
- **Hidden size**: 768
- **Layers**: 12
- **Attention heads**: 12 (KV heads: 4)
- **Experts**: 4 (1 active per token)
- **Vocabulary**: 100K tokens
- **Context**: 2048 tokens
- **Training steps**: 10,000

## Installation

```bash
pip install complexity-model pyllm-inference
```

## Usage

### With PyLLM

```bash
pyllm serve Pacific-Prime/complexity-tiny
```

### Python API

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Pacific-Prime/complexity")
model = AutoModelForCausalLM.from_pretrained(
    "Pacific-Prime/complexity",
    trust_remote_code=True
)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```

## Comparison with Llama

```
Llama:       embed -> [Attn + FFN] x L            -> output
Complexity:  embed -> [Attn + TokenRoutedMLP] x L -> output
                        ↑ QK Norm  ↑ 4 experts (1 active)
```

Same active parameter count per token, but:
- **4x more total MLP parameters** (distributed across experts)
- **Faster training** (QK norm stabilizes gradients)
- **Better scaling** (sparse activation)

## License

CC BY-NC 4.0

## Links

- [GitHub](https://github.com/Complexity-ML/complexity-framework)
- [PyPI](https://pypi.org/project/complexity-framework/)

## Citation

```bibtex
@misc{complexity,
  title={Complexity: Token-Routed MLP Transformer},
  author={Pacific Prime},
  year={2025},
  url={https://huggingface.co/Pacific-Prime/complexity}
}
```
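## Appendix: Token-Routed MLP Sketch

The deterministic routing rule described above can be sketched as a standalone PyTorch module. This is a minimal illustration, not the released implementation: the class name `TokenRoutedMLP`, the SiLU expert architecture, and the loop-over-experts dispatch are assumptions made for clarity.

```python
# Illustrative sketch of a Token-Routed MLP. Expert architecture and
# names are assumptions; the released model may differ.
import torch
import torch.nn as nn


class TokenRoutedMLP(nn.Module):
    def __init__(self, hidden_size, intermediate_size, num_experts=4):
        super().__init__()
        self.num_experts = num_experts
        # One independent feed-forward expert per routing bucket
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, intermediate_size, bias=False),
                nn.SiLU(),
                nn.Linear(intermediate_size, hidden_size, bias=False),
            )
            for _ in range(num_experts)
        )

    def forward(self, hidden_states, token_ids):
        # hidden_states: (batch, seq, hidden); token_ids: (batch, seq)
        expert_idx = token_ids % self.num_experts  # deterministic routing
        output = torch.zeros_like(hidden_states)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i  # tokens assigned to expert i
            if mask.any():
                output[mask] = expert(hidden_states[mask])
        return output
```

Because routing depends only on token IDs, the same input always takes the same path, with no router logits to train or load-balance:

```python
mlp = TokenRoutedMLP(hidden_size=768, intermediate_size=3072)
x = torch.randn(2, 8, 768)
ids = torch.randint(0, 100_000, (2, 8))
y = mlp(x, ids)  # same shape as x: (2, 8, 768)
```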
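## Appendix: QK Normalization Sketch

The QK normalization snippet above can likewise be expanded into a runnable function using PyTorch's SDPA (the same kernel entry point the Flash Attention path uses). The RMSNorm placement and epsilon here are assumptions for illustration; the model's `q_norm`/`k_norm` modules may use learned scales.

```python
# Hedged sketch of QK-normalized attention via SDPA. Norm details
# (epsilon, learned scale) are assumptions.
import torch
import torch.nn.functional as F


def rms_norm(x, eps=1e-6):
    # Normalize over the last (head) dimension, RMSNorm-style
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)


def qk_norm_attention(q, k, v):
    # q, k, v: (batch, heads, seq, head_dim)
    q = rms_norm(q)  # bounded query magnitudes
    k = rms_norm(k)  # bounded key magnitudes
    # SDPA applies the 1/sqrt(d) scaling and softmax internally,
    # dispatching to a Flash Attention kernel when available
    return F.scaled_dot_product_attention(q, k, v)
```

Normalizing Q and K bounds the magnitude of the pre-softmax logits, which is what stabilizes attention as scale grows.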