Spaces: Complexity-Deep / README

Pacific-Prime committed 30 days ago

Commit ce05689 · verified · 1 parent: 7111114

Update README.md

Files changed (1)

README.md +101 -107

README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-title: Complexity Deep
 emoji: 🐢
 colorFrom: purple
 colorTo: blue
@@ -9,171 +9,165 @@ thumbnail: >-
   https://cdn-uploads.huggingface.co/production/uploads/643222d9f76c34519e96a299/8j1GHX24MV3-sv-4zl7ZB.png
 ---
-# Complexity Deep
-**Next-generation LLM architecture with INL Dynamics and Token-Routed MLP**
-## What is Complexity Deep?
-Complexity Deep is a novel transformer architecture designed for **stability** and **efficiency**. It combines:
-- **INL Dynamics** - Robotics-grade control system for training stability
-- **Token-Routed MLP** - Deterministic MoE without routing overhead
-- **GQA (Grouped Query Attention)** - 4x faster inference, 4x smaller KV cache
-- **QK Norm** - Attention stability for deep models
 ## Key Innovation: INL Dynamics
-INL (Inertial Navigation Layer) Dynamics brings robotics control theory to LLM training:
-```
-Standard Transformer:  hidden → LayerNorm → Attention → MLP → output
-                       (can diverge on bad data)
-Complexity Deep:       hidden → INL Controller → Attention → MLP → output
-                       (self-stabilizing, recovers from spikes)
 ```
-**Real-world proof**: Our 150M model survived a loss spike of **4000x** and auto-recovered in 45 minutes without any intervention.
-## Token-Routed MLP
-Unlike learned MoE (Mixtral, etc.), Token-Routed MLP routes by token ID:
-| Aspect | Learned MoE | Token-Routed (Ours) |
-|--------|-------------|---------------------|
-| Routing | Neural network | `token_id % num_experts` |
-| Latency | 5-10ms | **<0.1ms** |
-| Deterministic | No | **Yes** |
-| Load balancing needed | Yes | **No** |
-**Why it works**: BPE tokenizers sort by frequency. Token ID = frequency category = natural expert specialization.
-## Models
-| Model | Params | Status | Link |
-|-------|--------|--------|------|
-| pacific-prime | 150M | Training (120K+ steps) | [HuggingFace](https://huggingface.co/Pacific-Prime/pacific-prime) |
-| complexity-tiny | 150M | Available | [HuggingFace](https://huggingface.co/Pacific-Prime/complexity-tiny) |
-## Installation
 ```bash
-pip install complexity-deep
 ```
-## Quick Start
 ```python
-from complexity_deep import DeepConfig, DeepForCausalLM, create_deep_model
-# Create a model
-model = create_deep_model(size="tiny", vocab_size=100000)
-# Or use presets
-config = DeepConfig.complexity_150m()  # 150M params
-config = DeepConfig.complexity_3_8b()  # 3.8B params
-config = DeepConfig.complexity_7b()    # 7B params
 ```
-## Architecture Comparison
-| Feature | LLaMA | Mistral | Complexity Deep |
-|---------|-------|---------|-----------------|
-| Attention | GQA | GQA + Sliding | GQA + QK Norm |
-| MLP | Dense | MoE (learned) | Token-Routed MoE |
-| Stability | Gradient clip | Gradient clip | **INL Dynamics** |
-| Recovery from spike | Manual rollback | Manual rollback | **Auto-recovery** |
-## Training Stability Demo
-**Real training run - Loss spike of 4000x with auto-recovery:**
-![INL Dynamics Recovery](https://cdn-uploads.huggingface.co/production/uploads/643222d9f76c34519e96a299/8j1GHX24MV3-sv-4zl7ZB.png)
-```
-Loss during training with bad batch:
-Standard:     5.6 → 4000 → NaN → DEAD
-Complexity:   5.6 → 4000 → 46 → 16 → 8 → 5.6 (auto-recovered!)
 ```
-The spike visible in the graph shows INL Dynamics absorbing a corrupted batch from FineWeb-Edu and automatically recovering without any manual intervention.
-## Available Configurations
 ```python
-# Small models (for testing)
-DeepConfig.complexity_tiny()   # ~15M
-DeepConfig.complexity_20m()    # ~20M
-DeepConfig.complexity_small()  # ~50M
-# Medium models
-DeepConfig.complexity_150m()   # ~150M (default)
-DeepConfig.complexity_base()   # ~125M
-DeepConfig.complexity_medium() # ~350M
-# Large models
-DeepConfig.complexity_1b()     # ~1B
-DeepConfig.complexity_3b()     # ~3B
-DeepConfig.complexity_3_8b()   # ~3.8B
-DeepConfig.complexity_7b()     # ~7B
-```
-## INL Dynamics Parameters
-```python
-config = DeepConfig(
-    dynamics_alpha=0.9,    # Inertia (momentum)
-    dynamics_beta=0.1,     # Correction strength
-    dynamics_gate=0.5,     # Amplitude control
-    dynamics_dt=0.1,       # Integration timestep
-)
 ```
-## Use Cases
-### 1. Training on Noisy Data
-INL Dynamics absorbs bad batches without killing your training run.
-### 2. Budget-Constrained Training
-No need for expensive rollbacks - the model self-heals.
-### 3. Robotics Applications
-Deterministic Token-Routed MLP = predictable, certifiable behavior.
-### 4. Edge Deployment
-GQA + Token-Routed = fast inference with small KV cache.
-## Research
-Complexity Deep introduces two novel concepts:
-1. **INL Dynamics**: First application of robotics control theory (PID-like) to transformer hidden states for training stability.
-2. **Deterministic Token-Routed MoE**: First MoE that routes by token ID instead of learned routing, leveraging BPE frequency ordering.
 ## Links
-- [PyPI Package](https://pypi.org/project/complexity-deep/)
-- [GitHub](https://github.com/Web3-League/complexity-deep)
-- [Pacific-Prime Organization](https://huggingface.co/Pacific-Prime)
 ## License
-CC-BY-4.0
 ## Citation
 ```bibtex
-@software{complexity_deep_2024,
-  title={Complexity Deep: INL Dynamics and Token-Routed MLP for Stable LLM Training},
-  author={Pacific Prime},
   year={2024},
-  url={https://huggingface.co/Pacific-Prime}
 }
 ```
 ---
-**Built with stability in mind. Train with confidence.**

 ---
+title: Complexity Framework
 emoji: 🐢
 colorFrom: purple
 colorTo: blue
   https://cdn-uploads.huggingface.co/production/uploads/643222d9f76c34519e96a299/8j1GHX24MV3-sv-4zl7ZB.png
 ---
+# Complexity Framework
+**Modular Python framework for building LLMs with INL Dynamics stability**
+## What is Complexity Framework?
+Complexity Framework is a complete toolkit for building transformer architectures with built-in training stability. It provides:
+- **INL Dynamics** - Second-order dynamical system for training stability
+- **Token-Routed MLP (MoE)** - Efficient sparse activation
+- **CUDA/Triton Optimizations** - Flash Attention, Sliding Window, Sparse, Linear
+- **O(N) Architectures** - Mamba, RWKV, RetNet
+- **Small Budget Training** - Quantization, Mixed Precision, Gradient Checkpointing
 ## Key Innovation: INL Dynamics
+Velocity tracking to prevent training explosion after 400k+ steps:
+```python
+from complexity.api import INLDynamics
+# CRITICAL: beta in [0, 2], NOT [0, inf)!
+dynamics = INLDynamics(
+    hidden_size=768,
+    beta_max=2.0,       # Clamp beta for stability
+    velocity_max=10.0,  # Limit velocity
+)
+h_next, v_next = dynamics(hidden_states, velocity)
 ```
+**The bug we fixed**: `softplus` without clamp goes to infinity, causing NaN after 400k steps. Clamping beta to [0, 2] keeps training stable.
+## Loss Spike Recovery
+![Loss Spike Recovery](https://raw.githubusercontent.com/Complexity-ML/complexity-framework/main/docs/loss-spike-recovery.png)
+*INL Dynamics recovers from loss spikes thanks to velocity damping.*
+## Stability at 400k+ Steps
+![Training at 400k steps](https://raw.githubusercontent.com/Complexity-ML/complexity-framework/main/docs/training-400k-stable.png)
+*After beta clamping fix: training remains stable past 400k steps where it previously exploded.*
+## Quick Start
 ```bash
+pip install complexity-framework
 ```
 ```python
+from complexity.api import (
+    # Building blocks
+    Attention, MLP, RMSNorm, RoPE, INLDynamics,
+    # Optimizations
+    CUDA, Efficient,
+    # Architectures O(N)
+    Architecture, Mamba, RWKV,
+)
+# Flash Attention
+attn = CUDA.flash(hidden_size=4096, num_heads=32)
+# INL Dynamics (training stability)
+dynamics = INLDynamics(hidden_size=768, beta_max=2.0)
+h, velocity = dynamics(hidden_states, velocity)
+# Small budget model
+model = Efficient.tiny_llm(vocab_size=32000)  # ~125M params
 ```
+## Features
+| Module | Description |
+|--------|-------------|
+| **Core** | Attention (GQA/MHA/MQA), MLP (SwiGLU/GeGLU/MoE), Position (RoPE/YaRN/ALiBi) |
+| **INL Dynamics** | Velocity tracking for training stability |
+| **CUDA/Triton** | Flash Attention, Sliding Window, Sparse, Linear |
+| **Efficient** | Quantization, Mixed Precision, Small Models |
+| **O(N) Architectures** | Mamba, RWKV, RetNet |
+| **Multimodal** | Vision, Audio, Fusion |
+## Token-Routed MLP (MoE)
+```python
+from complexity.api import MLP, TokenRoutedMLP
+# Via factory
+moe = MLP.moe(hidden_size=4096, num_experts=8, top_k=2)
+# Direct
+moe = TokenRoutedMLP(
+    hidden_size=4096,
+    num_experts=8,
+    top_k=2,
+)
+output, aux_loss = moe(hidden_states)
 ```
+## Small Budget Training
 ```python
+from complexity.api import Efficient
+# Pre-configured models
+model = Efficient.nano_llm(vocab_size=32000)   # ~10M params
+model = Efficient.micro_llm(vocab_size=32000)  # ~30M params
+model = Efficient.tiny_llm(vocab_size=32000)   # ~125M params
+model = Efficient.small_llm(vocab_size=32000)  # ~350M params
+# Memory optimizations
+Efficient.enable_checkpointing(model)
+model, optimizer, scaler = Efficient.mixed_precision(model, optimizer)
 ```
+## O(N) Architectures
+For very long sequences:
+```python
+from complexity.api import Architecture
+model = Architecture.mamba(hidden_size=768, num_layers=12)
+model = Architecture.rwkv(hidden_size=768, num_layers=12)
+model = Architecture.retnet(hidden_size=768, num_layers=12)
+```
+## Documentation
+- [Getting Started](https://github.com/Complexity-ML/complexity-framework/blob/main/docs/getting-started.md)
+- [API Reference](https://github.com/Complexity-ML/complexity-framework/blob/main/docs/api.md)
+- [INL Dynamics](https://github.com/Complexity-ML/complexity-framework/blob/main/docs/dynamics.md)
+- [MoE / Token-Routed MLP](https://github.com/Complexity-ML/complexity-framework/blob/main/docs/moe.md)
+- [CUDA Optimizations](https://github.com/Complexity-ML/complexity-framework/blob/main/docs/cuda.md)
+- [Efficient Training](https://github.com/Complexity-ML/complexity-framework/blob/main/docs/efficient.md)
+- [O(N) Architectures](https://github.com/Complexity-ML/complexity-framework/blob/main/docs/architectures.md)
 ## Links
+- [GitHub](https://github.com/Complexity-ML/complexity-framework)
+- [PyPI](https://pypi.org/project/complexity-framework/) (coming soon)
 ## License
+CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0)
 ## Citation
 ```bibtex
+@software{complexity_framework_2024,
+  title={Complexity Framework: Modular LLM Building Blocks with INL Dynamics},
+  author={Complexity-ML},
   year={2024},
+  url={https://github.com/Complexity-ML/complexity-framework}
 }
 ```
 ---
+**Build stable LLMs. Train with confidence.**
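
The new README describes INL Dynamics as a second-order system that tracks a velocity next to the hidden state, with `beta` clamped to [0, 2] and the velocity limited by `velocity_max`. The plain-Python sketch below shows one plausible form of such an update and runs it on a simulated spike. It is an illustration under stated assumptions, not the `complexity.api.INLDynamics` implementation: the equilibrium `mu`, the exact update rule, and the helper names `clamp` and `inl_step` are hypothetical; `alpha`, `beta`, `gate`, and `dt` reuse the defaults from the removed `DeepConfig` example, and `beta_max` and `velocity_max` come from the new README.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def inl_step(h, v, mu, alpha=0.9, beta=0.1, gate=0.5, dt=0.1,
             beta_max=2.0, velocity_max=10.0):
    """One hypothetical INL-style update on lists of floats.

    Assumed rule (not taken from complexity.api):
        v_next = clamp(alpha * v - beta * (h - mu), -velocity_max, velocity_max)
        h_next = h + dt * gate * v_next
    """
    beta = clamp(beta, 0.0, beta_max)  # keep the correction strength in [0, beta_max]
    h_next, v_next = [], []
    for h_i, v_i, mu_i in zip(h, v, mu):
        error = h_i - mu_i                    # distance from the (assumed) equilibrium
        vel = alpha * v_i - beta * error      # inertia plus restoring force
        vel = clamp(vel, -velocity_max, velocity_max)
        v_next.append(vel)
        h_next.append(h_i + dt * gate * vel)  # gated explicit Euler step
    return h_next, v_next


# Simulated spike: the first coordinate starts far from equilibrium.
h, v, mu = [1000.0, 0.5], [0.0, 0.0], [0.0, 0.0]
max_speed = 0.0
for _ in range(5000):
    h, v = inl_step(h, v, mu)
    max_speed = max(max_speed, max(abs(x) for x in v))

print(max_speed)  # 10.0
```

During the spike the velocity is pinned at `velocity_max`, so the spiked coordinate moves back by at most `dt * gate * velocity_max = 0.5` per step: recovery is slower, but every step stays bounded, and both coordinates settle back toward `mu`. Whether the real module behaves this way depends on its actual update rule.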
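
The new README attributes the failure after 400k steps to an unclamped `softplus`: softplus is always positive but unbounded above, so if its raw input keeps drifting upward during training, `beta` grows with it. The self-contained sketch below compares the raw and clamped parameterizations, then iterates a simplified scalar second-order update to show how an unbounded correction strength can turn a contracting update into an amplifying one. The update rule, `alpha=0.9`, and the combined step size `0.05` (the removed README's `dt=0.1` times `gate=0.5`) are assumptions, and the helper names are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); always > 0, unbounded above."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def clamped_beta(raw: float, beta_max: float = 2.0) -> float:
    """softplus followed by a clamp to [0, beta_max], as the README describes."""
    return min(softplus(raw), beta_max)


def residual_error(beta: float, steps: int = 200, alpha: float = 0.9,
                   step_size: float = 0.05) -> float:
    """Iterate a scalar second-order update from error 1.0 and return |error|.

    Hypothetical rule: v <- alpha * v - beta * e;  e <- e + step_size * v
    """
    e, v = 1.0, 0.0
    for _ in range(steps):
        v = alpha * v - beta * e
        e = e + step_size * v
    return abs(e)


raw = 100.0  # a raw parameter that has drifted upward during training
print(softplus(raw), clamped_beta(raw))          # 100.0 2.0
print(residual_error(softplus(raw)) > 1e6)       # True: the error is amplified
print(residual_error(clamped_beta(raw)) < 1e-2)  # True: the error decays
```

In this toy model the iteration contracts only for 0 < beta < 76 and diverges above that bound, so a softplus output that drifts past it eventually overflows to inf/NaN, while the clamp at 2 keeps `beta` well inside the stable range. The threshold of 76 is specific to these assumed constants, not a property of the real model.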
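
The removed README describes Token-Routed MLP as assigning each token to an expert by `token_id % num_experts` instead of with a learned router. A minimal plain-Python sketch of that rule follows. The helper names `route` and `group_by_expert` are hypothetical, and the new README's `TokenRoutedMLP(..., top_k=2)` returning an `aux_loss` suggests the framework's current class may route differently, so treat this as an illustration of the modulo rule only.

```python
from collections import Counter


def route(token_ids, num_experts):
    """Deterministic expert assignment: expert = token_id % num_experts."""
    return [t % num_experts for t in token_ids]


def group_by_expert(token_ids, num_experts):
    """Sequence positions each expert processes (what a sparse MLP would batch)."""
    groups = {e: [] for e in range(num_experts)}
    for pos, expert in enumerate(route(token_ids, num_experts)):
        groups[expert].append(pos)
    return groups


ids = [17, 4, 256, 17, 9, 3]
print(route(ids, 4))            # [1, 0, 0, 1, 1, 3]
print(group_by_expert(ids, 4))  # {0: [1, 2], 1: [0, 3, 4], 2: [], 3: [5]}

# Over a contiguous range of the vocabulary, each expert owns the same number of IDs.
print(Counter(route(range(32000), 8)))
```

Because the assignment depends only on the token ID, a repeated token (17 above) always reaches the same expert and no router network is evaluated. Equal vocabulary coverage per expert does not by itself guarantee equal load on real text, where token frequencies are skewed.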
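
The removed README credits GQA with a 4x smaller KV cache, and the new feature table still lists GQA/MHA/MQA. The cache ratio follows directly from head counts: keys and values are stored once per KV head, so sharing each KV head across a group of query heads shrinks the cache by `num_heads / num_kv_heads`. The arithmetic below assumes 32 layers, fp16 storage (2 bytes per element), a 4096-token context, and 8 KV heads for the GQA case; `hidden_size=4096` with `num_heads=32` matches the README's `CUDA.flash` example. Inference speedups depend on hardware and kernels and are not derived here.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Size of the key/value cache: two tensors (K and V) per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


hidden_size, num_heads = 4096, 32
head_dim = hidden_size // num_heads  # 128

mha = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=head_dim, seq_len=4096)
gqa = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=head_dim, seq_len=4096)
mqa = kv_cache_bytes(num_layers=32, num_kv_heads=1, head_dim=head_dim, seq_len=4096)

GiB = 1024 ** 3
print(mha / GiB, gqa / GiB, mqa / GiB)  # 2.0 0.5 0.0625
print(mha // gqa)                       # 4
```

The same formula shows that the "4x" figure is tied to a 32:8 head ratio; a different grouping gives a different reduction, down to 32x for MQA's single KV head.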
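
The new README's `Efficient.mixed_precision(model, optimizer)` returns a `scaler`, presumably a loss scaler for fp16 training (an assumption; the README does not say). Dynamic loss scaling multiplies the loss by a large factor so small fp16 gradients do not underflow, skips any step whose gradients overflow to inf/NaN, and adjusts the factor over time. The plain-Python sketch below shows only that policy; the class `DynamicLossScaler` is hypothetical, and its defaults mirror those of PyTorch's `torch.cuda.amp.GradScaler` (initial scale 2**16, growth factor 2, backoff factor 0.5, growth interval 2000).

```python
import math


class DynamicLossScaler:
    """Hypothetical loss-scale policy: back off on overflow, grow after a run of good steps."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def unscale(self, scaled_grads):
        """Divide scaled gradients by the current scale before the optimizer step."""
        return [g / self.scale for g in scaled_grads]

    def update(self, scaled_grads):
        """Return True if the step should be applied, False if it must be skipped."""
        if not all(math.isfinite(g) for g in scaled_grads):
            self.scale *= self.backoff_factor  # overflow: shrink the scale, skip the step
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            self.scale *= self.growth_factor   # long stable run: try a larger scale
            self._good_steps = 0
        return True


scaler = DynamicLossScaler(growth_interval=3)
print(scaler.update([1.0, float("inf")]), scaler.scale)  # False 32768.0
for _ in range(3):
    scaler.update([1.0, 2.0])
print(scaler.scale)                                      # 65536.0
print(scaler.unscale([65536.0]))                         # [1.0]
```

Skipping the overflowed step rather than applying it is what keeps a single fp16 overflow from corrupting the weights; the scale then recovers after `growth_interval` clean steps.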
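
The new README lists Mamba, RWKV, and RetNet as O(N) architectures for very long sequences. For RetNet-style retention, the reason can be shown directly: the same output can be computed in a parallel form that compares every pair of positions (quadratic in sequence length) or in an equivalent recurrent form that carries a fixed-size state (linear in sequence length). The sketch below is a simplified single-head version in plain Python, without RetNet's rotary-style position encoding, normalization, or gating; the function names and the decay `gamma=0.9` are illustrative assumptions, and it is not `Architecture.retnet`.

```python
def retention_parallel(q, k, v, gamma):
    """Parallel form: o_t = sum_{s<=t} gamma**(t-s) * (q_t . k_s) * v_s; O(N^2) pairs."""
    out = []
    for t, q_t in enumerate(q):
        o_t = [0.0] * len(v[0])
        for s in range(t + 1):
            weight = gamma ** (t - s) * sum(a * b for a, b in zip(q_t, k[s]))
            for j in range(len(o_t)):
                o_t[j] += weight * v[s][j]
        out.append(o_t)
    return out


def retention_recurrent(q, k, v, gamma):
    """Recurrent form: S_t = gamma * S_{t-1} + k_t^T v_t, o_t = q_t S_t; O(N) steps."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # fixed-size d_k x d_v state
    out = []
    for q_t, k_t, v_t in zip(q, k, v):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = gamma * state[i][j] + k_t[i] * v_t[j]
        out.append([sum(q_t[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


q = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
k = [[1.0, 1.0], [0.0, 1.0], [1.0, -1.0]]
v = [[1.0, 0.0], [2.0, 1.0], [3.0, -1.0]]
par = retention_parallel(q, k, v, gamma=0.9)
rec = retention_recurrent(q, k, v, gamma=0.9)
print(rec)  # [[1.0, 0.0], [3.35, 1.0], [-0.78, 3.8]] up to float rounding
```

The recurrent form does constant work per token no matter how many tokens came before, which is what makes retention linear in sequence length; Mamba and RWKV reach linear cost through different recurrences.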