---
language: en
license: mit
library_name: pytorch
tags:
- mixture-of-experts
- multi-agent
- neural-routing
- cognitive-architecture
- reinforcement-learning
pipeline_tag: text-classification
---

# MangoMAS-MoE-7M

A ~7 million parameter **Mixture-of-Experts** (MoE) neural routing model for multi-agent task orchestration.

## Model Architecture

```
Input (64-dim feature vector from featurize64())
      │
┌─────┴─────┐
│   GATE    │  Linear(64→512) → ReLU → Linear(512→16) → Softmax
└─────┬─────┘
      │
╔═══════════════════════════════════════════════════╗
║            16 Expert Towers (parallel)            ║
║  Each: Linear(64→512) → ReLU → Linear(512→512)    ║
║        → ReLU → Linear(512→256)                   ║
╚═══════════════════════════════════════════════════╝
      │
Weighted Sum (gate_weights × expert_outputs)
      │
Classifier Head: Linear(256→N_classes)
      │
Output Logits
```

### Parameter Count

| Component | Parameters |
|-----------|------------|
| Gate Network | 64×512 + 512 + 512×16 + 16 = ~41K |
| 16 Expert Towers | 16 × (64×512 + 512 + 512×512 + 512 + 512×256 + 256) = ~6.84M |
| Classifier Head | 256×10 + 10 = ~2.6K |
| **Total** | **~6.88M** |

## Input: 64-Dimensional Feature Vector

The model consumes a 64-dimensional feature vector produced by `featurize64()`:

- **Dims 0-31**: Hash-based sinusoidal encoding (content fingerprint)
- **Dims 32-47**: Domain tag detection (code, security, architecture, etc.)
- **Dims 48-55**: Structural signals (length, punctuation, questions)
- **Dims 56-59**: Sentiment polarity estimates
- **Dims 60-63**: Novelty/complexity scores

## Training

- **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
- **Updates**: Online learning from routing feedback
- **Minimum reward threshold**: 0.1
- **Device**: CPU / MPS / CUDA (auto-detected)

## Usage

```python
import torch
from moe_model import MixtureOfExperts7M, featurize64

# Create model
model = MixtureOfExperts7M(num_classes=10, num_experts=16)

# Extract features
features = featurize64("Design a secure REST API with authentication")
x = torch.tensor([features], dtype=torch.float32)

# Forward pass
logits, gate_weights = model(x)
print(f"Expert weights: {gate_weights}")
print(f"Top expert: {gate_weights.argmax().item()}")
```

## Intended Use

This model is part of the **MangoMAS** multi-agent orchestration platform. It routes incoming tasks to the most appropriate expert agents based on the task's semantic content.

**Primary use cases:**

- Multi-agent task routing
- Expert selection for cognitive cell orchestration
- Research demonstration of MoE architectures

## Interactive Demo

Try the model live on the [MangoMAS HuggingFace Space](https://huggingface.co/spaces/ianshank/MangoMAS).

## Citation

```bibtex
@software{mangomas2026,
  title={MangoMAS: Multi-Agent Cognitive Architecture},
  author={Cruickshank, Ian},
  year={2026},
  url={https://github.com/ianshank/MangoMAS}
}
```

## Author

Built by [Ian Cruickshank](https://huggingface.co/ianshank) — MangoMAS Engineering
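
## Architecture Sketch

For readers who want to reproduce the parameter count or experiment with the architecture without the MangoMAS codebase, the dense-gated MoE described above can be sketched as a standalone PyTorch module. This is an illustrative sketch, not the shipped `MixtureOfExperts7M`; the class name `MoE7MSketch` and its keyword defaults are assumptions made here for clarity.

```python
import torch
import torch.nn as nn


class MoE7MSketch(nn.Module):
    """Minimal sketch of the gated mixture-of-experts router above.
    Hypothetical stand-in; the real MixtureOfExperts7M may differ."""

    def __init__(self, num_classes=10, num_experts=16,
                 in_dim=64, hidden=512, expert_out=256):
        super().__init__()
        # Gate: Linear(64→512) → ReLU → Linear(512→16) → Softmax
        self.gate = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_experts), nn.Softmax(dim=-1),
        )
        # Expert towers: Linear(64→512) → ReLU → Linear(512→512)
        #                → ReLU → Linear(512→256), run in parallel
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, expert_out),
            )
            for _ in range(num_experts)
        )
        self.classifier = nn.Linear(expert_out, num_classes)

    def forward(self, x):
        weights = self.gate(x)                                 # (B, 16)
        stacked = torch.stack(
            [expert(x) for expert in self.experts], dim=1)     # (B, 16, 256)
        mixed = (weights.unsqueeze(-1) * stacked).sum(dim=1)   # (B, 256)
        return self.classifier(mixed), weights


model = MoE7MSketch()
print(sum(p.numel() for p in model.parameters()))  # 6880282 (~6.88M)
```

Because the gate is a softmax over all 16 experts, every expert runs on every input and the output is a dense weighted sum; there is no top-k sparsity in this design.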