---
language: en
license: mit
library_name: pytorch
tags:
- mixture-of-experts
- multi-agent
- neural-routing
- cognitive-architecture
- reinforcement-learning
pipeline_tag: text-classification
---

# MangoMAS-MoE-7M

A ~7 million parameter **Mixture-of-Experts** (MoE) neural routing model for multi-agent task orchestration.

## Model Architecture

```
Input (64-dim feature vector from featurize64())
        │
  ┌─────┴─────┐
  │   GATE    │  Linear(64→512) → ReLU → Linear(512→16) → Softmax
  └─────┬─────┘
        │
╔═══════════════════════════════════════════════════╗
║  16 Expert Towers (parallel)                      ║
║  Each: Linear(64→512) → ReLU → Linear(512→512)    ║
║        → ReLU → Linear(512→256)                   ║
╚═══════════════════════════════════════════════════╝
        │
Weighted Sum (gate_weights × expert_outputs)
        │
Classifier Head: Linear(256→N_classes)
        │
Output Logits
```
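The reference implementation ships as `moe_model.py` with MangoMAS and is not reproduced in this card. As an illustration only, the diagram above can be sketched in a few lines of PyTorch (the class name `MoESketch` is ours, not the shipped `MixtureOfExperts7M`, but the layer shapes follow the diagram exactly):

```python
import torch
import torch.nn as nn

class MoESketch(nn.Module):
    """Illustrative sketch of the gate + expert-tower architecture above."""
    def __init__(self, num_classes=10, num_experts=16, d_in=64, d_hid=512, d_out=256):
        super().__init__()
        # Gate: Linear(64→512) → ReLU → Linear(512→16) → Softmax
        self.gate = nn.Sequential(
            nn.Linear(d_in, d_hid), nn.ReLU(),
            nn.Linear(d_hid, num_experts), nn.Softmax(dim=-1),
        )
        # 16 parallel towers: Linear(64→512) → ReLU → Linear(512→512) → ReLU → Linear(512→256)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_in, d_hid), nn.ReLU(),
                nn.Linear(d_hid, d_hid), nn.ReLU(),
                nn.Linear(d_hid, d_out),
            )
            for _ in range(num_experts)
        )
        # Classifier head on the gate-weighted mixture of expert outputs
        self.head = nn.Linear(d_out, num_classes)

    def forward(self, x):
        w = self.gate(x)                                    # (B, num_experts)
        e = torch.stack([ex(x) for ex in self.experts], 1)  # (B, num_experts, d_out)
        mixed = (w.unsqueeze(-1) * e).sum(dim=1)            # (B, d_out)
        return self.head(mixed), w
```

Summing this sketch's parameters gives 6,880,282 (~6.9M), in line with the model's name.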

### Parameter Count

| Component | Parameters |
|-----------|------------|
| Gate Network | 64×512 + 512 + 512×16 + 16 ≈ 41K |
| 16 Expert Towers | 16 × (64×512 + 512 + 512×512 + 512 + 512×256 + 256) ≈ 6.84M |
| Classifier Head | 256×10 + 10 ≈ 2.6K |
| **Total** | **≈ 6.88M** |
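Each row is just the weights plus biases of the listed `nn.Linear` layers, so the totals can be reproduced in plain Python:

```python
def linear_params(n_in, n_out):
    # an nn.Linear(n_in, n_out) holds n_in*n_out weights plus n_out biases
    return n_in * n_out + n_out

gate = linear_params(64, 512) + linear_params(512, 16)
expert = linear_params(64, 512) + linear_params(512, 512) + linear_params(512, 256)
experts = 16 * expert
head = linear_params(256, 10)
total = gate + experts + head

print(gate, experts, head, total)  # 41488 6836224 2570 6880282
```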

## Input: 64-Dimensional Feature Vector

The model consumes a 64-dimensional feature vector produced by `featurize64()`:

- **Dims 0-31**: Hash-based sinusoidal encoding (content fingerprint)
- **Dims 32-47**: Domain-tag detection (code, security, architecture, etc.)
- **Dims 48-55**: Structural signals (length, punctuation, questions)
- **Dims 56-59**: Sentiment polarity estimates
- **Dims 60-63**: Novelty/complexity scores
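The `featurize64()` implementation is not included in this card. Purely to make the slot layout concrete, here is a hypothetical sketch, assuming a sha256 fingerprint and a keyword-based tag list (both are our assumptions, not the shipped featurizer):

```python
import hashlib
import math

def featurize64_sketch(text: str) -> list[float]:
    # Hypothetical featurizer following the slot layout above;
    # the real featurize64() ships with MangoMAS and may differ.
    vec = [0.0] * 64
    lowered = text.lower()

    # Dims 0-31: hash-based sinusoidal content fingerprint
    h = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    for i in range(32):
        vec[i] = math.sin((h % 65521) / (10000 ** (i / 32.0)))

    # Dims 32-47: domain-tag detection (illustrative keyword list)
    tags = ["code", "security", "architecture", "api", "test", "deploy",
            "data", "ml", "ui", "db", "network", "auth", "docs", "perf",
            "infra", "legal"]
    for i, tag in enumerate(tags):
        vec[32 + i] = 1.0 if tag in lowered else 0.0

    # Dims 48-55: structural signals (length, question marks, ...)
    vec[48] = min(len(text) / 512.0, 1.0)
    vec[49] = text.count("?") / max(len(text), 1)
    # remaining structural / sentiment / novelty dims left at 0.0 in this sketch

    return vec
```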

## Training

- **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
- **Updates**: Online learning from routing feedback
- **Minimum reward threshold**: 0.1
- **Device**: CPU / MPS / CUDA (auto-detected)

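The online update loop itself is not included in this card. As a sketch only, one reward-gated step could look like the following, assuming a reward-weighted cross-entropy objective (our assumption; the real feedback interface is part of MangoMAS):

```python
import torch
import torch.nn.functional as F

def feedback_update(model, optimizer, x, chosen_class, reward, min_reward=0.1):
    """One hypothetical online step: skip low-reward feedback, else take
    a reward-weighted gradient step toward the rewarded class."""
    if reward < min_reward:
        return None  # below the minimum reward threshold: ignore feedback
    model.train()
    logits, _ = model(x)
    # reward-weighted cross-entropy toward the class that earned the reward
    loss = reward * F.cross_entropy(logits, torch.tensor([chosen_class]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

This would be paired with `torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)` per the settings above.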
## Usage

```python
import torch
from moe_model import MixtureOfExperts7M, featurize64

# Create the model (16 experts, 10 output classes)
model = MixtureOfExperts7M(num_classes=10, num_experts=16)

# Extract a 64-dim feature vector from the task text
features = featurize64("Design a secure REST API with authentication")
x = torch.tensor([features], dtype=torch.float32)

# Forward pass: class logits plus the gate's expert weights
logits, gate_weights = model(x)
print(f"Expert weights: {gate_weights}")
print(f"Top expert: {gate_weights.argmax().item()}")
```

## Intended Use

This model is part of the **MangoMAS** multi-agent orchestration platform. It routes incoming tasks to the most appropriate expert agents based on each task's semantic content.

**Primary use cases:**

- Multi-agent task routing
- Expert selection for cognitive cell orchestration
- Research demonstration of MoE architectures

## Interactive Demo

Try the model live on the [MangoMAS HuggingFace Space](https://huggingface.co/spaces/ianshank/MangoMAS).

## Citation

```bibtex
@software{mangomas2026,
  title={MangoMAS: Multi-Agent Cognitive Architecture},
  author={Cruickshank, Ian},
  year={2026},
  url={https://github.com/ianshank/MangoMAS}
}
```

## Author

Built by [Ian Cruickshank](https://huggingface.co/ianshank), MangoMAS Engineering