---
language: en
license: mit
library_name: pytorch
tags:
- mixture-of-experts
- multi-agent
- neural-routing
- cognitive-architecture
- reinforcement-learning
pipeline_tag: text-classification
---

# MangoMAS-MoE-7M

A ~7 million parameter **Mixture-of-Experts** (MoE) neural routing model for multi-agent task orchestration.

## Model Architecture

```
Input (64-dim feature vector from featurize64())
                 │
           ┌─────┴─────┐
           │   GATE    │  Linear(64→512) → ReLU → Linear(512→16) → Softmax
           └─────┬─────┘
                 │
┌─────────────────────────────────────────────────────┐
│            16 Expert Towers (parallel)              │
│  Each: Linear(64→512) → ReLU → Linear(512→512)      │
│        → ReLU → Linear(512→256)                     │
└─────────────────────────────────────────────────────┘
                 │
   Weighted Sum (gate_weights × expert_outputs)
                 │
     Classifier Head: Linear(256→N_classes)
                 │
            Output Logits
```

### Parameter Count

| Component | Parameters |
|-----------|-----------|
| Gate Network | 64×512 + 512 + 512×16 + 16 = ~41K |
| 16 Expert Towers | 16 × (64×512 + 512 + 512×512 + 512 + 512×256 + 256) = ~6.84M |
| Classifier Head | 256×10 + 10 = ~2.6K |
| **Total** | **~6.88M** |
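
The layout above can be sketched directly in PyTorch. This is an illustrative reimplementation for readers, not the shipped `MixtureOfExperts7M` class; the class name `MoESketch` and constructor defaults are assumptions taken from the diagram and table.

```python
import torch
import torch.nn as nn


class MoESketch(nn.Module):
    """Illustrative sketch of the gate + 16-expert-tower layout above."""

    def __init__(self, num_classes: int = 10, num_experts: int = 16,
                 in_dim: int = 64, hidden: int = 512, expert_out: int = 256):
        super().__init__()
        # Gate: Linear(64→512) → ReLU → Linear(512→16); softmax applied in forward()
        self.gate = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_experts),
        )
        # 16 parallel expert towers: 64 → 512 → 512 → 256
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, expert_out),
            )
            for _ in range(num_experts)
        )
        self.classifier = nn.Linear(expert_out, num_classes)

    def forward(self, x):
        gate_weights = torch.softmax(self.gate(x), dim=-1)           # (B, E)
        expert_outs = torch.stack([e(x) for e in self.experts], 1)   # (B, E, 256)
        mixed = (gate_weights.unsqueeze(-1) * expert_outs).sum(1)    # (B, 256)
        return self.classifier(mixed), gate_weights
```

With the default sizes, `sum(p.numel() for p in model.parameters())` comes out to 6,880,282, matching the ~6.88M total in the table.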

## Input: 64-Dimensional Feature Vector

The model consumes a 64-dimensional feature vector produced by `featurize64()`:

- **Dims 0-31**: Hash-based sinusoidal encoding (content fingerprint)
- **Dims 32-47**: Domain tag detection (code, security, architecture, etc.)
- **Dims 48-55**: Structural signals (length, punctuation, questions)
- **Dims 56-59**: Sentiment polarity estimates
- **Dims 60-63**: Novelty/complexity scores
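
The real `featurize64()` is not reproduced here, but a toy version shows how the dimension layout above could be populated. Everything in this sketch (the keyword list, scaling constants, which dims are left at zero) is an assumption for illustration only:

```python
import hashlib
import math


def featurize64_sketch(text: str) -> list[float]:
    """Toy 64-dim featurizer mirroring the dim layout above (illustrative)."""
    vec = [0.0] * 64

    # Dims 0-31: hash-based sinusoidal content fingerprint
    digest = int(hashlib.sha256(text.encode()).hexdigest(), 16)
    for i in range(32):
        vec[i] = math.sin((digest % 10_000) / (i + 1.0))

    # Dims 32-47: crude domain-tag detection via keyword hits (hypothetical tags)
    domains = ["code", "security", "architecture", "api", "test", "data",
               "deploy", "auth", "network", "ui", "ml", "db",
               "cloud", "docs", "perf", "infra"]
    lowered = text.lower()
    for i, tag in enumerate(domains):
        vec[32 + i] = 1.0 if tag in lowered else 0.0

    # Dims 48-55: structural signals (length, questions, punctuation density)
    vec[48] = min(len(text) / 512.0, 1.0)
    vec[49] = text.count("?") / 4.0
    vec[50] = text.count(",") / 8.0
    # Dims 56-63 (sentiment, novelty/complexity) left at 0.0 in this sketch
    return vec
```

The hash-based fingerprint makes the encoding deterministic per input string, which is what lets the same task text route to the same experts across calls.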

## Training

- **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
- **Updates**: Online learning from routing feedback
- **Minimum reward threshold**: 0.1
- **Device**: CPU / MPS / CUDA (auto-detected)
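
A minimal sketch of that online loop, using the listed hyperparameters. The reward-weighted cross-entropy loss and the shape of the feedback signal are assumptions; only the optimizer settings, the 0.1 threshold, and the device auto-detection come from the list above (a plain `nn.Linear` stands in for the MoE model here):

```python
import torch
import torch.nn as nn

# Stand-in model for illustration; the real system would use the MoE.
model = nn.Linear(64, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
loss_fn = nn.CrossEntropyLoss()

MIN_REWARD = 0.1  # feedback below this threshold is ignored

# Device auto-detection: CUDA, then MPS, then CPU
device = ("cuda" if torch.cuda.is_available()
          else "mps" if torch.backends.mps.is_available()
          else "cpu")
model.to(device)


def online_update(x, chosen_class, reward):
    """One reward-weighted supervised update from routing feedback (sketch)."""
    if reward < MIN_REWARD:
        return None  # skip low-signal feedback
    model.train()
    optimizer.zero_grad()
    logits = model(x.to(device))
    target = torch.tensor([chosen_class], device=device)
    loss = reward * loss_fn(logits, target)  # scale loss by feedback strength
    loss.backward()
    optimizer.step()
    return loss.item()
```

Scaling the loss by the reward lets strong feedback move the router more than marginal feedback, while the threshold drops updates too weak to be informative.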

## Usage

```python
import torch
from moe_model import MixtureOfExperts7M, featurize64

# Create model
model = MixtureOfExperts7M(num_classes=10, num_experts=16)

# Extract features
features = featurize64("Design a secure REST API with authentication")
x = torch.tensor([features], dtype=torch.float32)

# Forward pass
logits, gate_weights = model(x)
print(f"Expert weights: {gate_weights}")
print(f"Top expert: {gate_weights.argmax().item()}")
```

## Intended Use

This model is part of the **MangoMAS** multi-agent orchestration platform. It routes incoming tasks to the most appropriate expert agents based on the task's semantic content.

**Primary use cases:**

- Multi-agent task routing
- Expert selection for cognitive cell orchestration
- Research demonstration of MoE architectures

## Interactive Demo

Try the model live on the [MangoMAS HuggingFace Space](https://huggingface.co/spaces/ianshank/MangoMAS).

## Citation

```bibtex
@software{mangomas2026,
  title={MangoMAS: Multi-Agent Cognitive Architecture},
  author={Shanker, Ian},
  year={2026},
  url={https://github.com/ianshank/MangoMAS}
}
```

## Author

Built by [Ian Shanker](https://huggingface.co/ianshank), MangoMAS Engineering