---
language: en
license: mit
library_name: pytorch
tags:
- mixture-of-experts
- multi-agent
- neural-routing
- cognitive-architecture
- reinforcement-learning
pipeline_tag: text-classification
---
# MangoMAS-MoE-7M
A ~7-million-parameter **Mixture-of-Experts** (MoE) neural routing model for multi-agent task orchestration.
## Model Architecture
```
Input (64-dim feature vector from featurize64())
          │
    ┌─────┴─────┐
    │   GATE    │  Linear(64→512) → ReLU → Linear(512→16) → Softmax
    └─────┬─────┘
          │
╔═══════════════════════════════════════════════════╗
║            16 Expert Towers (parallel)            ║
║  Each: Linear(64→512) → ReLU → Linear(512→512)    ║
║        → ReLU → Linear(512→256)                   ║
╚═══════════════════════════════════════════════════╝
          │
Weighted Sum (gate_weights × expert_outputs)
          │
Classifier Head: Linear(256→N_classes)
          │
Output Logits
```
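The diagram above maps onto a few dozen lines of PyTorch. Below is a minimal sketch under the stated layer sizes; the class name `MoESketch` and its exact structure are assumptions for illustration, not the shipped `MixtureOfExperts7M` source in `moe_model.py`:

```python
import torch
import torch.nn as nn

class MoESketch(nn.Module):
    """Illustrative re-creation of the diagram above (hypothetical, not the shipped class)."""
    def __init__(self, in_dim=64, hidden=512, expert_out=256,
                 num_experts=16, num_classes=10):
        super().__init__()
        # Gate: Linear(64->512) -> ReLU -> Linear(512->16) -> Softmax (applied in forward)
        self.gate = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_experts)
        )
        # 16 expert towers, each Linear(64->512) -> ReLU -> Linear(512->512) -> ReLU -> Linear(512->256)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, expert_out),
            )
            for _ in range(num_experts)
        )
        self.classifier = nn.Linear(expert_out, num_classes)

    def forward(self, x):
        gate_weights = torch.softmax(self.gate(x), dim=-1)         # (B, 16)
        expert_out = torch.stack([e(x) for e in self.experts], 1)  # (B, 16, 256)
        mixed = (gate_weights.unsqueeze(-1) * expert_out).sum(1)   # weighted sum -> (B, 256)
        return self.classifier(mixed), gate_weights
```

Note that every expert runs on every input here (dense mixing); the gate softly weights their outputs rather than hard-selecting a subset.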
### Parameter Count
| Component | Parameters |
|-----------|-----------|
| Gate Network | 64×512 + 512 + 512×16 + 16 = ~41K |
| 16 Expert Towers | 16 × (64×512 + 512 + 512×512 + 512 + 512×256 + 256) = ~6.84M |
| Classifier Head | 256×10 + 10 = ~2.6K |
| **Total** | **~6.88M** |
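The table's arithmetic can be checked directly (weights plus biases for each `Linear` layer):

```python
# Recompute the parameter counts from the table above.
gate = 64 * 512 + 512 + 512 * 16 + 16                      # gate network
expert = 64 * 512 + 512 + 512 * 512 + 512 + 512 * 256 + 256  # one expert tower
experts = 16 * expert                                      # all 16 towers
head = 256 * 10 + 10                                       # classifier head (10 classes)
total = gate + experts + head
print(gate, experts, head, total)
```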
## Input: 64-Dimensional Feature Vector
The model consumes a 64-dimensional feature vector produced by `featurize64()`:
- **Dims 0-31**: Hash-based sinusoidal encoding (content fingerprint)
- **Dims 32-47**: Domain tag detection (code, security, architecture, etc.)
- **Dims 48-55**: Structural signals (length, punctuation, questions)
- **Dims 56-59**: Sentiment polarity estimates
- **Dims 60-63**: Novelty/complexity scores
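The real `featurize64()` lives in `moe_model.py` and is not reproduced in this card; the sketch below only illustrates the dimension layout listed above. The hash scheme, tag keywords, and normalization constants are all assumptions, and the sentiment/novelty dims are left as zeros:

```python
import hashlib
import math
import string

def featurize64_sketch(text: str) -> list[float]:
    """Hypothetical illustration of the 64-dim layout; not the shipped featurize64()."""
    v = [0.0] * 64
    # Dims 0-31: hash-based sinusoidal content fingerprint (assumed scheme).
    h = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    for i in range(32):
        v[i] = math.sin(h / (10_000 ** (i / 32)))
    # Dims 32-47: domain tag detection via keyword hits (tag list is illustrative).
    tags = ["code", "security", "architecture", "api", "data", "test",
            "deploy", "ml", "ui", "db", "network", "auth", "doc", "perf",
            "cloud", "agent"]
    low = text.lower()
    for i, tag in enumerate(tags):
        v[32 + i] = 1.0 if tag in low else 0.0
    # Dims 48-55: structural signals (length, punctuation, questions).
    v[48] = min(len(text) / 512, 1.0)                                # normalized length
    v[49] = sum(c in string.punctuation for c in text) / max(len(text), 1)
    v[50] = 1.0 if "?" in text else 0.0                              # question marker
    # Dims 56-59 (sentiment) and 60-63 (novelty/complexity): zeroed in this sketch.
    return v
```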
## Training
- **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
- **Updates**: Online learning from routing feedback
- **Minimum reward threshold**: 0.1
- **Device**: CPU / MPS / CUDA (auto-detected)
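The training loop itself is not included in this card. A hypothetical sketch of one online update from routing feedback, using the hyperparameters listed above (the stand-in model, the reward-weighted cross-entropy objective, and the `online_update` helper are assumptions for illustration):

```python
import torch
import torch.nn as nn

# Stand-in network; the real system would update MixtureOfExperts7M instead.
model = nn.Sequential(nn.Linear(64, 512), nn.ReLU(), nn.Linear(512, 10))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
MIN_REWARD = 0.1  # feedback below this threshold is discarded

def online_update(x, chosen_class, reward):
    """One online step from routing feedback; returns the loss, or None if skipped."""
    if reward < MIN_REWARD:  # skip low-signal feedback
        return None
    logits = model(x)
    # Reward-weighted cross-entropy toward the class that earned the reward
    # (assumed objective; the shipped loss may differ).
    loss = reward * nn.functional.cross_entropy(logits, chosen_class)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```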
## Usage
```python
import torch
from moe_model import MixtureOfExperts7M, featurize64
# Create model
model = MixtureOfExperts7M(num_classes=10, num_experts=16)
# Extract features
features = featurize64("Design a secure REST API with authentication")
x = torch.tensor([features], dtype=torch.float32)
# Forward pass
logits, gate_weights = model(x)
print(f"Expert weights: {gate_weights}")
print(f"Top expert: {gate_weights.argmax().item()}")
```
## Intended Use
This model is part of the **MangoMAS** multi-agent orchestration platform. It routes incoming tasks to the most appropriate expert agents based on the task's semantic content.
**Primary use cases:**
- Multi-agent task routing
- Expert selection for cognitive cell orchestration
- Research demonstration of MoE architectures
## Interactive Demo
Try the model live on the [MangoMAS HuggingFace Space](https://huggingface.co/spaces/ianshank/MangoMAS).
## Citation
```bibtex
@software{mangomas2026,
title={MangoMAS: Multi-Agent Cognitive Architecture},
author={Shanker, Ian},
year={2026},
url={https://github.com/ianshank/MangoMAS}
}
```
## Author
Built by [Ian Shanker](https://huggingface.co/ianshank) β€” MangoMAS Engineering