Upload MangoMAS MoE-7M model weights and config

- README.md +112 -0
- config.json +12 -0
- model.safetensors +3 -0
README.md
ADDED

---
language: en
license: mit
library_name: pytorch
tags:
- mixture-of-experts
- multi-agent
- neural-routing
- cognitive-architecture
- reinforcement-learning
pipeline_tag: text-classification
---

# MangoMAS-MoE-7M

A ~7 million parameter **Mixture-of-Experts** (MoE) neural routing model for multi-agent task orchestration.

## Model Architecture

```
Input (64-dim feature vector from featurize64())
                 ↓
          ┌──────┴──────┐
          │    GATE     │  Linear(64→512) → ReLU → Linear(512→16) → Softmax
          └──────┬──────┘
                 ↓
┌────────────────────────────────────────────────────┐
│ 16 Expert Towers (parallel)                        │
│ Each: Linear(64→512) → ReLU → Linear(512→512)      │
│       → ReLU → Linear(512→256)                     │
└────────────────────────────────────────────────────┘
                 ↓
Weighted Sum (gate_weights × expert_outputs)
                 ↓
Classifier Head: Linear(256→N_classes)
                 ↓
Output Logits
```
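
The repository's `moe_model.py` is not reproduced in this card, so the following is a minimal PyTorch sketch of how the diagram above could be realized. The class name and constructor arguments mirror the usage example further down; everything else is an assumption about the shipped implementation.

```python
import torch
import torch.nn as nn

class MixtureOfExperts7M(nn.Module):
    """Sketch of the diagrammed architecture (not the shipped implementation)."""

    def __init__(self, num_classes: int = 10, num_experts: int = 16, input_dim: int = 64):
        super().__init__()
        # Gate: Linear(64→512) → ReLU → Linear(512→16) → Softmax
        self.gate = nn.Sequential(
            nn.Linear(input_dim, 512),
            nn.ReLU(),
            nn.Linear(512, num_experts),
            nn.Softmax(dim=-1),
        )
        # 16 parallel expert towers: Linear(64→512) → ReLU → Linear(512→512) → ReLU → Linear(512→256)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(input_dim, 512), nn.ReLU(),
                nn.Linear(512, 512), nn.ReLU(),
                nn.Linear(512, 256),
            )
            for _ in range(num_experts)
        ])
        # Classifier head over the gate-weighted mixture
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, x: torch.Tensor):
        gate_weights = self.gate(x)                                     # (batch, num_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, num_experts, 256)
        mixed = (gate_weights.unsqueeze(-1) * expert_out).sum(dim=1)    # weighted sum → (batch, 256)
        return self.classifier(mixed), gate_weights
```

Instantiated with these defaults, the sketch has exactly 6,880,282 trainable parameters (`sum(p.numel() for p in model.parameters())`), matching `parameter_count` in config.json.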

### Parameter Count

| Component | Parameters |
|-----------|-----------|
| Gate Network | 64×512 + 512 + 512×16 + 16 = 41,488 (~41K) |
| 16 Expert Towers | 16 × (64×512 + 512 + 512×512 + 512 + 512×256 + 256) = 6,836,224 (~6.84M) |
| Classifier Head | 256×10 + 10 = 2,570 (~2.6K) |
| **Total** | **6,880,282 (~6.88M)** |
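
These totals are easy to check directly, bias terms included; the grand total matches `parameter_count` in config.json:

```python
gate = 64 * 512 + 512 + 512 * 16 + 16                                 # 41,488
experts = 16 * (64 * 512 + 512 + 512 * 512 + 512 + 512 * 256 + 256)   # 6,836,224
head = 256 * 10 + 10                                                  # 2,570
print(gate + experts + head)                                          # 6,880,282
```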

## Input: 64-Dimensional Feature Vector

The model consumes a 64-dimensional feature vector produced by `featurize64()` (an illustrative sketch follows the list):

- **Dims 0-31**: Hash-based sinusoidal encoding (content fingerprint)
- **Dims 32-47**: Domain tag detection (code, security, architecture, etc.)
- **Dims 48-55**: Structural signals (length, punctuation, questions)
- **Dims 56-59**: Sentiment polarity estimates
- **Dims 60-63**: Novelty/complexity scores
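
The real featurizer ships with the repository; the sketch below only illustrates the layout described above. The hash scheme, keyword list, and normalizations are all hypothetical.

```python
import hashlib
import math

def featurize64_sketch(text: str) -> list[float]:
    """Illustrative stand-in for featurize64(); layout matches the card, values do not."""
    feats = [0.0] * 64
    # Dims 0-31: hash-based sinusoidal content fingerprint
    h = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    for i in range(32):
        feats[i] = math.sin((h % 10_000) / 10_000 * math.pi * (i + 1))
    # Dims 32-47: domain tag flags (hypothetical 16-keyword list)
    tags = ["code", "security", "architecture", "api", "data", "test", "deploy",
            "ui", "ml", "docs", "performance", "database", "auth", "cloud",
            "network", "design"]
    lowered = text.lower()
    for i, tag in enumerate(tags):
        feats[32 + i] = 1.0 if tag in lowered else 0.0
    # Dims 48-55: structural signals (length, questions, punctuation density, ...)
    feats[48] = min(len(text) / 512.0, 1.0)
    feats[49] = text.count("?") / max(len(text), 1)
    feats[50] = sum(c in ".,;:!" for c in text) / max(len(text), 1)
    # Dims 56-59 (sentiment) and 60-63 (novelty/complexity) left at 0.0 here
    return feats
```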

## Training

- **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
- **Updates**: Online learning from routing feedback (see the sketch after this list)
- **Minimum reward threshold**: 0.1
- **Device**: CPU / MPS / CUDA (auto-detected)
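
The card does not spell out how routing feedback drives updates. One plausible reading, using the hyperparameters above, is a reward-gated, reward-weighted cross-entropy step; the function below is a sketch under that assumption and reuses `model` from the usage example in the next section.

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

def online_update(x: torch.Tensor, routed_class: int, reward: float) -> None:
    """Hypothetical reward-gated update; the shipped training loop may differ."""
    if reward < 0.1:  # minimum reward threshold from the card
        return
    logits, _ = model(x)
    target = torch.tensor([routed_class])
    loss = reward * F.cross_entropy(logits, target)  # weight the loss by feedback reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```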

## Usage

```python
import torch
from moe_model import MixtureOfExperts7M, featurize64

# Create model
model = MixtureOfExperts7M(num_classes=10, num_experts=16)

# Extract features
features = featurize64("Design a secure REST API with authentication")
x = torch.tensor([features], dtype=torch.float32)

# Forward pass
logits, gate_weights = model(x)
print(f"Expert weights: {gate_weights}")
print(f"Top expert: {gate_weights.argmax().item()}")
```
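
Note that the snippet above builds a freshly initialized model; to use the uploaded weights, load them from `model.safetensors` first. A minimal sketch, assuming the checkpoint's state-dict keys match the module names in `MixtureOfExperts7M`:

```python
from safetensors.torch import load_file

state_dict = load_file("model.safetensors")  # the weights file shipped in this repo
model.load_state_dict(state_dict)
model.eval()
```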

## Intended Use

This model is part of the **MangoMAS** multi-agent orchestration platform. It routes incoming tasks to the most appropriate expert agents based on the task's semantic content.

**Primary use cases:**

- Multi-agent task routing
- Expert selection for cognitive cell orchestration
- Research demonstration of MoE architectures

## Interactive Demo

Try the model live on the [MangoMAS HuggingFace Space](https://huggingface.co/spaces/ianshank/MangoMAS).

## Citation

```bibtex
@software{mangomas2026,
  title={MangoMAS: Multi-Agent Cognitive Architecture},
  author={Shanker, Ian},
  year={2026},
  url={https://github.com/ianshank/MangoMAS}
}
```

## Author

Built by [Ian Shanker](https://huggingface.co/ianshank), MangoMAS Engineering

config.json
ADDED

{
  "model_type": "MixtureOfExperts7M",
  "num_classes": 10,
  "num_experts": 16,
  "input_dim": 64,
  "expert_hidden1": 512,
  "expert_hidden2": 512,
  "expert_output_dim": 256,
  "gate_hidden": 512,
  "parameter_count": 6880282,
  "framework": "pytorch"
}
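
The config keys map one-to-one onto the architecture in the README. As an illustration, and assuming the constructor accepts these arguments, the model could be rebuilt from this file like so:

```python
import json
from moe_model import MixtureOfExperts7M

with open("config.json") as f:
    cfg = json.load(f)

model = MixtureOfExperts7M(
    num_classes=cfg["num_classes"],  # 10
    num_experts=cfg["num_experts"],  # 16
)
```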

model.safetensors
ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:0fe5f5a7afeb0e16c82289fd12933adbc2a9ac92461a291a74a4ecd97b26ec82
size 27547547
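
This is a Git LFS pointer, not the weights themselves; the actual 27.5 MB file (consistent with ~6.88M float32 parameters plus the safetensors header) resolves through LFS. To fetch it programmatically, something like the following should work; the `repo_id` here is a guess based on the author's namespace, so substitute the actual model repository id:

```python
from huggingface_hub import hf_hub_download

# repo_id is hypothetical; replace with the real model repository id
path = hf_hub_download(repo_id="ianshank/MangoMAS-MoE-7M", filename="model.safetensors")
```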