---
language: en
license: mit
library_name: pytorch
tags:
  - mixture-of-experts
  - multi-agent
  - neural-routing
  - cognitive-architecture
  - reinforcement-learning
pipeline_tag: text-classification
---

# MangoMAS-MoE-7M

A ~7M-parameter **Mixture-of-Experts** (MoE) neural routing model for multi-agent task orchestration.

## Model Architecture

```
Input (64-dim feature vector from featurize64())
          │
    ┌─────┴─────┐
    │   GATE    │  Linear(64→512) → ReLU → Linear(512→16) → Softmax
    └─────┬─────┘
          │
    ╔═══════════════════════════════════════════════════╗
    ║  16 Expert Towers (parallel)                      ║
    ║  Each: Linear(64→512) → ReLU → Linear(512→512)    ║
    ║        → ReLU → Linear(512→256)                   ║
    ╚═══════════════════════════════════════════════════╝
          │
    Weighted Sum (gate_weights × expert_outputs)
          │
    Classifier Head: Linear(256→N_classes)
          │
    Output Logits
```

### Parameter Count

| Component | Parameters |
|-----------|-----------|
| Gate Network | 64×512 + 512 + 512×16 + 16 = ~41.5K |
| 16 Expert Towers | 16 × (64×512 + 512 + 512×512 + 512 + 512×256 + 256) = ~6.84M |
| Classifier Head | 256×10 + 10 = ~2.6K |
| **Total** | **~6.88M** |
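The architecture above can be sketched in a few lines of PyTorch. This is an illustrative re-implementation, not the released `MixtureOfExperts7M` class; the class name and constructor defaults here are assumptions, but the layer shapes match the diagram, so the parameter count comes out to the table's total.

```python
import torch
import torch.nn as nn

class MoESketch(nn.Module):
    """Illustrative sketch of the MoE architecture described above
    (not the released MixtureOfExperts7M implementation)."""

    def __init__(self, num_classes=10, num_experts=16,
                 d_in=64, d_hidden=512, d_expert=256):
        super().__init__()
        # Gate: Linear(64→512) → ReLU → Linear(512→16) → Softmax
        self.gate = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, num_experts), nn.Softmax(dim=-1),
        )
        # 16 parallel expert towers: Linear(64→512) → ReLU
        #   → Linear(512→512) → ReLU → Linear(512→256)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_in, d_hidden), nn.ReLU(),
                nn.Linear(d_hidden, d_hidden), nn.ReLU(),
                nn.Linear(d_hidden, d_expert),
            )
            for _ in range(num_experts)
        ])
        # Classifier head: Linear(256→N_classes)
        self.classifier = nn.Linear(d_expert, num_classes)

    def forward(self, x):
        gate_weights = self.gate(x)                                    # (B, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, 256)
        # Weighted sum of expert outputs under the gate distribution
        mixed = (gate_weights.unsqueeze(-1) * expert_out).sum(dim=1)   # (B, 256)
        return self.classifier(mixed), gate_weights
```

With the default sizes, `sum(p.numel() for p in MoESketch().parameters())` reproduces the ~6.88M total from the table (6,880,282 exactly).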

## Input: 64-Dimensional Feature Vector

The model consumes a 64-dimensional feature vector produced by `featurize64()`:

- **Dims 0-31**: Hash-based sinusoidal encoding (content fingerprint)
- **Dims 32-47**: Domain tag detection (code, security, architecture, etc.)
- **Dims 48-55**: Structural signals (length, punctuation, questions)
- **Dims 56-59**: Sentiment polarity estimates
- **Dims 60-63**: Novelty/complexity scores
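The card does not publish the internals of `featurize64()`, but the layout above can be illustrated with a toy stand-in. Everything below is assumption: the hash scheme, the keyword-based domain tags, and the structural heuristics are placeholders chosen only to make the 64-dim layout concrete.

```python
import hashlib
import math
import numpy as np

def featurize64_sketch(text: str) -> np.ndarray:
    """Toy stand-in for featurize64() that only illustrates the
    documented dimension layout; not the real feature extractor."""
    v = np.zeros(64, dtype=np.float32)

    # Dims 0-31: hash-based sinusoidal content fingerprint (placeholder scheme)
    h = int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16) % 10_000
    for i in range(32):
        v[i] = math.sin(h / (i + 1))

    # Dims 32-47: domain-tag detection; hypothetical 16-tag keyword list
    domains = ["code", "security", "architecture", "api", "data", "test",
               "deploy", "ui", "db", "network", "ml", "doc", "auth", "perf",
               "cloud", "agent"]
    lowered = text.lower()
    for i, tag in enumerate(domains):
        v[32 + i] = 1.0 if tag in lowered else 0.0

    # Dims 48-55: structural signals (length, questions, punctuation)
    v[48] = min(len(text) / 512.0, 1.0)
    v[49] = text.count("?") / max(len(text), 1)
    v[50] = sum(c in ".,;:!" for c in text) / max(len(text), 1)

    # Dims 56-59: sentiment polarity (left at 0 in this sketch)
    # Dims 60-63: novelty/complexity; lexical diversity as a stand-in
    words = text.split()
    v[60] = len(set(words)) / max(len(words), 1)
    return v
```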

## Training

- **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
- **Updates**: Online learning from routing feedback
- **Minimum reward threshold**: 0.1
- **Device**: CPU / MPS / CUDA (auto-detected)
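A minimal sketch of how these settings could fit together is shown below. The card does not specify the loss used for online updates, so the reward-weighted cross-entropy objective and the `online_update` helper are assumptions; only the AdamW hyperparameters, the 0.1 reward threshold, and the device fallback order come from the list above.

```python
import torch
import torch.nn.functional as F

def pick_device() -> torch.device:
    """CUDA → MPS → CPU fallback, matching the auto-detection described above."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

def online_update(model, optimizer, x, target, reward, min_reward=0.1):
    """Hypothetical online step from routing feedback: skip low-reward
    samples, otherwise take a reward-weighted cross-entropy step."""
    if reward < min_reward:
        return None  # below the minimum reward threshold; no update
    model.train()
    logits, _gate = model(x)  # model returns (logits, gate_weights)
    loss = reward * F.cross_entropy(logits, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Optimizer configured as documented
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
```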

## Usage

```python
import torch
from moe_model import MixtureOfExperts7M, featurize64

# Create model
model = MixtureOfExperts7M(num_classes=10, num_experts=16)

# Extract features
features = featurize64("Design a secure REST API with authentication")
x = torch.tensor([features], dtype=torch.float32)

# Forward pass
logits, gate_weights = model(x)
print(f"Expert weights: {gate_weights}")
print(f"Top expert: {gate_weights.argmax().item()}")
```

## Intended Use

This model is part of the **MangoMAS** multi-agent orchestration platform. It routes incoming tasks to the most appropriate expert agents based on the task's semantic content.

**Primary use cases:**

- Multi-agent task routing
- Expert selection for cognitive cell orchestration
- Research demonstration of MoE architectures

## Interactive Demo

Try the model live on the [MangoMAS HuggingFace Space](https://huggingface.co/spaces/ianshank/MangoMAS).

## Citation

```bibtex
@software{mangomas2026,
  title={MangoMAS: Multi-Agent Cognitive Architecture},
  author={Cruickshank, Ian},
  year={2026},
  url={https://github.com/ianshank/MangoMAS}
}
```

## Author

Built by [Ian Cruickshank](https://huggingface.co/ianshank) — MangoMAS Engineering