---
language: en
license: mit
library_name: pytorch
tags:
- mixture-of-experts
- multi-agent
- neural-routing
- cognitive-architecture
- reinforcement-learning
pipeline_tag: text-classification
---
# MangoMAS-MoE-7M
A ~7-million-parameter **Mixture-of-Experts** (MoE) neural routing model for multi-agent task orchestration.
## Model Architecture
```
Input (64-dim feature vector from featurize64())
          │
    ┌─────┴─────┐
    │   GATE    │  Linear(64→512) → ReLU → Linear(512→16) → Softmax
    └─────┬─────┘
          │
╔═══════════════════════════════════════════════════╗
║            16 Expert Towers (parallel)            ║
║  Each: Linear(64→512) → ReLU → Linear(512→512)    ║
║        → ReLU → Linear(512→256)                   ║
╚═══════════════════════════════════════════════════╝
          │
Weighted Sum (gate_weights × expert_outputs)
          │
Classifier Head: Linear(256→N_classes)
          │
Output Logits
```
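The diagram above maps onto a few dozen lines of PyTorch. Below is a minimal sketch under the stated layer sizes; the class name `MoESketch` and its exact structure are assumptions for illustration, not the shipped `MixtureOfExperts7M` source in `moe_model.py`:

```python
import torch
import torch.nn as nn

class MoESketch(nn.Module):
    """Illustrative re-creation of the diagram above (hypothetical, not the shipped class)."""
    def __init__(self, in_dim=64, hidden=512, expert_out=256,
                 num_experts=16, num_classes=10):
        super().__init__()
        # Gate: Linear(64->512) -> ReLU -> Linear(512->16) -> Softmax (applied in forward)
        self.gate = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_experts)
        )
        # 16 expert towers, each Linear(64->512) -> ReLU -> Linear(512->512) -> ReLU -> Linear(512->256)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, expert_out),
            )
            for _ in range(num_experts)
        )
        self.classifier = nn.Linear(expert_out, num_classes)

    def forward(self, x):
        gate_weights = torch.softmax(self.gate(x), dim=-1)         # (B, 16)
        expert_out = torch.stack([e(x) for e in self.experts], 1)  # (B, 16, 256)
        mixed = (gate_weights.unsqueeze(-1) * expert_out).sum(1)   # weighted sum -> (B, 256)
        return self.classifier(mixed), gate_weights
```

Note that every expert runs on every input here (dense mixing); the gate softly weights their outputs rather than hard-selecting a subset.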
### Parameter Count
| Component | Parameters |
|-----------|-----------|
| Gate Network | 64×512 + 512 + 512×16 + 16 = ~41K |
| 16 Expert Towers | 16 × (64×512 + 512 + 512×512 + 512 + 512×256 + 256) = ~6.84M |
| Classifier Head | 256×10 + 10 = ~2.6K |
| **Total** | **~6.88M** |
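The table's arithmetic can be checked directly (weights plus biases for each `Linear` layer):

```python
# Recompute the parameter counts from the table above.
gate = 64 * 512 + 512 + 512 * 16 + 16                      # gate network
expert = 64 * 512 + 512 + 512 * 512 + 512 + 512 * 256 + 256  # one expert tower
experts = 16 * expert                                      # all 16 towers
head = 256 * 10 + 10                                       # classifier head (10 classes)
total = gate + experts + head
print(gate, experts, head, total)
```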
## Input: 64-Dimensional Feature Vector
The model consumes a 64-dimensional feature vector produced by `featurize64()`:
- **Dims 0-31**: Hash-based sinusoidal encoding (content fingerprint)
- **Dims 32-47**: Domain tag detection (code, security, architecture, etc.)
- **Dims 48-55**: Structural signals (length, punctuation, questions)
- **Dims 56-59**: Sentiment polarity estimates
- **Dims 60-63**: Novelty/complexity scores
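The real `featurize64()` lives in `moe_model.py` and is not reproduced in this card; the sketch below only illustrates the dimension layout listed above. The hash scheme, tag keywords, and normalization constants are all assumptions, and the sentiment/novelty dims are left as zeros:

```python
import hashlib
import math
import string

def featurize64_sketch(text: str) -> list[float]:
    """Hypothetical illustration of the 64-dim layout; not the shipped featurize64()."""
    v = [0.0] * 64
    # Dims 0-31: hash-based sinusoidal content fingerprint (assumed scheme).
    h = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    for i in range(32):
        v[i] = math.sin(h / (10_000 ** (i / 32)))
    # Dims 32-47: domain tag detection via keyword hits (tag list is illustrative).
    tags = ["code", "security", "architecture", "api", "data", "test",
            "deploy", "ml", "ui", "db", "network", "auth", "doc", "perf",
            "cloud", "agent"]
    low = text.lower()
    for i, tag in enumerate(tags):
        v[32 + i] = 1.0 if tag in low else 0.0
    # Dims 48-55: structural signals (length, punctuation, questions).
    v[48] = min(len(text) / 512, 1.0)                                # normalized length
    v[49] = sum(c in string.punctuation for c in text) / max(len(text), 1)
    v[50] = 1.0 if "?" in text else 0.0                              # question marker
    # Dims 56-59 (sentiment) and 60-63 (novelty/complexity): zeroed in this sketch.
    return v
```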
## Training
- **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
- **Updates**: Online learning from routing feedback
- **Minimum reward threshold**: 0.1
- **Device**: CPU / MPS / CUDA (auto-detected)
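The training loop itself is not included in this card. A hypothetical sketch of one online update from routing feedback, using the hyperparameters listed above (the stand-in model, the reward-weighted cross-entropy objective, and the `online_update` helper are assumptions for illustration):

```python
import torch
import torch.nn as nn

# Stand-in network; the real system would update MixtureOfExperts7M instead.
model = nn.Sequential(nn.Linear(64, 512), nn.ReLU(), nn.Linear(512, 10))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
MIN_REWARD = 0.1  # feedback below this threshold is discarded

def online_update(x, chosen_class, reward):
    """One online step from routing feedback; returns the loss, or None if skipped."""
    if reward < MIN_REWARD:  # skip low-signal feedback
        return None
    logits = model(x)
    # Reward-weighted cross-entropy toward the class that earned the reward
    # (assumed objective; the shipped loss may differ).
    loss = reward * nn.functional.cross_entropy(logits, chosen_class)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```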
## Usage
```python
import torch
from moe_model import MixtureOfExperts7M, featurize64
# Create model
model = MixtureOfExperts7M(num_classes=10, num_experts=16)
# Extract features
features = featurize64("Design a secure REST API with authentication")
x = torch.tensor([features], dtype=torch.float32)
# Forward pass
logits, gate_weights = model(x)
print(f"Expert weights: {gate_weights}")
print(f"Top expert: {gate_weights.argmax().item()}")
```
## Intended Use
This model is part of the **MangoMAS** multi-agent orchestration platform. It routes incoming tasks to the most appropriate expert agents based on the task's semantic content.
**Primary use cases:**
- Multi-agent task routing
- Expert selection for cognitive cell orchestration
- Research demonstration of MoE architectures
## Interactive Demo
Try the model live on the [MangoMAS HuggingFace Space](https://huggingface.co/spaces/ianshank/MangoMAS).
## Citation
```bibtex
@software{mangomas2026,
title={MangoMAS: Multi-Agent Cognitive Architecture},
author={Shanker, Ian},
year={2026},
url={https://github.com/ianshank/MangoMAS}
}
```
## Author
Built by [Ian Shanker](https://huggingface.co/ianshank) β€” MangoMAS Engineering