Update README.md
README.md CHANGED
@@ -22,34 +22,3 @@ short_description: 'MoM: Specialized Models for Intelligent Routing'
 ## Why MoM?

 vLLM-SR solves a critical problem: **how to route LLM requests to the right model at the right time**. Not every query needs the same resources—"What's the weather?" shouldn't cost as much as "Analyze this legal contract."
-
-## MoM System Card
-
-A quick overview of all MoM models:
-
-<div align="center">
-
-| Category | Model | Size | Architecture | Base Model | Purpose |
-|----------|-------|------|--------------|------------|---------|
-| **🧠 Intelligent Routing** | mom-brain-flash | Flash | Encoder | ModernBERT | Ultra-fast intent classification |
-| | mom-brain-pro | Pro | Decoder | Qwen3 0.6B | Balanced routing with reasoning |
-| | mom-brain-max | Max | Decoder | Qwen3 1.7B | Maximum accuracy for complex decisions |
-| **🔍 Similarity Search** | mom-similarity-flash | Flash | Encoder | BERT | Semantic similarity matching |
-| **🔒 Prompt Guardian** | mom-jailbreak-flash | Flash | Encoder | ModernBERT | Jailbreak/attack detection |
-| | mom-pii-flash | Flash | Encoder | ModernBERT | PII detection & privacy protection |
-| **🎯 SLM Experts** | mom-expert-math-flash | Flash | Decoder | Qwen3 0.6B | Backend math problem solver |
-| | mom-expert-science-flash | Flash | Decoder | Qwen3 0.6B | Backend science problem solver |
-| | mom-expert-social-flash | Flash | Decoder | Qwen3 0.6B | Backend social sciences solver |
-| | mom-expert-humanities-flash | Flash | Decoder | Qwen3 0.6B | Backend humanities solver |
-| | mom-expert-law-flash | Flash | Decoder | Qwen3 0.6B | Backend law problem solver |
-| | mom-expert-generalist-flash | Flash | Decoder | Qwen3 0.6B | Backend generalist solver |
-
-</div>
-
-**Key Insights:**
-
-- **4 Categories**: 3 for routing (Intelligent Routing, Similarity Search, Prompt Guardian) + 1 for backend problem solving (SLM Experts)
-- **ModernBERT** (encoder-only) → Sub-10ms latency for high-throughput routing
-- **Qwen3** (decoder-only) → Explainable routing decisions + domain-specific problem solving
-- **Flash** models achieve 10,000+ QPS on commodity hardware
-- **SLM Experts** are not routers—they are specialized backend models that solve domain-specific problems