Xunzhuo committed on
Commit d757c33 · verified · 1 Parent(s): 4079234

Update README.md

Files changed (1)
  1. README.md +0 -31
README.md CHANGED
@@ -22,34 +22,3 @@ short_description: 'MoM: Specialized Models for Intelligent Routing'
  ## Why MoM?

  vLLM-SR solves a critical problem: **how to route LLM requests to the right model at the right time**. Not every query needs the same resources—"What's the weather?" shouldn't cost as much as "Analyze this legal contract."
-
- ## MoM System Card
-
- A quick overview of all MoM models:
-
- <div align="center">
-
- | Category | Model | Size | Architecture | Base Model | Purpose |
- |----------|-------|------|--------------|------------|---------|
- | **🧠 Intelligent Routing** | mom-brain-flash | Flash | Encoder | ModernBERT | Ultra-fast intent classification |
- | | mom-brain-pro | Pro | Decoder | Qwen3 0.6B | Balanced routing with reasoning |
- | | mom-brain-max | Max | Decoder | Qwen3 1.7B | Maximum accuracy for complex decisions |
- | **🔍 Similarity Search** | mom-similarity-flash | Flash | Encoder | BERT | Semantic similarity matching |
- | **🔒 Prompt Guardian** | mom-jailbreak-flash | Flash | Encoder | ModernBERT | Jailbreak/attack detection |
- | | mom-pii-flash | Flash | Encoder | ModernBERT | PII detection & privacy protection |
- | **🎯 SLM Experts** | mom-expert-math-flash | Flash | Decoder | Qwen3 0.6B | Backend math problem solver |
- | | mom-expert-science-flash | Flash | Decoder | Qwen3 0.6B | Backend science problem solver |
- | | mom-expert-social-flash | Flash | Decoder | Qwen3 0.6B | Backend social sciences solver |
- | | mom-expert-humanities-flash | Flash | Decoder | Qwen3 0.6B | Backend humanities solver |
- | | mom-expert-law-flash | Flash | Decoder | Qwen3 0.6B | Backend law problem solver |
- | | mom-expert-generalist-flash | Flash | Decoder | Qwen3 0.6B | Backend generalist solver |
-
- </div>
-
- **Key Insights:**
-
- - **4 Categories**: 3 for routing (Intelligent Routing, Similarity Search, Prompt Guardian) + 1 for backend problem solving (SLM Experts)
- - **ModernBERT** (encoder-only) → Sub-10ms latency for high-throughput routing
- - **Qwen3** (decoder-only) → Explainable routing decisions + domain-specific problem solving
- - **Flash** models achieve 10,000+ QPS on commodity hardware
- - **SLM Experts** are not routers—they are specialized backend models that solve domain-specific problems