Spaces: llm-semantic-router / README

Xunzhuo committed 11 days ago

Commit d757c33 (verified) · 1 parent: 4079234

Update README.md

Files changed (1)

README.md +0 -31

@@ -22,34 +22,3 @@ short_description: 'MoM: Specialized Models for Intelligent Routing'
 ## Why MoM?
 vLLM-SR solves a critical problem: **how to route LLM requests to the right model at the right time**. Not every query needs the same resources—"What's the weather?" shouldn't cost as much as "Analyze this legal contract."
-## MoM System Card
-A quick overview of all MoM models:
-<div align="center">
-| Category | Model | Size | Architecture | Base Model | Purpose |
-|----------|-------|------|--------------|------------|---------|
-| **🧠 Intelligent Routing** | mom-brain-flash | Flash | Encoder | ModernBERT | Ultra-fast intent classification |
-| | mom-brain-pro | Pro | Decoder | Qwen3 0.6B | Balanced routing with reasoning |
-| | mom-brain-max | Max | Decoder | Qwen3 1.7B | Maximum accuracy for complex decisions |
-| **🔍 Similarity Search** | mom-similarity-flash | Flash | Encoder | BERT | Semantic similarity matching |
-| **🔒 Prompt Guardian** | mom-jailbreak-flash | Flash | Encoder | ModernBERT | Jailbreak/attack detection |
-| | mom-pii-flash | Flash | Encoder | ModernBERT | PII detection & privacy protection |
-| **🎯 SLM Experts** | mom-expert-math-flash | Flash | Decoder | Qwen3 0.6B | Backend math problem solver |
-| | mom-expert-science-flash | Flash | Decoder | Qwen3 0.6B | Backend science problem solver |
-| | mom-expert-social-flash | Flash | Decoder | Qwen3 0.6B | Backend social sciences solver |
-| | mom-expert-humanities-flash | Flash | Decoder | Qwen3 0.6B | Backend humanities solver |
-| | mom-expert-law-flash | Flash | Decoder | Qwen3 0.6B | Backend law problem solver |
-| | mom-expert-generalist-flash | Flash | Decoder | Qwen3 0.6B | Backend generalist solver |
-</div>
-**Key Insights:**
-- **4 Categories**: 3 for routing (Intelligent Routing, Similarity Search, Prompt Guardian) + 1 for backend problem solving (SLM Experts)
-- **ModernBERT** (encoder-only) → Sub-10ms latency for high-throughput routing
-- **Qwen3** (decoder-only) → Explainable routing decisions + domain-specific problem solving
-- **Flash** models achieve 10,000+ QPS on commodity hardware
-- **SLM Experts** are not routers—they are specialized backend models that solve domain-specific problems
