vLLM Semantic Router
A Mixture-of-Models (MoM) router that directs OpenAI API requests to the most suitable model in a defined pool, based on semantic understanding of each request's intent.
Intent is determined with BERT classification. The approach is conceptually similar to Mixture-of-Experts (MoE), but where MoE selects experts inside a single model, this system selects the best entire model for the task at hand.
Key Features
🎯 Auto-selection of Models
Intelligently routes requests to specialized models based on semantic understanding:
- Math queries → Math-specialized models
- Creative writing → Creative-specialized models
- Code generation → Code-specialized models
- General queries → Balanced general-purpose models
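The mapping above can be sketched as a classifier followed by a lookup in a model pool. This is a minimal illustration, not the project's implementation: the model names in `MODEL_POOL` are hypothetical, and `keyword_classifier` is a simple keyword heuristic standing in for the BERT intent classifier.

```python
from typing import Callable, Dict

# Hypothetical model pool: category label -> model name (illustrative names only).
MODEL_POOL: Dict[str, str] = {
    "math": "math-specialist",
    "creative": "creative-specialist",
    "code": "code-specialist",
    "general": "general-purpose",
}


def keyword_classifier(prompt: str) -> str:
    """Stand-in for the BERT intent classifier: a keyword heuristic for illustration."""
    text = prompt.lower()
    if any(word in text for word in ("integral", "equation", "solve", "prove")):
        return "math"
    if any(word in text for word in ("function", "bug", "compile", "python")):
        return "code"
    if any(word in text for word in ("poem", "story", "lyrics")):
        return "creative"
    return "general"


def route(prompt: str, classify: Callable[[str], str] = keyword_classifier) -> str:
    """Return the model name for a prompt, falling back to the general model."""
    category = classify(prompt)
    return MODEL_POOL.get(category, MODEL_POOL["general"])
```

Passing the classifier in as a parameter keeps the routing table independent of how intent is detected, so a real classifier could replace the heuristic without changing `route`.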
🛡️ Security & Privacy
- PII Detection: Automatically detects and handles personally identifiable information
- Prompt Guard: Identifies and blocks jailbreak attempts
- Safe Routing: Ensures sensitive prompts are handled appropriately
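As a rough sketch of how these three checks might combine into a single screening step, the example below uses regular expressions and a phrase list. Both are assumptions made for illustration; the source does not describe the detection method, and the `"block"`, `"restrict"`, and `"allow"` outcomes are hypothetical labels.

```python
import re
from typing import List

# Illustrative patterns only; real PII detection covers far more cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical phrase list standing in for a jailbreak classifier.
JAILBREAK_PHRASES = (
    "ignore previous instructions",
    "ignore all previous instructions",
)


def detect_pii(prompt: str) -> List[str]:
    """Return the kinds of PII found in the prompt."""
    found = []
    if EMAIL_RE.search(prompt):
        found.append("email")
    if SSN_RE.search(prompt):
        found.append("ssn")
    return found


def is_jailbreak(prompt: str) -> bool:
    """Flag prompts containing a known jailbreak phrase."""
    text = prompt.lower()
    return any(phrase in text for phrase in JAILBREAK_PHRASES)


def screen(prompt: str) -> str:
    """Decide how to handle a prompt: block it, restrict its routing, or allow it."""
    if is_jailbreak(prompt):
        return "block"
    if detect_pii(prompt):
        return "restrict"  # e.g. route only to models permitted to see PII
    return "allow"
```

Checking for jailbreaks before PII means a malicious prompt is rejected outright rather than merely routed more carefully.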
⚡ Performance Optimization
- Semantic Cache: Caches semantic representations to reduce latency
- Tool Selection: Auto-selects relevant tools to reduce token usage and improve tool selection accuracy
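One way a semantic cache could work is to store an embedding per prompt and return the stored response when a new prompt is similar enough. The sketch below assumes that interpretation. The bag-of-words `embed` function is a toy stand-in for a real embedding model, and the `0.8` threshold is an arbitrary illustrative value.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (a real system would use a sentence-embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []

    def put(self, prompt: str, response: str) -> None:
        """Store a response under the prompt's embedding."""
        self.entries.append((embed(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        """Return the response of the most similar cached prompt, if it clears the threshold."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

A cache hit skips the model call entirely, which is where the latency saving would come from. The linear scan is fine for a sketch, but a larger cache would need an approximate nearest-neighbor index.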
🏗️ Architecture
- Envoy ExtProc Integration: Seamlessly integrates with Envoy proxy
- Dual Implementation: Available in both Go (with Rust FFI) and Python
- Scalable Design: Production-ready with comprehensive monitoring
Performance Benefits
Our testing shows significant improvements in model accuracy through specialized routing.
🛠️ Architecture Overview
🎯 Use Cases
- Enterprise API Gateways: Route different types of queries to cost-optimized models
- Multi-tenant Platforms: Provide specialized routing for different customer needs
- Development Environments: Balance cost and performance for different workloads
- Production Services: Ensure optimal model selection with built-in safety measures
Monitoring & Observability
The router provides comprehensive monitoring through:
- Grafana Dashboard: Real-time metrics and performance tracking
- Prometheus Metrics: Detailed routing statistics and performance data
- Request Tracing: Full visibility into routing decisions and performance
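To illustrate the kind of routing statistic Prometheus could scrape, the sketch below counts routing decisions and renders them in the Prometheus text exposition format using only the standard library. The metric name `router_requests_total` and its labels are hypothetical, not the router's actual metric names.

```python
from collections import Counter


class RoutingMetrics:
    """Counts routing decisions by (category, model) pair."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, category: str, model: str) -> None:
        """Increment the counter for one routed request."""
        self._counts[(category, model)] += 1

    def render(self) -> str:
        """Render all counters in the Prometheus text exposition format."""
        lines = [
            "# HELP router_requests_total Requests routed, by category and model.",
            "# TYPE router_requests_total counter",
        ]
        for (category, model), count in sorted(self._counts.items()):
            lines.append(
                f'router_requests_total{{category="{category}",model="{model}"}} {count}'
            )
        return "\n".join(lines) + "\n"
```

In practice a client library would handle registration and the HTTP endpoint; this version only shows the shape of the data a dashboard would consume.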
Documentation
For comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit:
Complete Documentation at Read the Docs
The documentation includes:
- Installation Guide - Complete setup instructions
- Quick Start - Get running in 5 minutes
- System Architecture - Technical deep dive
- Model Training - How classification models work
- API Reference - Complete API documentation



