---
title: Cascade - Intelligent LLM Router
emoji: 🌊
colorFrom: purple
colorTo: blue
sdk: streamlit
sdk_version: "1.31.0"
app_file: app.py
pinned: false
---

# Cascade 🌊

**Intelligent LLM Request Router** - Reduce API costs by 60%+ through smart routing and semantic caching.

## What is Cascade?

Cascade is an intelligent proxy that automatically routes LLM requests to the most cost-effective model based on query complexity:

- **Simple queries** → Free local models (Ollama) or GPT-3.5
- **Medium queries** → GPT-4o-mini ($0.15/1M tokens)
- **Complex queries** → GPT-4o ($2.50/1M tokens)

## Features

- 🎯 **ML-Powered Routing** - Predicts query complexity in <20ms
- 💰 **60%+ Cost Savings** - Routes simple queries to cheaper models
- ⚡ **Semantic Caching** - Vector similarity search for cached responses
- 📊 **Real-Time Analytics** - Dashboard showing savings and usage metrics
- 🔌 **OpenAI Compatible** - Drop-in replacement for OpenAI API

## Using This Space

This Space provides a demo UI where you can:

1. **Test Routing** - See how different queries get routed to different models
2. **View Analytics** - Track cost savings and cache hit rates
3. **Interactive Chat** - Try example queries and see real-time responses

## Configuration

To use this with real LLM APIs, set these environment variables:

- `OPENAI_API_KEY` - Your OpenAI API key
- `OLLAMA_BASE_URL` - Ollama server URL (optional)
- `REDIS_URL` - Redis connection for caching (optional)
- `QDRANT_URL` - Qdrant server for semantic cache (optional)

## Learn More

- [GitHub Repository](https://github.com/ayushm98/cascade)
- [Documentation](https://github.com/ayushm98/cascade#readme)
- [Contributing Guide](https://github.com/ayushm98/cascade/blob/main/CONTRIBUTING.md)

## Architecture

```
Request → Cache Check → ML Classifier → Route to Model
              ↓              ↓                 ↓
           Semantic     Simple/Med/   Ollama/GPT-4o-mini/
            Cache        Complex           GPT-4o
```

## Tech Stack

- **Backend**: FastAPI, Python 3.11+
- **ML**: DistilBERT (ONNX), Sentence Transformers
- **Caching**: Redis (exact match), Qdrant (semantic)
- **UI**: Streamlit, Plotly
- **Deployment**: Docker, Hugging Face Spaces

---

Built with ❤️ by [ayushm98](https://github.com/ayushm98)
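
To make the pipeline in the Architecture section concrete, here is a minimal Python sketch of the cache check, classification, and routing steps. It is an illustration under stated assumptions, not Cascade's actual code: `classify` is a hypothetical word-count heuristic standing in for the DistilBERT classifier, the cache is a plain dict doing exact-match lookups rather than Redis or Qdrant, and `complete` is a hypothetical caller-supplied function that would call the chosen model. Only the tier-to-model mapping and the prices come from this README.

```python
from typing import Callable

# Tier -> (model, USD per 1M tokens), as listed in "What is Cascade?".
# The local Ollama tier is described as free, so its price is 0.0 here.
ROUTES = {
    "simple": ("ollama", 0.0),
    "medium": ("gpt-4o-mini", 0.15),
    "complex": ("gpt-4o", 2.50),
}


def classify(query: str) -> str:
    """Hypothetical stand-in for the ML classifier: bucket by word count."""
    words = len(query.split())
    if words <= 8:
        return "simple"
    if words <= 40:
        return "medium"
    return "complex"


def handle(
    query: str,
    cache: dict[str, str],
    complete: Callable[[str, str], str],
) -> tuple[str, str, bool]:
    """Return (source, response, cache_hit) for one request."""
    key = query.strip().lower()  # exact-match key; no semantic similarity here
    if key in cache:
        return "cache", cache[key], True

    model, _price = ROUTES[classify(query)]
    response = complete(model, query)  # assumed to call the selected model
    cache[key] = response
    return model, response, False
```

Calling `handle` twice with the same short query and a stub `complete` function should route the first call to the `ollama` tier and serve the second from the cache. A semantic cache would replace the dict lookup with a vector similarity search over query embeddings.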