docs: add Hugging Face Spaces README
- Add Space metadata (title, emoji, SDK config)
- Document features and architecture
- Provide setup instructions for environment variables
- README_SPACES.md +74 -0
---
title: Cascade - Intelligent LLM Router
emoji: 🌊
colorFrom: purple
colorTo: blue
sdk: streamlit
sdk_version: "1.31.0"
app_file: app.py
pinned: false
---

# Cascade 🌊

**Intelligent LLM Request Router** - Reduce API costs by 60%+ through smart routing and semantic caching.

## What is Cascade?

Cascade is an intelligent proxy that automatically routes LLM requests to the most cost-effective model based on query complexity:

- **Simple queries** → Free local models (Ollama) or GPT-3.5
- **Medium queries** → GPT-4o-mini ($0.15/1M tokens)
- **Complex queries** → GPT-4o ($2.50/1M tokens)

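The three tiers above amount to a threshold rule over a complexity score. As a rough sketch (the function name, thresholds, and model identifiers here are illustrative assumptions, not Cascade's actual routing code):

```python
# Illustrative only: thresholds and model identifiers are assumptions,
# not Cascade's real configuration.
def route(complexity: float) -> str:
    """Map a classifier's complexity score in [0, 1] to a model tier."""
    if complexity < 0.3:
        return "ollama/llama3"   # simple: free local model
    if complexity < 0.7:
        return "gpt-4o-mini"     # medium: $0.15/1M tokens
    return "gpt-4o"              # complex: $2.50/1M tokens
```
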
## Features

- 🎯 **ML-Powered Routing** - Predicts query complexity in <20ms
- 💰 **60%+ Cost Savings** - Routes simple queries to cheaper models
- ⚡ **Semantic Caching** - Vector similarity search for cached responses
- 📊 **Real-Time Analytics** - Dashboard showing savings and usage metrics
- 🔌 **OpenAI Compatible** - Drop-in replacement for OpenAI API

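To illustrate the semantic-caching idea: Cascade itself uses Qdrant with sentence-transformer embeddings, whereas this toy lookup takes precomputed vectors and a hypothetical similarity threshold.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def lookup(query_vec, cache, threshold=0.9):
    """Return the cached response most similar to the query, or None.

    `cache` is a list of (embedding, response) pairs; the 0.9 threshold
    is an illustrative default, not Cascade's tuned value.
    """
    best, best_sim = None, threshold
    for vec, response in cache:
        sim = cosine(query_vec, vec)
        if sim >= best_sim:
            best, best_sim = response, sim
    return best
```

A near-duplicate query lands close to a stored embedding and returns the cached answer; anything below the threshold falls through to the router.
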
## Using This Space

This Space provides a demo UI where you can:

1. **Test Routing** - See how different queries get routed to different models
2. **View Analytics** - Track cost savings and cache hit rates
3. **Interactive Chat** - Try example queries and see real-time responses

+
## Configuration
|
| 41 |
+
|
| 42 |
+
To use this with real LLM APIs, set these environment variables:
|
| 43 |
+
|
| 44 |
+
- `OPENAI_API_KEY` - Your OpenAI API key
|
| 45 |
+
- `OLLAMA_BASE_URL` - Ollama server URL (optional)
|
| 46 |
+
- `REDIS_URL` - Redis connection for caching (optional)
|
| 47 |
+
- `QDRANT_URL` - Qdrant server for semantic cache (optional)
|
| 48 |
+
|
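A deployment might read these variables along the following lines; the fallback URLs shown are the usual local default ports for each service, and the loader itself is a hypothetical sketch, not Cascade's documented settings code.

```python
import os

def load_config():
    # Hypothetical loader: variable names match the list above; fallback
    # URLs are typical local-service defaults, not Cascade's actual values.
    return {
        "openai_api_key": os.environ.get("OPENAI_API_KEY"),  # required for real calls
        "ollama_base_url": os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        "redis_url": os.environ.get("REDIS_URL", "redis://localhost:6379/0"),
        "qdrant_url": os.environ.get("QDRANT_URL", "http://localhost:6333"),
    }
```
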
## Learn More

- [GitHub Repository](https://github.com/ayushm98/cascade)
- [Documentation](https://github.com/ayushm98/cascade#readme)
- [Contributing Guide](https://github.com/ayushm98/cascade/blob/main/CONTRIBUTING.md)

## Architecture

```
Request → Cache Check → ML Classifier → Route to Model
              ↓              ↓                ↓
          Semantic      Simple/Med/   Ollama/GPT-4o-mini/
           Cache          Complex          GPT-4o
```

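The flow in the diagram comes down to a few lines of control logic. In this stub, a plain dict stands in for the Redis/Qdrant caches, and the classifier, router, and model call are passed in as hypothetical callables:

```python
def handle(query, cache, classify, route, call_model):
    """Sketch of the request flow: cache check first, then classify and route."""
    cached = cache.get(query)
    if cached is not None:
        return cached                  # cache hit: skip the model entirely
    model = route(classify(query))     # complexity score -> model choice
    answer = call_model(model, query)
    cache[query] = answer              # store for future hits
    return answer
```

Checking the cache before classifying means a hit costs neither a classifier pass nor an upstream API call.
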
## Tech Stack

- **Backend**: FastAPI, Python 3.11+
- **ML**: DistilBERT (ONNX), Sentence Transformers
- **Caching**: Redis (exact match), Qdrant (semantic)
- **UI**: Streamlit, Plotly
- **Deployment**: Docker, Hugging Face Spaces

---

Built with ❤️ by [ayushm98](https://github.com/ayushm98)