docs: add Hugging Face Spaces README
- Add Space metadata (title, emoji, SDK config)
- Document features and architecture
- Provide setup instructions for environment variables
- README_SPACES.md +74 -0
---
title: Cascade - Intelligent LLM Router
emoji: 🌊
colorFrom: purple
colorTo: blue
sdk: streamlit
sdk_version: "1.31.0"
app_file: app.py
pinned: false
---

# Cascade 🌊

**Intelligent LLM Request Router** - Reduce API costs by 60%+ through smart routing and semantic caching.

## What is Cascade?

Cascade is an intelligent proxy that automatically routes LLM requests to the most cost-effective model based on query complexity:

- **Simple queries** → Free local models (Ollama) or GPT-3.5
- **Medium queries** → GPT-4o-mini ($0.15/1M tokens)
- **Complex queries** → GPT-4o ($2.50/1M tokens)

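The three tiers above amount to a threshold rule over a complexity score. As a rough sketch (the function name, thresholds, and model identifiers here are illustrative assumptions, not Cascade's actual routing code):

```python
# Illustrative only: thresholds and model identifiers are assumptions,
# not Cascade's real configuration.
def route(complexity: float) -> str:
    """Map a classifier's complexity score in [0, 1] to a model tier."""
    if complexity < 0.3:
        return "ollama/llama3"   # simple: free local model
    if complexity < 0.7:
        return "gpt-4o-mini"     # medium: $0.15/1M tokens
    return "gpt-4o"              # complex: $2.50/1M tokens
```
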
## Features

- 🎯 **ML-Powered Routing** - Predicts query complexity in <20ms
- 💰 **60%+ Cost Savings** - Routes simple queries to cheaper models
- ⚡ **Semantic Caching** - Vector similarity search for cached responses
- 📊 **Real-Time Analytics** - Dashboard showing savings and usage metrics
- 🔌 **OpenAI Compatible** - Drop-in replacement for OpenAI API

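To illustrate the semantic-caching idea: Cascade itself uses Qdrant with sentence-transformer embeddings, whereas this toy lookup takes precomputed vectors and a hypothetical similarity threshold.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def lookup(query_vec, cache, threshold=0.9):
    """Return the cached response most similar to the query, or None.

    `cache` is a list of (embedding, response) pairs; the 0.9 threshold
    is an illustrative default, not Cascade's tuned value.
    """
    best, best_sim = None, threshold
    for vec, response in cache:
        sim = cosine(query_vec, vec)
        if sim >= best_sim:
            best, best_sim = response, sim
    return best
```

A near-duplicate query lands close to a stored embedding and returns the cached answer; anything below the threshold falls through to the router.
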
## Using This Space

This Space provides a demo UI where you can:

1. **Test Routing** - See how different queries get routed to different models
2. **View Analytics** - Track cost savings and cache hit rates
3. **Interactive Chat** - Try example queries and see real-time responses

+
## Configuration
|
| 41 |
+
|
| 42 |
+
To use this with real LLM APIs, set these environment variables:
|
| 43 |
+
|
| 44 |
+
- `OPENAI_API_KEY` - Your OpenAI API key
|
| 45 |
+
- `OLLAMA_BASE_URL` - Ollama server URL (optional)
|
| 46 |
+
- `REDIS_URL` - Redis connection for caching (optional)
|
| 47 |
+
- `QDRANT_URL` - Qdrant server for semantic cache (optional)
|
| 48 |
+
|
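A deployment might read these variables along the following lines; the fallback URLs shown are the usual local default ports for each service, and the loader itself is a hypothetical sketch, not Cascade's documented settings code.

```python
import os

def load_config():
    # Hypothetical loader: variable names match the list above; fallback
    # URLs are typical local-service defaults, not Cascade's actual values.
    return {
        "openai_api_key": os.environ.get("OPENAI_API_KEY"),  # required for real calls
        "ollama_base_url": os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        "redis_url": os.environ.get("REDIS_URL", "redis://localhost:6379/0"),
        "qdrant_url": os.environ.get("QDRANT_URL", "http://localhost:6333"),
    }
```
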
## Learn More

- [GitHub Repository](https://github.com/ayushm98/cascade)
- [Documentation](https://github.com/ayushm98/cascade#readme)
- [Contributing Guide](https://github.com/ayushm98/cascade/blob/main/CONTRIBUTING.md)

## Architecture

```
Request → Cache Check → ML Classifier → Route to Model
              ↓              ↓                ↓
          Semantic      Simple/Med/   Ollama/GPT-4o-mini/
           Cache          Complex          GPT-4o
```

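The flow in the diagram comes down to a few lines of control logic. In this stub, a plain dict stands in for the Redis/Qdrant caches, and the classifier, router, and model call are passed in as hypothetical callables:

```python
def handle(query, cache, classify, route, call_model):
    """Sketch of the request flow: cache check first, then classify and route."""
    cached = cache.get(query)
    if cached is not None:
        return cached                  # cache hit: skip the model entirely
    model = route(classify(query))     # complexity score -> model choice
    answer = call_model(model, query)
    cache[query] = answer              # store for future hits
    return answer
```

Checking the cache before classifying means a hit costs neither a classifier pass nor an upstream API call.
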
## Tech Stack

- **Backend**: FastAPI, Python 3.11+
- **ML**: DistilBERT (ONNX), Sentence Transformers
- **Caching**: Redis (exact match), Qdrant (semantic)
- **UI**: Streamlit, Plotly
- **Deployment**: Docker, Hugging Face Spaces

---

Built with ❤️ by [ayushm98](https://github.com/ayushm98)