ayushm98 committed · Commit 1f9ceed · Parent(s): 7b1b1fd

docs: add Hugging Face Spaces README

- Add Space metadata (title, emoji, SDK config)
- Document features and architecture
- Provide setup instructions for environment variables

Files changed (1)
  1. README_SPACES.md +74 -0
README_SPACES.md ADDED
@@ -0,0 +1,74 @@
---
title: Cascade - Intelligent LLM Router
emoji: 🌊
colorFrom: purple
colorTo: blue
sdk: streamlit
sdk_version: "1.31.0"
app_file: app.py
pinned: false
---

# Cascade 🌊

**Intelligent LLM Request Router** - Reduce API costs by 60%+ through smart routing and semantic caching.

## What is Cascade?

Cascade is an intelligent proxy that automatically routes LLM requests to the most cost-effective model based on query complexity:

- **Simple queries** → Free local models (Ollama) or GPT-3.5
- **Medium queries** → GPT-4o-mini ($0.15/1M tokens)
- **Complex queries** → GPT-4o ($2.50/1M tokens)
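The tiering above can be sketched as a lookup from a complexity label to the cheapest suitable model. Cascade's real classifier is ML-based (see Tech Stack below); the keyword-and-length heuristic here is a purely illustrative stand-in, and the model names just mirror the list above:

```python
# Toy sketch of the routing idea: classify a query's complexity tier,
# then pick the cheapest model for that tier. The classify() heuristic
# below is NOT Cascade's real classifier - it only illustrates the flow.

ROUTE_TABLE = {
    "simple": "ollama/llama3",   # free local model via Ollama (assumed name)
    "medium": "gpt-4o-mini",     # $0.15 / 1M tokens
    "complex": "gpt-4o",         # $2.50 / 1M tokens
}

def classify(query: str) -> str:
    """Stand-in for the ML classifier: reasoning-heavy or long
    queries are treated as more complex."""
    hard_markers = ("prove", "derive", "step by step", "analyze")
    if any(marker in query.lower() for marker in hard_markers):
        return "complex"
    return "medium" if len(query.split()) > 20 else "simple"

def route(query: str) -> str:
    """Map a query to a concrete model name."""
    return ROUTE_TABLE[classify(query)]
```

For example, `route("What is 2+2?")` falls into the simple tier and resolves to the free local model.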

## Features

- 🎯 **ML-Powered Routing** - Predicts query complexity in <20ms
- 💰 **60%+ Cost Savings** - Routes simple queries to cheaper models
- ⚡ **Semantic Caching** - Vector similarity search for cached responses
- 📊 **Real-Time Analytics** - Dashboard showing savings and usage metrics
- 🔌 **OpenAI Compatible** - Drop-in replacement for OpenAI API

## Using This Space

This Space provides a demo UI where you can:

1. **Test Routing** - See how different queries get routed to different models
2. **View Analytics** - Track cost savings and cache hit rates
3. **Interactive Chat** - Try example queries and see real-time responses

## Configuration

To use this with real LLM APIs, set these environment variables:

- `OPENAI_API_KEY` - Your OpenAI API key
- `OLLAMA_BASE_URL` - Ollama server URL (optional)
- `REDIS_URL` - Redis connection for caching (optional)
- `QDRANT_URL` - Qdrant server for semantic cache (optional)

## Learn More

- [GitHub Repository](https://github.com/ayushm98/cascade)
- [Documentation](https://github.com/ayushm98/cascade#readme)
- [Contributing Guide](https://github.com/ayushm98/cascade/blob/main/CONTRIBUTING.md)

## Architecture

```
Request → Cache Check → ML Classifier → Route to Model
               ↓              ↓                ↓
           Semantic      Simple/Med/   Ollama/GPT-4o-mini/
            Cache          Complex          GPT-4o
```
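The diagram above can be read as a single request path: check the cache, classify on a miss, route to a model, then store the result. In the control-flow sketch below, `embed`, `classify`, and `call_model` are hypothetical stand-ins for Cascade's real components (Sentence Transformers, the DistilBERT classifier, and the provider clients):

```python
def handle_request(query, cache, embed, classify, call_model):
    """One pass through the pipeline: cache -> classifier -> model."""
    vec = embed(query)
    cached = cache.get(vec)            # 1. cache check
    if cached is not None:
        return cached                  #    hit: skip the LLM entirely
    tier = classify(query)             # 2. ML classifier (simple/med/complex)
    response = call_model(tier, query) # 3. route to the tier's model
    cache.put(vec, response)           #    store for future similar queries
    return response
```

Injecting the components as callables keeps the pipeline itself trivial to unit-test with stubs.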

## Tech Stack

- **Backend**: FastAPI, Python 3.11+
- **ML**: DistilBERT (ONNX), Sentence Transformers
- **Caching**: Redis (exact match), Qdrant (semantic)
- **UI**: Streamlit, Plotly
- **Deployment**: Docker, Hugging Face Spaces

---

Built with ❤️ by [ayushm98](https://github.com/ayushm98)