---
title: Cascade - Intelligent LLM Router
emoji: 🚀
colorFrom: purple
colorTo: blue
sdk: streamlit
sdk_version: "1.31.0"
app_file: app.py
pinned: false
---

# Cascade 🚀

**Intelligent LLM Request Router** - Reduce API costs by 60%+ through smart routing and semantic caching.

## What is Cascade?

Cascade is an intelligent proxy that automatically routes LLM requests to the most cost-effective model based on query complexity:

- **Simple queries** → free local models (Ollama) or GPT-3.5
- **Medium queries** → GPT-4o-mini ($0.15/1M tokens)
- **Complex queries** → GPT-4o ($2.50/1M tokens)

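The tiering above can be sketched in a few lines. This is a toy illustration, not Cascade's actual classifier or configuration: the model names, word-count thresholds, and tier labels here are stand-ins (Cascade uses an ML model for classification, described below).

```python
# Toy complexity-based router. A real deployment would use the ML
# classifier, but the routing table itself can look like this.

ROUTES = {
    "simple": "ollama/llama3",   # free local model (name is illustrative)
    "medium": "gpt-4o-mini",     # $0.15 / 1M tokens
    "complex": "gpt-4o",         # $2.50 / 1M tokens
}

def classify(prompt: str) -> str:
    """Stand-in heuristic for the ML classifier: longer prompts
    are treated as more complex."""
    words = len(prompt.split())
    if words < 15:
        return "simple"
    if words < 60:
        return "medium"
    return "complex"

def route(prompt: str) -> str:
    """Pick the model tier matching the query's complexity."""
    return ROUTES[classify(prompt)]
```

The point of the indirection is that `classify` can be swapped for a learned model without touching the routing table.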
## Features

- 🎯 **ML-Powered Routing** - Predicts query complexity in under 20 ms
- 💰 **60%+ Cost Savings** - Routes simple queries to cheaper models
- ⚡ **Semantic Caching** - Vector similarity search for cached responses
- 📊 **Real-Time Analytics** - Dashboard showing savings and usage metrics
- 🔌 **OpenAI Compatible** - Drop-in replacement for the OpenAI API

## Using This Space

This Space provides a demo UI where you can:

1. **Test Routing** - See how different queries get routed to different models
2. **View Analytics** - Track cost savings and cache hit rates
3. **Interactive Chat** - Try example queries and see real-time responses

## Configuration

To use this with real LLM APIs, set these environment variables:

- `OPENAI_API_KEY` - Your OpenAI API key
- `OLLAMA_BASE_URL` - Ollama server URL (optional)
- `REDIS_URL` - Redis connection for caching (optional)
- `QDRANT_URL` - Qdrant server for the semantic cache (optional)

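A minimal sketch of how such a loader might read these variables. The function name and the defaults are assumptions for illustration (the Ollama fallback shown is Ollama's conventional local port), not Cascade's documented behavior:

```python
import os

def load_config() -> dict:
    """Illustrative config loader. Variable names match the README;
    the defaults are assumptions, not Cascade's documented values."""
    return {
        "openai_api_key": os.environ.get("OPENAI_API_KEY"),  # required for OpenAI models
        "ollama_base_url": os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        "redis_url": os.environ.get("REDIS_URL"),            # optional: exact-match cache
        "qdrant_url": os.environ.get("QDRANT_URL"),          # optional: semantic cache
    }
```

With the optional variables unset, the proxy would simply run without those caching layers.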
## Learn More

- [GitHub Repository](https://github.com/ayushm98/cascade)
- [Documentation](https://github.com/ayushm98/cascade#readme)
- [Contributing Guide](https://github.com/ayushm98/cascade/blob/main/CONTRIBUTING.md)

## Architecture

```
Request → Cache Check → ML Classifier → Route to Model
               ↓              ↓                ↓
           Semantic      Simple/Med/    Ollama/GPT-4o-mini/
             Cache         Complex           GPT-4o
```

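The flow in the diagram (cache check first, classify only on a miss, then dispatch) can be sketched as a single function. The cache, classifier, and backends are injected stand-ins here, so this shows the control flow rather than Cascade's actual internals:

```python
def handle_request(prompt, cache, classifier, backends):
    """Sketch of the request flow: cache -> classify -> route.

    cache:      dict-like store of prompt -> response
    classifier: callable, prompt -> tier name
    backends:   dict of tier name -> callable that answers the prompt
    """
    cached = cache.get(prompt)
    if cached is not None:
        return {"response": cached, "source": "cache"}  # skip the LLM entirely
    tier = classifier(prompt)           # "simple" / "medium" / "complex"
    response = backends[tier](prompt)   # e.g. Ollama, GPT-4o-mini, GPT-4o
    cache[prompt] = response            # populate cache for next time
    return {"response": response, "source": tier}
```

Note that the cache check happens before classification, which is what lets repeated queries skip both the classifier and the model call.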
## Tech Stack

- **Backend**: FastAPI, Python 3.11+
- **ML**: DistilBERT (ONNX), Sentence Transformers
- **Caching**: Redis (exact match), Qdrant (semantic)
- **UI**: Streamlit, Plotly
- **Deployment**: Docker, Hugging Face Spaces

---

Built with ❤️ by [ayushm98](https://github.com/ayushm98)