---
title: Cascade - Intelligent LLM Router
emoji: 🌊
colorFrom: purple
colorTo: blue
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
---
# Cascade 🌊

**Intelligent LLM Request Router** - Reduce API costs by 60%+ through smart routing and semantic caching.
## What is Cascade?
Cascade is an intelligent proxy that automatically routes LLM requests to the most cost-effective model based on query complexity:
- **Simple queries** → Free local models (Ollama) or GPT-3.5
- **Medium queries** → GPT-4o-mini ($0.15/1M tokens)
- **Complex queries** → GPT-4o ($2.50/1M tokens)
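As a rough illustration of the tiering above, a complexity score from the classifier can be mapped to the cheapest adequate model. This is a sketch only: the thresholds and model identifiers are assumptions, not Cascade's actual configuration.

```python
# Hypothetical tiered routing: a complexity score in [0, 1] is mapped to the
# cheapest model tier that can handle it. Bounds and names are illustrative.
ROUTES = [
    (0.33, "ollama/llama3"),  # simple  -> free local model
    (0.66, "gpt-4o-mini"),    # medium  -> $0.15/1M tokens
    (1.00, "gpt-4o"),         # complex -> $2.50/1M tokens
]

def route(complexity: float) -> str:
    """Return the first tier whose upper bound covers the complexity score."""
    for upper_bound, model in ROUTES:
        if complexity <= upper_bound:
            return model
    return ROUTES[-1][1]  # clamp out-of-range scores to the top tier
```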
## Features
- 🎯 **ML-Powered Routing** - Predicts query complexity in <20ms
- 💰 **60%+ Cost Savings** - Routes simple queries to cheaper models
- ⚡ **Semantic Caching** - Vector similarity search for cached responses
- 📊 **Real-Time Analytics** - Dashboard showing savings and usage metrics
- 🔌 **OpenAI Compatible** - Drop-in replacement for OpenAI API
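The semantic-caching idea can be shown with a minimal in-memory version: store (embedding, response) pairs and serve a cached response when the cosine similarity to a past query's embedding clears a threshold. Cascade itself uses Qdrant for this; the pure-Python `SemanticCache` below, including its 0.9 threshold, is a toy stand-in, and embeddings are assumed to come from an external model such as Sentence Transformers.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Serve a cached response when a past query is 'close enough' in embedding space."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, embedding):
        best = max(self.entries, key=lambda e: cosine(e[0], embedding), default=None)
        if best and cosine(best[0], embedding) >= self.threshold:
            return best[1]
        return None  # cache miss -> caller falls through to the LLM

    def put(self, embedding, response):
        self.entries.append((embedding, response))
```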
## Using This Space
This Space provides a demo UI where you can:
- **Test Routing** - See how different queries get routed to different models
- **View Analytics** - Track cost savings and cache hit rates
- **Interactive Chat** - Try example queries and see real-time responses
## Configuration
To use this with real LLM APIs, set these environment variables:
- `OPENAI_API_KEY` - Your OpenAI API key
- `OLLAMA_BASE_URL` - Ollama server URL (optional)
- `REDIS_URL` - Redis connection for caching (optional)
- `QDRANT_URL` - Qdrant server for semantic cache (optional)
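Since Cascade advertises itself as an OpenAI-compatible drop-in, configuration and a quick smoke test might look like the following. The host, port, and `"model": "auto"` value are assumptions for illustration; check the actual server defaults before relying on them.

```shell
export OPENAI_API_KEY="sk-..."                    # your OpenAI API key
export OLLAMA_BASE_URL="http://localhost:11434"   # optional: local models
export REDIS_URL="redis://localhost:6379"         # optional: exact-match cache
export QDRANT_URL="http://localhost:6333"         # optional: semantic cache

# Point any OpenAI-compatible client at Cascade instead of api.openai.com
# (port and model name are illustrative):
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "Hello"}]}'
```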
## Learn More
### Architecture

```
Request → Cache Check → ML Classifier → Route to Model
               ↓              ↓                ↓
           Semantic      Simple/Med/    Ollama/GPT-4o-mini/
             Cache         Complex           GPT-4o
```
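The diagrammed flow can be sketched as one small, dependency-injected function. The parameter names (`cache`, `classify`, `route`, `call_model`) are placeholders standing in for Cascade's actual components, not its real API.

```python
def handle_request(prompt, cache, classify, route, call_model):
    """Sketch of the request pipeline: cache check -> classify -> route -> call."""
    # 1. Cache check: a hit (exact or semantic) skips the LLM call entirely.
    cached = cache.get(prompt)
    if cached is not None:
        return cached

    # 2. The ML classifier scores complexity; 3. the router picks a model tier.
    model = route(classify(prompt))

    # 4. Call the chosen model and populate the cache for future requests.
    response = call_model(model, prompt)
    cache.put(prompt, response)
    return response
```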
### Tech Stack
- Backend: FastAPI, Python 3.11+
- ML: DistilBERT (ONNX), Sentence Transformers
- Caching: Redis (exact match), Qdrant (semantic)
- UI: Streamlit, Plotly
- Deployment: Docker, Hugging Face Spaces
Built with ❤️ by ayushm98