metadata
title: Cascade - Intelligent LLM Router
emoji: 🌊
colorFrom: purple
colorTo: blue
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false

Cascade 🌊

Intelligent LLM Request Router - Reduce API costs by 60%+ through smart routing and semantic caching.

What is Cascade?

Cascade is an intelligent proxy that automatically routes LLM requests to the most cost-effective model based on query complexity:

  • Simple queries → Free local models (Ollama) or GPT-3.5
  • Medium queries → GPT-4o-mini ($0.15/1M tokens)
  • Complex queries → GPT-4o ($2.50/1M tokens)
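
The tier-to-model mapping above can be sketched as a small lookup table. This is a minimal illustration, not Cascade's actual routing code: the names `MODEL_FOR_TIER`, `pick_model`, and `estimated_cost` are hypothetical, the simple tier is assumed to go to Ollama (the list also allows GPT-3.5), and the prices are the per-1M-token figures quoted in the list.

```python
# Per-1M-token prices as quoted in the list above.
# The local Ollama tier is assumed to be free.
PRICE_PER_1M_TOKENS = {
    "ollama": 0.0,
    "gpt-4o-mini": 0.15,
    "gpt-4o": 2.50,
}

# Hypothetical tier -> model table (assumption: simple queries go to Ollama).
MODEL_FOR_TIER = {
    "simple": "ollama",
    "medium": "gpt-4o-mini",
    "complex": "gpt-4o",
}


def pick_model(tier: str) -> str:
    """Return the model name for a complexity tier; raises KeyError if unknown."""
    return MODEL_FOR_TIER[tier]


def estimated_cost(model: str, tokens: int) -> float:
    """Estimate the cost in USD of sending `tokens` tokens to `model`."""
    return PRICE_PER_1M_TOKENS[model] * tokens / 1_000_000


if __name__ == "__main__":
    for tier in ("simple", "medium", "complex"):
        model = pick_model(tier)
        print(tier, model, estimated_cost(model, 10_000))
```

Keeping the mapping in plain dictionaries makes it easy to swap a tier's model or update a price without touching the routing logic.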

Features

  • 🎯 ML-Powered Routing - Predicts query complexity in <20ms
  • 💰 60%+ Cost Savings - Routes simple queries to cheaper models
  • ⚡ Semantic Caching - Vector similarity search for cached responses
  • 📊 Real-Time Analytics - Dashboard showing savings and usage metrics
  • 🔌 OpenAI Compatible - Drop-in replacement for OpenAI API
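
To make the semantic caching idea concrete, here is a rough sketch of a similarity-based lookup. It is an in-memory stand-in written for illustration only: Cascade itself lists Qdrant and Sentence Transformers for this job, while the class `InMemorySemanticCache`, the 0.9 threshold, and the hand-written two-dimensional embeddings below are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemorySemanticCache:
    """Toy cache: returns a stored response when a query embedding is close enough."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed similarity cutoff for a cache hit
        self.entries = []  # list of (embedding, response) pairs

    def put(self, embedding, response):
        self.entries.append((list(embedding), response))

    def get(self, embedding):
        """Return the best-matching response, or None when nothing clears the threshold."""
        best_score, best_response = 0.0, None
        for cached_embedding, response in self.entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


if __name__ == "__main__":
    cache = InMemorySemanticCache(threshold=0.9)
    cache.put([1.0, 0.0], "Paris")
    print(cache.get([0.99, 0.05]))  # near-duplicate query embedding
    print(cache.get([0.0, 1.0]))    # unrelated query embedding
```

A linear scan like this only suits a demo; a vector database replaces the loop with an approximate nearest-neighbor search.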

Using This Space

This Space provides a demo UI where you can:

  1. Test Routing - See how different queries get routed to different models
  2. View Analytics - Track cost savings and cache hit rates
  3. Interactive Chat - Try example queries and see real-time responses

Configuration

To use this with real LLM APIs, set these environment variables:

  • OPENAI_API_KEY - Your OpenAI API key
  • OLLAMA_BASE_URL - Ollama server URL (optional)
  • REDIS_URL - Redis connection for caching (optional)
  • QDRANT_URL - Qdrant server for semantic cache (optional)
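
As an illustration of how an app might read these variables, the sketch below collects them into one settings object. The `CascadeSettings` class and `load_settings` function are hypothetical names, not part of Cascade; no default URLs are assumed, so unset variables simply come back as `None`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CascadeSettings:
    """Hypothetical container for the environment variables listed above."""

    openai_api_key: Optional[str]
    ollama_base_url: Optional[str]
    redis_url: Optional[str]
    qdrant_url: Optional[str]

    @property
    def can_call_openai(self) -> bool:
        # Without an API key, only a demo or local mode would be possible.
        return bool(self.openai_api_key)


def load_settings(env: Mapping[str, str] = os.environ) -> CascadeSettings:
    """Read settings from `env`; unset variables become None."""
    return CascadeSettings(
        openai_api_key=env.get("OPENAI_API_KEY"),
        ollama_base_url=env.get("OLLAMA_BASE_URL"),
        redis_url=env.get("REDIS_URL"),
        qdrant_url=env.get("QDRANT_URL"),
    )


if __name__ == "__main__":
    settings = load_settings()
    print("OpenAI configured:", settings.can_call_openai)
    print("Redis configured:", settings.redis_url is not None)
```

On Hugging Face Spaces, values such as `OPENAI_API_KEY` are normally stored as Space secrets, which the running app sees as ordinary environment variables.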

Learn More

Architecture

Request → Cache Check → ML Classifier → Route to Model
           ↓              ↓                ↓
       Semantic       Simple/Med/      Ollama/GPT-4o-mini/
         Cache        Complex          GPT-4o
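
The flow in the diagram can be sketched end to end in a few lines. Everything here is a simplified assumption for illustration: an exact-match dictionary stands in for the semantic cache, `toy_classifier` uses word counts in place of the ML classifier, and `call_model` is a hypothetical callable rather than a real API client.

```python
from typing import Callable, Dict, Tuple

# Hypothetical tier -> model table following the diagram's routing targets.
MODEL_FOR_TIER = {
    "simple": "ollama",
    "medium": "gpt-4o-mini",
    "complex": "gpt-4o",
}


def toy_classifier(query: str) -> str:
    """Placeholder for the ML classifier: word count only (cutoffs are arbitrary)."""
    words = len(query.split())
    if words <= 8:
        return "simple"
    if words <= 30:
        return "medium"
    return "complex"


def handle_request(
    query: str,
    cache: Dict[str, str],
    call_model: Callable[[str, str], str],
    classify: Callable[[str], str] = toy_classifier,
) -> Tuple[str, str]:
    """Return (response, source), where source is 'cache' or the chosen model name."""
    # Step 1: cache check (exact match here; the real system uses similarity search).
    if query in cache:
        return cache[query], "cache"
    # Step 2: classify the query's complexity.
    tier = classify(query)
    # Step 3: route to the model for that tier and remember the answer.
    model = MODEL_FOR_TIER[tier]
    response = call_model(model, query)
    cache[query] = response
    return response, model


if __name__ == "__main__":
    def fake_llm(model: str, query: str) -> str:
        return f"[{model}] answer to: {query}"

    cache: Dict[str, str] = {}
    print(handle_request("What is 2+2?", cache, fake_llm))
    print(handle_request("What is 2+2?", cache, fake_llm))
```

Checking the cache before classification means a repeated query skips both the classifier and the model call.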

Tech Stack

  • Backend: FastAPI, Python 3.11+
  • ML: DistilBERT (ONNX), Sentence Transformers
  • Caching: Redis (exact match), Qdrant (semantic)
  • UI: Streamlit, Plotly
  • Deployment: Docker, Hugging Face Spaces

Built with ❤️ by ayushm98