---
title: HuggingFace Course RAG API
emoji: 🤗
colorFrom: yellow
colorTo: purple
sdk: docker
app_port: 7860
---
# HuggingFace Course RAG

A multimodal Retrieval-Augmented Generation system over the Hugging Face Learn ecosystem. Ask questions about course content (text, code, and images) and get cited answers grounded in the official learning material.
## Live Demo

- **Frontend:** [multimodal-rag.vercel.app](https://multimodal-rag.vercel.app)
- **Backend API:** [abhiyanta-multimodal-rag.hf.space](https://abhiyanta-multimodal-rag.hf.space)
- **API Docs:** [abhiyanta-multimodal-rag.hf.space/docs](https://abhiyanta-multimodal-rag.hf.space/docs)
## Features

- **Multimodal Search**: dense retrieval over text and image embeddings using Qdrant
- **Real-time Streaming**: token-by-token answer streaming via Server-Sent Events (SSE)
- **Numbered Citations**: every claim is backed by short clickable references (e.g. [1], [2]) linking to the original course material
- **Conversational Memory**: follow-up questions are automatically rewritten into standalone queries using conversation history
- **Course Filtering**: scope your search to a specific course via filter pills
- **LLM Fallback**: Gemini 2.5 Flash as primary, Groq Llama 3.3 70B as automatic fallback
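The automatic LLM fallback can be sketched as a small wrapper around two client calls. This is a minimal illustration, not the project's actual code: `generate_with_fallback` and the stub clients are hypothetical names.

```python
def generate_with_fallback(prompt: str, primary, fallback) -> str:
    """Call the primary LLM; on any error, retry once with the fallback.

    `primary` and `fallback` are callables taking a prompt and returning a
    string (e.g. thin wrappers around the Gemini and Groq clients).
    """
    try:
        return primary(prompt)
    except Exception:
        # Primary failed (rate limit, timeout, outage) -- use the fallback.
        return fallback(prompt)


# Hypothetical stand-ins for the real clients, to show the control flow:
def flaky_gemini(prompt: str) -> str:
    raise TimeoutError("primary LLM unavailable")


def groq_llama(prompt: str) -> str:
    return f"[fallback] answer to: {prompt}"
```

In practice the real wrappers would also distinguish retryable errors (rate limits, timeouts) from permanent ones (bad request), but the chain itself is just this try/except.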
## Courses Indexed
| Course | Source |
|---|---|
| Agents Course | huggingface/agents-course |
| Smol Course | huggingface/smol-course |
| Deep RL Course | huggingface/deep-rl-class |
| Audio Course | huggingface/audio-transformers-course |
| NLP Course | huggingface/course |
| Diffusion Course | huggingface/diffusion-models-class |
| LLM Course | huggingface/llm-course |
| Transformers Course | huggingface/transformers-course |
## Architecture

```
┌──────────────────────── Ingestion Pipeline ───────────────────────┐
│                                                                   │
│  Scraper (GitHub) → Parser (Markdown) → Semantic Chunker          │
│                              │                                    │
│              ┌───────────────┴───────────────┐                    │
│              ▼                               ▼                    │
│       BGE Embeddings                  CLIP Embeddings             │
│        (text, 384d)                   (images, 512d)              │
│              └───────────────┬───────────────┘                    │
│                              ▼                                    │
│                        Qdrant Cloud                               │
│             (single collection, named vectors)                    │
└───────────────────────────────────────────────────────────────────┘
```
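The Semantic Chunker step splits each markdown page at its `##`/`###` headings so every chunk can be cited back to a course section. A minimal stdlib-only sketch of the idea; the real chunker in `parser/` may differ in details:

```python
import re


def chunk_by_headings(markdown: str) -> list[dict]:
    """Split a markdown document into chunks at h2/h3 headings.

    Each chunk keeps its heading as metadata so a retrieved passage
    can be traced back to the section it came from.
    """
    chunks = []
    heading = None
    lines: list[str] = []
    for line in markdown.splitlines():
        m = re.match(r"^(#{2,3})\s+(.*)$", line)
        if m:
            if lines:  # flush the section accumulated so far
                chunks.append({"heading": heading, "text": "\n".join(lines).strip()})
            heading = m.group(2).strip()
            lines = []
        else:
            lines.append(line)
    if lines:
        chunks.append({"heading": heading, "text": "\n".join(lines).strip()})
    return chunks
```

A production chunker would also enforce a maximum chunk size and carry the source file path in the metadata; this sketch only shows the heading split.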
```
┌────────────────────────── Query Pipeline ─────────────────────────┐
│                                                                   │
│  User → Next.js Frontend (Vercel)                                 │
│        │                                                          │
│        ▼                                                          │
│  FastAPI Backend (HF Spaces)                                      │
│        │                                                          │
│  Query Rewrite (conversation-aware)                               │
│        │                                                          │
│  Dense Search (BGE text + CLIP image vectors)                     │
│        │                                                          │
│  Merge & Deduplicate Results                                      │
│        │                                                          │
│  Gemini 2.5 Flash (or Groq fallback)                              │
│        │                                                          │
│  SSE Stream → Cited Answer with [1], [2] references               │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘
```
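Because the text (BGE) and image (CLIP) searches can return the same chunk, the Merge & Deduplicate step keeps one copy per chunk id with its best score before re-ranking. A minimal sketch; the function and field names here are illustrative, not the project's actual code:

```python
def merge_results(text_hits: list[dict], image_hits: list[dict], top_k: int = 5) -> list[dict]:
    """Merge two ranked hit lists, dedupe by chunk id, keep the best score."""
    best: dict[str, dict] = {}
    for hit in text_hits + image_hits:
        existing = best.get(hit["id"])
        if existing is None or hit["score"] > existing["score"]:
            best[hit["id"]] = hit
    # Re-rank the merged pool by score and truncate to the top_k results.
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:top_k]
```

Note this assumes both searches produce comparable similarity scores; if they did not, a rank-based fusion (e.g. reciprocal rank fusion) would be the usual alternative.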
## Tech Stack
| Layer | Technology |
|---|---|
| Frontend | Next.js 16, React 19, TypeScript, Tailwind CSS v4 |
| Backend | FastAPI, SSE streaming (sse-starlette), Pydantic |
| Vector DB | Qdrant Cloud (free tier, HNSW index) |
| Text Embeddings | BAAI/bge-small-en-v1.5 (384 dims) |
| Image Embeddings | openai/clip-vit-base-patch32 (512 dims) |
| Primary LLM | Google Gemini 2.5 Flash |
| Fallback LLM | Llama 3.3 70B via Groq API |
| Frontend Hosting | Vercel |
| Backend Hosting | HuggingFace Spaces (Docker) |
## Project Structure

```
├── scraper/                 # Course content fetching from GitHub repos
├── parser/                  # Markdown chunking by heading structure (h2/h3)
├── embedding/               # BGE (text) + CLIP (image) embedding logic
├── ingestion/               # Qdrant collection setup and data loading
├── retrieval/               # Dense search, merge, and ranking
├── generation/              # LLM prompting, streaming, fallback chain
├── memory/                  # Conversation history and query rewriting
├── backend/                 # FastAPI application
├── frontend/                # Next.js chat interface
│   ├── app/
│   │   ├── page.tsx         # Main chat page with SSE streaming
│   │   ├── layout.tsx       # Root layout
│   │   ├── globals.css      # Dark theme styles
│   │   └── components/
│   │       ├── ChatMessage.tsx  # Message rendering with markdown + citations
│   │       ├── ChatInput.tsx    # Input with course filter pills
│   │       └── Sidebar.tsx      # Session management
│   ├── next.config.ts
│   ├── vercel.json
│   └── package.json
├── evaluation/              # Test set and metrics
├── data/                    # Raw scraped content and processed chunks
├── config.py                # Centralized settings via pydantic-settings
├── run.py                   # Uvicorn entry point
├── Dockerfile               # HuggingFace Spaces deployment
├── requirements.txt         # Full Python dependencies
└── requirements-deploy.txt  # Lightweight deploy dependencies (no scraping libs)
```
## Getting Started

### Prerequisites

- Python 3.11+
- Node.js 18+
- API keys: Gemini, Groq, Qdrant Cloud

### 1. Clone the repo

```bash
git clone https://github.com/mansh7763/multimodal-rag.git
cd multimodal-rag
```

### 2. Set up the Python environment

```bash
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

### 3. Configure environment variables

```bash
cp .env.example .env
```

Edit `.env` with your keys:

```
GEMINI_API_KEY=your_gemini_key
GROQ_API_KEY=your_groq_key
QDRANT_URL=https://your-cluster.qdrant.io
QDRANT_API_KEY=your_qdrant_key
```

### 4. Ingest data

```bash
# Scrape course content from GitHub repos
python -m scraper.crawler

# Parse markdown and chunk by headings
python -m parser.chunker

# Generate embeddings and load into Qdrant
python -m ingestion.ingest
```

### 5. Run the backend

```bash
python run.py
```

The backend runs at http://localhost:8000, with interactive API docs at http://localhost:8000/docs.

### 6. Run the frontend

```bash
cd frontend
npm install
npm run dev
```

The frontend runs at http://localhost:3000.
## API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | API info |
| POST | `/query` | Non-streaming query: returns full answer + sources |
| POST | `/query/stream` | Streaming query via SSE: real-time token streaming |
| GET | `/courses` | List all indexed courses |
| POST | `/session/{id}/clear` | Clear conversation history for a session |
| GET | `/health` | Health check with Qdrant connection status |
### Example request

```bash
curl -N -X POST https://abhiyanta-multimodal-rag.hf.space/query/stream \
  -H "Content-Type: application/json" \
  -d '{"query": "How does LoRA work?", "session_id": "test"}'
```

(`-N` disables curl's output buffering so streamed SSE tokens appear as they arrive.)
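An SSE response is a stream of `data:` lines separated by blank lines, which a client reassembles into the full answer. A minimal parser sketch; the `{"token": ...}` payload shape and the `[DONE]` sentinel are assumptions for illustration, and the real API's event format may differ:

```python
import json


def parse_sse(raw: str) -> list[str]:
    """Extract the `data:` payload from each event in a raw SSE stream."""
    events = []
    for block in raw.split("\n\n"):       # events are separated by blank lines
        for line in block.splitlines():
            if line.startswith("data:"):
                events.append(line[len("data:"):].strip())
    return events


def collect_answer(raw: str) -> str:
    """Concatenate token events (assumed shape: {"token": "..."}) into one answer."""
    tokens = []
    for payload in parse_sse(raw):
        if payload == "[DONE]":           # assumed end-of-stream sentinel
            break
        tokens.append(json.loads(payload).get("token", ""))
    return "".join(tokens)
```

A real client would read the body incrementally (e.g. with `httpx.stream` or the browser `EventSource` API) rather than buffering the whole response, but the framing logic is the same.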
## Deployment

### Backend: HuggingFace Spaces

The backend is deployed as a Docker Space on HuggingFace:

- Create a new Space at [huggingface.co/new-space](https://huggingface.co/new-space) with the Docker SDK (Blank template)
- Add secrets in Settings > Variables and Secrets: `GEMINI_API_KEY`, `GROQ_API_KEY`, `QDRANT_URL`, `QDRANT_API_KEY`
- Push code to the Space:

```bash
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
git push hf master:main
```
### Frontend: Vercel

The frontend is deployed on Vercel:

- Import the GitHub repo at [vercel.com/new](https://vercel.com/new)
- Set **Root Directory** to `frontend`
- Add an environment variable: `NEXT_PUBLIC_BACKEND_URL=https://YOUR_USERNAME-YOUR_SPACE.hf.space`
- Deploy: Vercel auto-detects Next.js

### After deployment

- Backend API: `https://YOUR_USERNAME-YOUR_SPACE.hf.space`
- Frontend: `https://your-project.vercel.app`
- The HF Spaces free tier may have cold starts (~30s) after inactivity
## License
MIT