---
title: HuggingFace Course RAG API
emoji: 🤗
colorFrom: yellow
colorTo: purple
sdk: docker
app_port: 7860
---

# HuggingFace Course RAG

A multimodal Retrieval-Augmented Generation system over the Hugging Face Learn ecosystem. Ask questions about course content β€” text, code, and images β€” and get cited answers grounded in the official learning material.

## Live Demo

## Features

- **Multimodal Search** — Dense retrieval over text and image embeddings using Qdrant
- **Real-time Streaming** — Token-by-token answer streaming via Server-Sent Events (SSE)
- **Numbered Citations** — Every claim is backed by short clickable references (e.g. [1], [2]) linking to the original course material
- **Conversational Memory** — Follow-up questions are automatically rewritten into standalone queries using conversation history
- **Course Filtering** — Scope your search to a specific course via filter pills
- **LLM Fallback** — Gemini 2.5 Flash as primary, Groq Llama 3.3 70B as automatic fallback
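
The LLM fallback behaves like a try-the-next-provider loop. A minimal sketch of the idea — the function and provider names here are hypothetical, not the repo's actual API:

```python
from typing import Callable, Sequence

def generate_with_fallback(prompt: str,
                           providers: Sequence[Callable[[str], str]]) -> str:
    """Try each LLM provider in order; return the first successful answer."""
    last_error = None
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as err:  # rate limits, timeouts, API errors
            last_error = err
    raise RuntimeError("all providers failed") from last_error

# Stubbed providers standing in for Gemini (primary) and Groq (fallback):
def gemini(prompt: str) -> str:
    raise TimeoutError("primary unavailable")

def groq(prompt: str) -> str:
    return f"answer to: {prompt}"

print(generate_with_fallback("How does LoRA work?", [gemini, groq]))
# → answer to: How does LoRA work?
```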

## Courses Indexed

| Course | Source |
|---|---|
| Agents Course | `huggingface/agents-course` |
| Smol Course | `huggingface/smol-course` |
| Deep RL Course | `huggingface/deep-rl-class` |
| Audio Course | `huggingface/audio-transformers-course` |
| NLP Course | `huggingface/course` |
| Diffusion Course | `huggingface/diffusion-models-class` |
| LLM Course | `huggingface/llm-course` |
| Transformers Course | `huggingface/transformers-course` |

## Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ Ingestion Pipeline ───────────────────────┐
β”‚                                                                  β”‚
β”‚  Scraper (GitHub) β†’ Parser (Markdown) β†’ Semantic Chunker         β”‚
β”‚                                          β”‚                       β”‚
β”‚                               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”‚
β”‚                               β–Ό                     β–Ό           β”‚
β”‚                        BGE Embeddings         CLIP Embeddings    β”‚
β”‚                        (text, 384d)          (images, 512d)      β”‚
β”‚                               β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β”‚
β”‚                                          β–Ό                       β”‚
β”‚                                    Qdrant Cloud                  β”‚
β”‚                              (single collection,                 β”‚
β”‚                               named vectors)                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ Query Pipeline ───────────────────────────┐
β”‚                                                                  β”‚
β”‚  User β†’ Next.js Frontend (Vercel)                                β”‚
β”‚              β”‚                                                   β”‚
β”‚              β–Ό                                                   β”‚
β”‚         FastAPI Backend (HF Spaces)                              β”‚
β”‚              β”‚                                                   β”‚
β”‚         Query Rewrite (conversation-aware)                       β”‚
β”‚              β”‚                                                   β”‚
β”‚         Dense Search (BGE text + CLIP image vectors)             β”‚
β”‚              β”‚                                                   β”‚
β”‚         Merge & Deduplicate Results                              β”‚
β”‚              β”‚                                                   β”‚
β”‚         Gemini 2.5 Flash (or Groq fallback)                      β”‚
β”‚              β”‚                                                   β”‚
β”‚         SSE Stream β†’ Cited Answer with [1], [2] references       β”‚
β”‚                                                                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

## Tech Stack

| Layer | Technology |
|---|---|
| Frontend | Next.js 16, React 19, TypeScript, Tailwind CSS v4 |
| Backend | FastAPI, SSE streaming (sse-starlette), Pydantic |
| Vector DB | Qdrant Cloud (free tier, HNSW index) |
| Text Embeddings | BAAI/bge-small-en-v1.5 (384 dims) |
| Image Embeddings | openai/clip-vit-base-patch32 (512 dims) |
| Primary LLM | Google Gemini 2.5 Flash |
| Fallback LLM | Llama 3.3 70B via Groq API |
| Frontend Hosting | Vercel |
| Backend Hosting | HuggingFace Spaces (Docker) |

## Project Structure

```
├── scraper/          # Course content fetching from GitHub repos
├── parser/           # Markdown chunking by heading structure (h2/h3)
├── embedding/        # BGE (text) + CLIP (image) embedding logic
├── ingestion/        # Qdrant collection setup and data loading
├── retrieval/        # Dense search, merge, and ranking
├── generation/       # LLM prompting, streaming, fallback chain
├── memory/           # Conversation history and query rewriting
├── backend/          # FastAPI application
├── frontend/         # Next.js chat interface
│   ├── app/
│   │   ├── page.tsx              # Main chat page with SSE streaming
│   │   ├── layout.tsx            # Root layout
│   │   ├── globals.css           # Dark theme styles
│   │   └── components/
│   │       ├── ChatMessage.tsx   # Message rendering with markdown + citations
│   │       ├── ChatInput.tsx     # Input with course filter pills
│   │       └── Sidebar.tsx       # Session management
│   ├── next.config.ts
│   ├── vercel.json
│   └── package.json
├── evaluation/       # Test set and metrics
├── data/             # Raw scraped content and processed chunks
├── config.py         # Centralized settings via pydantic-settings
├── run.py            # Uvicorn entry point
├── Dockerfile        # HuggingFace Spaces deployment
├── requirements.txt  # Full Python dependencies
└── requirements-deploy.txt  # Lightweight deploy dependencies (no scraping libs)
```

## Getting Started

### Prerequisites

- Python 3 (with `venv`) and Node.js with `npm`
- A Qdrant Cloud cluster (URL and API key)
- API keys for Google Gemini and Groq

### 1. Clone the repo

```bash
git clone https://github.com/mansh7763/multimodal-rag.git
cd multimodal-rag
```

### 2. Set up the Python environment

```bash
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

### 3. Configure environment variables

```bash
cp .env.example .env
```

Edit `.env` with your keys:

```bash
GEMINI_API_KEY=your_gemini_key
GROQ_API_KEY=your_groq_key
QDRANT_URL=https://your-cluster.qdrant.io
QDRANT_API_KEY=your_qdrant_key
```
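
The repo's `config.py` loads these keys via pydantic-settings. The same idea in a dependency-free sketch — field names mirror the variables above, but this is an illustration, not the actual `config.py`:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    groq_api_key: str
    qdrant_url: str
    qdrant_api_key: str

def load_settings() -> Settings:
    """Read required keys from the environment, failing fast if one is missing."""
    def require(name: str) -> str:
        value = os.environ.get(name)
        if not value:
            raise RuntimeError(f"missing required environment variable: {name}")
        return value
    return Settings(
        gemini_api_key=require("GEMINI_API_KEY"),
        groq_api_key=require("GROQ_API_KEY"),
        qdrant_url=require("QDRANT_URL"),
        qdrant_api_key=require("QDRANT_API_KEY"),
    )
```

Failing fast at startup beats a cryptic authentication error on the first query.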

### 4. Ingest data

```bash
# Scrape course content from GitHub repos
python -m scraper.crawler

# Parse markdown and chunk by headings
python -m parser.chunker

# Generate embeddings and load into Qdrant
python -m ingestion.ingest
```
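
The chunking step splits each markdown file on h2/h3 headings (per the `parser/` description above). A minimal sketch of that idea, not the repo's actual chunker:

```python
import re

def chunk_by_headings(markdown: str) -> list[dict]:
    """Split markdown into chunks at h2/h3 headings; text before the first
    heading becomes its own untitled chunk."""
    chunks, title, lines = [], None, []
    def flush():
        if lines and "".join(lines).strip():
            chunks.append({"title": title, "text": "".join(lines).strip()})
    for line in markdown.splitlines(keepends=True):
        match = re.match(r"^(#{2,3})\s+(.*)", line)  # matches "## " and "### "
        if match:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks

doc = "intro\n## Setup\npip install x\n### Notes\nread the docs\n"
print([c["title"] for c in chunk_by_headings(doc)])  # → [None, 'Setup', 'Notes']
```

Heading-aware chunks keep each embedded passage topically coherent, which helps both retrieval quality and citation granularity.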

### 5. Run the backend

```bash
python run.py
```

The backend runs at `http://localhost:8000`; interactive API docs are at `http://localhost:8000/docs`.

### 6. Run the frontend

```bash
cd frontend
npm install
npm run dev
```

The frontend runs at `http://localhost:3000`.

## API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | API info |
| POST | `/query` | Non-streaming query — returns full answer + sources |
| POST | `/query/stream` | Streaming query via SSE — real-time token streaming |
| GET | `/courses` | List all indexed courses |
| POST | `/session/{id}/clear` | Clear conversation history for a session |
| GET | `/health` | Health check with Qdrant connection status |

### Example request

```bash
curl -X POST https://abhiyanta-multimodal-rag.hf.space/query/stream \
  -H "Content-Type: application/json" \
  -d '{"query": "How does LoRA work?", "session_id": "test"}'
```
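
On the client side, the stream is plain SSE framing: lines prefixed with `data:`, events separated by blank lines. A minimal parser for that framing — a sketch only; the shape of the event payloads themselves is not shown here:

```python
from typing import Iterable, Iterator

def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of lines."""
    buffer: list[str] = []
    for line in lines:
        if line.startswith("data:"):
            buffer.append(line[5:].lstrip())
        elif line == "" and buffer:        # blank line terminates the event
            yield "\n".join(buffer)
            buffer = []
    if buffer:                             # flush a trailing, unterminated event
        yield "\n".join(buffer)

# e.g. with requests: iter_sse_data(resp.iter_lines(decode_unicode=True))
events = list(iter_sse_data(["data: Hello", "", "data: world", ""]))
print(events)  # → ['Hello', 'world']
```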

## Deployment

### Backend — HuggingFace Spaces

The backend is deployed as a Docker Space on HuggingFace:

1. Create a new Space at [huggingface.co/new-space](https://huggingface.co/new-space) with the Docker SDK (Blank template)
2. Add secrets under **Settings > Variables and Secrets**:
   - `GEMINI_API_KEY`
   - `GROQ_API_KEY`
   - `QDRANT_URL`
   - `QDRANT_API_KEY`
3. Push code to the Space:

   ```bash
   git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE
   git push hf master:main
   ```

Frontend β€” Vercel

The frontend is deployed on Vercel:

  1. Import the GitHub repo at vercel.com/new
  2. Set Root Directory to frontend
  3. Add environment variable:
    • NEXT_PUBLIC_BACKEND_URL = https://YOUR_USERNAME-YOUR_SPACE.hf.space
  4. Deploy β€” Vercel auto-detects Next.js

### After deployment

- Backend API: `https://YOUR_USERNAME-YOUR_SPACE.hf.space`
- Frontend: `https://your-project.vercel.app`
- The HF Spaces free tier may have cold starts (~30s) after inactivity

## License

MIT