---
title: RAG LangGraph Chatbot
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
python_version: '3.10'
---

# RAG-Based Chatbot (LangGraph + Hugging Face)

This project implements a Retrieval-Augmented Generation (RAG) chatbot that answers questions using either:

- the Hugging Face router, when you provide an HF token and a router-available model (default `HF_MODEL_ID`: `meta-llama/Meta-Llama-3-8B-Instruct`), or
- local `transformers` generation when no token is set (fallback `LOCAL_MODEL_ID`: `distilgpt2` by default; quality is limited, so set a stronger local model if you need better offline answers).

## Features

- RAG Pipeline: ingests, chunks, embeds, and indexes PDF documents for accurate retrieval.
- Inference Flexibility: uses the HF router when a token is provided; falls back to local `transformers` otherwise.
- LangGraph Agent: the retrieval + generation flow is orchestrated with LangGraph for clearer state handling.
- Gradio Interface: a user-friendly chat UI for interacting with the assistant.
- Modular Design: clean separation of concerns (ingestion, vector store, agent, app).
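The chunking step of the pipeline can be sketched as a sliding window over the extracted text. This is a minimal illustration, not the actual `ingestion.py` implementation; the chunk size and overlap values are assumptions:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks for embedding.

    The overlap keeps sentences that straddle a chunk boundary
    retrievable from both neighbouring chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks


# 1200 characters -> windows starting at 0, 450, 900
chunks = chunk_text("a" * 1200, chunk_size=500, overlap=50)
```

Real ingestion code often delegates this to a LangChain text splitter, but the windowing idea is the same.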

## Project Structure

    rag_agent_project/
    ├─ app.py              # Gradio application
    ├─ requirements.txt    # Dependencies
    ├─ data/               # Data storage (PDFs, index)
    ├─ src/                # Source code
    │  ├─ ingestion.py     # Data processing
    │  ├─ vectorstore.py   # Embedding & indexing
    │  ├─ rag_tool.py      # (legacy) retriever tool helper
    │  ├─ agent.py         # RAG + HF router/local agent
    │  └─ config.py        # Configuration
    └─ tests/              # Automated tests
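The role of `vectorstore.py` (embed chunks, index them, retrieve nearest neighbours) can be illustrated with a dependency-light NumPy sketch. The real module uses `sentence-transformers` embeddings and a FAISS index; the character-sum "embedding" below is purely a stand-in for illustration:

```python
import numpy as np


def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding: hashed bag-of-words (stand-in for sentence-transformers)."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        # Deterministic word hash: sum of character codes modulo dim.
        vec[sum(ord(c) for c in word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


class TinyVectorStore:
    """Minimal in-memory index playing the role FAISS plays in the app."""

    def __init__(self) -> None:
        self.vectors: list[np.ndarray] = []
        self.texts: list[str] = []

    def add(self, text: str) -> None:
        self.vectors.append(embed(text))
        self.texts.append(text)

    def search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        # Cosine similarity (vectors are unit-normalised, so a dot product).
        sims = np.array([v @ q for v in self.vectors])
        top = np.argsort(-sims)[:k]
        return [self.texts[i] for i in top]


store = TinyVectorStore()
store.add("FAISS indexes dense vectors for similarity search")
store.add("Gradio builds web interfaces for ML demos")
hits = store.search("vector similarity search", k=1)
```

Swapping in real sentence-transformer embeddings and a FAISS index changes the quality of the neighbours, not the shape of the interface.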

## Setup & Usage

1. Install dependencies:

       pip install -r requirements.txt

2. Configure (optional):
   - Set `HUGGINGFACEHUB_API_TOKEN` to enable router inference.
   - Override `HF_MODEL_ID` to change the router model (default: `meta-llama/Meta-Llama-3-8B-Instruct`).
   - Override `LOCAL_MODEL_ID` to change the local fallback (default: `distilgpt2`; use a stronger local model if you need better offline answers).
3. Run the application:

       python app.py

4. Interact:
   - Open the provided local URL (usually http://127.0.0.1:7860).
   - (Optional) Provide a Hugging Face token and a router-supported model ID for cloud inference (default: `meta-llama/Meta-Llama-3-8B-Instruct`).
   - Without a token, the app uses the local fallback model (`LOCAL_MODEL_ID`, default `distilgpt2`); quality is limited, so use the router with a token for good answers, or set a stronger local model.
   - Upload a PDF and click "Initialize System".
   - Start chatting!
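Putting the steps above together, a typical local run with router inference might look like this (the token value is a placeholder, and the model overrides are optional examples, not requirements):

```shell
pip install -r requirements.txt

# Optional: enable Hugging Face router inference
export HUGGINGFACEHUB_API_TOKEN="hf_..."   # your HF token
export HF_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"

# Optional: pick a stronger offline fallback than distilgpt2
export LOCAL_MODEL_ID="gpt2-medium"

python app.py   # serves the Gradio UI, usually at http://127.0.0.1:7860
```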

## Deployment (Hugging Face Spaces)

1. Create a new Space on Hugging Face (SDK: Gradio).
2. Upload the contents of `rag_agent_project` to the Space.
3. Ensure `requirements.txt` is present.
4. The app will build and launch automatically.

## Technical Details

- LLM: HF router (with a token; default `meta-llama/Meta-Llama-3-8B-Instruct`) or local `transformers` fallback (`LOCAL_MODEL_ID`, default `distilgpt2`; change to a stronger model if running locally).
- Embeddings: `sentence-transformers/all-MiniLM-L6-v2`
- Vector store: FAISS
- Orchestration: a LangGraph graph (retrieve → generate) that injects the retrieved context into the RAG prompt

## Notes for Hugging Face Spaces

- Add your `HUGGINGFACEHUB_API_TOKEN` as a Space secret to enable router usage.
- To pin a different router model, set `HF_MODEL_ID` in the Space variables; override `LOCAL_MODEL_ID` if you want a specific offline fallback.
- The `data/` folder holds uploads and the FAISS index; it is git-ignored here but created at runtime.
- The entry point is `app.py`; `demo.queue().launch()` is enabled for Spaces concurrency.