---
title: RAG LangGraph Chatbot
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
python_version: '3.10'
---
# RAG-Based Chatbot (LangGraph + Hugging Face)

This project implements a RAG (Retrieval-Augmented Generation) chatbot that answers using either:

- Hugging Face router inference (when you provide an HF token and a router-available model; default `HF_MODEL_ID`: `meta-llama/Meta-Llama-3-8B-Instruct`), or
- Local transformers generation (no token required; fallback `LOCAL_MODEL_ID`: `distilgpt2` by default; quality is limited, so set a stronger local model if you need better offline answers).
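The token-based backend selection can be sketched as follows (an illustrative sketch; `choose_backend` is a hypothetical name, and `agent.py` may implement this differently):

```python
import os

def choose_backend(token,
                   hf_model_id="meta-llama/Meta-Llama-3-8B-Instruct",
                   local_model_id="distilgpt2"):
    """Pick the inference backend: HF router when a token is present,
    local transformers otherwise."""
    if token:
        return ("router", hf_model_id)
    return ("local", local_model_id)

# With no HUGGINGFACEHUB_API_TOKEN set, this resolves to the local fallback.
backend, model = choose_backend(os.environ.get("HUGGINGFACEHUB_API_TOKEN"))
```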
## Features
- RAG Pipeline: Ingests, chunks, embeds, and indexes PDF documents for accurate retrieval.
- Inference Flexibility: Uses HF router when a token is provided; falls back to local transformers otherwise.
- LangGraph Agent: Retrieval + generation flow is orchestrated with LangGraph for clearer state handling.
- Gradio Interface: A user-friendly chat UI for interacting with the assistant.
- Modular Design: Clean separation of concerns (Ingestion, Vector Store, Agent, App).
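The chunking step of the RAG pipeline can be sketched as a simple overlapping character window (illustrative only; `ingestion.py` may use a different splitter, such as one from LangChain):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping character windows so that context
    spanning a chunk boundary is not lost."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk would then be embedded and added to the FAISS index for retrieval.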
## Project Structure

```
rag_agent_project/
├── app.py              # Gradio application
├── requirements.txt    # Dependencies
├── data/               # Data storage (PDFs, index)
├── src/                # Source code
│   ├── ingestion.py    # Data processing
│   ├── vectorstore.py  # Embedding & indexing
│   ├── rag_tool.py     # (legacy) retriever tool helper
│   ├── agent.py        # RAG + HF router/local agent
│   └── config.py       # Configuration
└── tests/              # Automated tests
```
## Setup & Usage

Install dependencies:

```bash
pip install -r requirements.txt
```

Configure (optional):

- Set `HUGGINGFACEHUB_API_TOKEN` for router inference.
- Override `HF_MODEL_ID` for the router (default: `meta-llama/Meta-Llama-3-8B-Instruct`).
- Override `LOCAL_MODEL_ID` for the local fallback (default: `distilgpt2`; use a stronger local model if you need better offline answers).
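These environment variables might be resolved along the following lines (a sketch; `load_settings` is a hypothetical helper, and the real `src/config.py` may differ):

```python
import os

def load_settings(env=os.environ):
    """Read configuration from the environment, falling back to the
    documented defaults when a variable is unset."""
    return {
        "hf_model_id": env.get("HF_MODEL_ID", "meta-llama/Meta-Llama-3-8B-Instruct"),
        "local_model_id": env.get("LOCAL_MODEL_ID", "distilgpt2"),
        # No token means the app uses the local transformers fallback.
        "hf_token": env.get("HUGGINGFACEHUB_API_TOKEN"),
    }
```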
Run the application:

```bash
python app.py
```

Interact:

- Open the provided local URL (usually http://127.0.0.1:7860).
- (Optional) Provide a Hugging Face token and a router-supported model ID for cloud inference (default: `meta-llama/Meta-Llama-3-8B-Instruct`).
- Without a token, the app uses a local fallback model (`LOCAL_MODEL_ID`, default: `distilgpt2`; quality is limited, so use the router with a token for good answers or set a stronger local model).
- Upload a PDF and click "Initialize System".
- Start chatting!
## Deployment (Hugging Face Spaces)

- Create a new Space on Hugging Face (SDK: Gradio).
- Upload the contents of `rag_agent_project` to the Space.
- Ensure `requirements.txt` is present.
- The app will build and launch automatically.
## Technical Details

- LLM: HF router (with token, default `meta-llama/Meta-Llama-3-8B-Instruct`) or local transformers fallback (`LOCAL_MODEL_ID`, default `distilgpt2`; change to a stronger model if running locally).
- Embeddings: `sentence-transformers/all-MiniLM-L6-v2`
- Vector Store: FAISS
- Orchestration: LangGraph (retrieve → generate) RAG prompt with retrieval context
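The retrieve → generate flow can be sketched with plain functions (the real project wires this through LangGraph's `StateGraph`; the dictionary-based index here is a stand-in for FAISS similarity search, and all names are illustrative):

```python
def retrieve(state, index):
    """Retrieval node: look up context for the question.
    Stand-in for a FAISS similarity search over document chunks."""
    question = state["question"].lower()
    context = [text for key, text in index.items() if key in question]
    return {**state, "context": context}

def generate(state):
    """Generation node: build a RAG prompt from retrieved context.
    Stand-in for the HF router / local transformers LLM call."""
    prompt = f"Context: {' '.join(state['context'])}\nQuestion: {state['question']}"
    return {**state, "answer": prompt}

# Linear graph: retrieve -> generate, with state passed between nodes.
index = {"faiss": "FAISS performs vector similarity search."}
state = generate(retrieve({"question": "What does FAISS do?"}, index))
```

In the actual agent, each function is registered as a LangGraph node and the shared state keeps the flow explicit and inspectable.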
## Notes for Hugging Face Spaces

- Add your `HUGGINGFACEHUB_API_TOKEN` as a secret for router usage.
- If you want to pin a different router model, set `HF_MODEL_ID` in the Space variables. Override `LOCAL_MODEL_ID` if you want a specific offline fallback.
- The `data/` folder is persisted for uploads and the FAISS index; it is git-ignored here but created at runtime.
- The entry point is `app.py`; `demo.queue().launch()` is enabled for Spaces concurrency.