Anandharajan committed
Commit 99f19b3 · 1 Parent(s): 12e4ce3

Sync Space with LangGraph RAG app
.gitattributes DELETED
@@ -1,35 +0,0 @@
- *.7z filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
- *.bin filter=lfs diff=lfs merge=lfs -text
- *.bz2 filter=lfs diff=lfs merge=lfs -text
- *.ckpt filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.gz filter=lfs diff=lfs merge=lfs -text
- *.h5 filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
- *.model filter=lfs diff=lfs merge=lfs -text
- *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.npy filter=lfs diff=lfs merge=lfs -text
- *.npz filter=lfs diff=lfs merge=lfs -text
- *.onnx filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
- *.pickle filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
- *.pt filter=lfs diff=lfs merge=lfs -text
- *.pth filter=lfs diff=lfs merge=lfs -text
- *.rar filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
- *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tgz filter=lfs diff=lfs merge=lfs -text
- *.wasm filter=lfs diff=lfs merge=lfs -text
- *.xz filter=lfs diff=lfs merge=lfs -text
- *.zip filter=lfs diff=lfs merge=lfs -text
- *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
.github/workflows/deploy-space.yml ADDED
@@ -0,0 +1,19 @@
+ name: Deploy to Hugging Face Space
+
+ on:
+   push:
+     branches: [ master ]
+     tags: [ 'v*' ]
+   workflow_dispatch:
+
+ jobs:
+   sync-space:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - name: Push to Hugging Face Space
+         uses: huggingface/hub-action@v1
+         with:
+           repo-token: ${{ secrets.HF_TOKEN }}
+           repo-id: ${{ secrets.HF_SPACE_ID }}
+           repo-type: space
.gitignore ADDED
@@ -0,0 +1,11 @@
+ __pycache__/
+ *.py[cod]
+ .DS_Store
+ .venv/
+
+ # Local artifacts
+ data/source.pdf
+ data/faiss_index/
+
+ # Gradio upload temp files (just in case)
+ tmp/
README.md CHANGED
@@ -1,16 +1,69 @@
- ---
- title: RAG LangGraph
- emoji: 💬
- colorFrom: yellow
- colorTo: purple
- sdk: gradio
- sdk_version: 5.42.0
- app_file: app.py
- pinned: false
- hf_oauth: true
- hf_oauth_scopes:
-   - inference-api
- short_description: ' A LangGraph-powered RAG chatbot '
- ---
-
- An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).
+ # RAG-Based Chatbot (LangGraph + Hugging Face)
+
+ This project implements a RAG (Retrieval-Augmented Generation) chatbot that answers with either:
+ - the **Hugging Face router** (when you provide an HF token and a router-available model; default `HF_MODEL_ID`: `meta-llama/Meta-Llama-3-8B-Instruct`), or
+ - **local transformers generation** (no token; fallback `LOCAL_MODEL_ID`: `distilgpt2` by default; quality is limited, so set a stronger local model if you need better offline answers).
+
+ ## Features
+ - **RAG Pipeline**: Ingests, chunks, embeds, and indexes PDF documents for accurate retrieval.
+ - **Inference Flexibility**: Uses the HF router when a token is provided; falls back to local transformers otherwise.
+ - **LangGraph Agent**: The retrieval + generation flow is orchestrated with LangGraph for clearer state handling.
+ - **Gradio Interface**: A user-friendly chat UI for interacting with the assistant.
+ - **Modular Design**: Clean separation of concerns (ingestion, vector store, agent, app).
+
+ ## Project Structure
+ ```
+ rag_agent_project/
+ ├─ app.py              # Gradio application
+ ├─ requirements.txt    # Dependencies
+ ├─ data/               # Data storage (PDFs, index)
+ ├─ src/                # Source code
+ │  ├─ ingestion.py     # Data processing
+ │  ├─ vectorstore.py   # Embedding & indexing
+ │  ├─ rag_tool.py      # (legacy) retriever tool helper
+ │  ├─ agent.py         # RAG + HF router/local agent
+ │  └─ config.py        # Configuration
+ └─ tests/              # Automated tests
+ ```
+
+ ## Setup & Usage
+
+ 1. **Install Dependencies**:
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ 2. **Configure (optional)**:
+    - Set `HUGGINGFACEHUB_API_TOKEN` for router inference.
+    - Override `HF_MODEL_ID` for the router (default: `meta-llama/Meta-Llama-3-8B-Instruct`).
+    - Override `LOCAL_MODEL_ID` for the local fallback (default: `distilgpt2`; use a stronger local model if you need better offline answers).
+
+ 3. **Run the Application**:
+    ```bash
+    python app.py
+    ```
+
+ 4. **Interact**:
+    - Open the provided local URL (usually `http://127.0.0.1:7860`).
+    - (Optional) Provide a Hugging Face token and a router-supported model ID for cloud inference (default: `meta-llama/Meta-Llama-3-8B-Instruct`).
+    - Without a token, the app uses the local fallback model (`LOCAL_MODEL_ID`, default: `distilgpt2`); quality is limited, so use the router with a token for good answers or set a stronger local model.
+    - Upload a PDF and click "Initialize System".
+    - Start chatting!
+
+ ## Deployment (Hugging Face Spaces)
+ 1. Create a new Space on Hugging Face (SDK: Gradio).
+ 2. Upload the contents of `rag_agent_project` to the Space.
+ 3. Ensure `requirements.txt` is present.
+ 4. The app will build and launch automatically.
+
+ ## Technical Details
+ - **LLM**: HF router (with token; default `meta-llama/Meta-Llama-3-8B-Instruct`) or local transformers fallback (`LOCAL_MODEL_ID`, default `distilgpt2`; change to a stronger model if running locally).
+ - **Embeddings**: `sentence-transformers/all-MiniLM-L6-v2`
+ - **Vector Store**: FAISS
+ - **Orchestration**: LangGraph (retrieve → generate) with a RAG prompt built from the retrieved context
+
+ ## Notes for Hugging Face Spaces
+ - Add your `HUGGINGFACEHUB_API_TOKEN` as a secret for router usage.
+ - To pin a different router model, set `HF_MODEL_ID` in the Space variables. Override `LOCAL_MODEL_ID` if you want a specific offline fallback.
+ - The `data/` folder is used for uploads and the FAISS index; it is git-ignored here but created at runtime.
+ - The entry point is `app.py`; `demo.queue().launch()` is enabled for Spaces concurrency.
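The optional configuration described above can also be done from the shell before launching; a minimal sketch (the token value is a placeholder, not a real credential):

```shell
# Optional: enable Hugging Face router inference (placeholder token shown).
export HUGGINGFACEHUB_API_TOKEN="hf_xxxxxxxx"
# Optional: override the router model and the offline fallback.
export HF_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export LOCAL_MODEL_ID="distilgpt2"
python app.py
```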
app.py CHANGED
@@ -1,70 +1,180 @@
  import gradio as gr
- from huggingface_hub import InferenceClient
-
-
- def respond(
-     message,
-     history: list[dict[str, str]],
-     system_message,
-     max_tokens,
-     temperature,
-     top_p,
-     hf_token: gr.OAuthToken,
- ):
      """
-     For more information on `huggingface_hub` Inference API support, please check the docs: https://huggingface.co/docs/huggingface_hub/v0.22.2/en/guides/inference
      """
-     client = InferenceClient(token=hf_token.token, model="openai/gpt-oss-20b")
-
-     messages = [{"role": "system", "content": system_message}]
-
-     messages.extend(history)
-
-     messages.append({"role": "user", "content": message})
-
-     response = ""
-
-     for message in client.chat_completion(
-         messages,
-         max_tokens=max_tokens,
-         stream=True,
-         temperature=temperature,
-         top_p=top_p,
-     ):
-         choices = message.choices
-         token = ""
-         if len(choices) and choices[0].delta.content:
-             token = choices[0].delta.content
-
-         response += token
-         yield response
-
-
- """
- For information on how to customize the ChatInterface, peruse the gradio docs: https://www.gradio.app/docs/chatinterface
- """
- chatbot = gr.ChatInterface(
-     respond,
-     type="messages",
-     additional_inputs=[
-         gr.Textbox(value="You are a friendly Chatbot.", label="System message"),
-         gr.Slider(minimum=1, maximum=2048, value=512, step=1, label="Max new tokens"),
-         gr.Slider(minimum=0.1, maximum=4.0, value=0.7, step=0.1, label="Temperature"),
-         gr.Slider(
-             minimum=0.1,
-             maximum=1.0,
-             value=0.95,
-             step=0.05,
-             label="Top-p (nucleus sampling)",
-         ),
-     ],
- )
-
- with gr.Blocks() as demo:
-     with gr.Sidebar():
-         gr.LoginButton()
-     chatbot.render()


  if __name__ == "__main__":
-     demo.launch()

  import gradio as gr
+ import os
+ import shutil
+ from src.config import PDF_PATH, HF_API_TOKEN, HF_MODEL_ID, DATA_DIR
+ from src.ingestion import ingest_file
+ from src.vectorstore import create_vectorstore, load_vectorstore
+ from src.agent import build_langgraph_agent
+ from langchain_core.messages import HumanMessage
+
+ # Global variables to store state
+ vectorstore = None
+ agent_executor = None
+ current_hf_token = None
+ current_hf_model = None
+
+ # Ensure the data directory exists for uploads and the FAISS index (important for HF Spaces).
+ os.makedirs(DATA_DIR, exist_ok=True)
+
+
+ def _get_uploaded_path(uploaded_file):
+     """
+     Normalize Gradio's uploaded file into a filesystem path.
+     Handles filepath strings, temporary file objects, and dict payloads.
+     """
+     if uploaded_file is None:
+         return None
+
+     if isinstance(uploaded_file, (str, os.PathLike)):
+         return str(uploaded_file)
+
+     if isinstance(uploaded_file, dict):
+         return uploaded_file.get("name") or uploaded_file.get("path")
+
+     if hasattr(uploaded_file, "name"):
+         return uploaded_file.name
+
+     return None
+
+
+ def initialize_system(hf_token, hf_model, uploaded_file):
      """
+     Initializes the RAG pipeline and agent.
      """
+     global vectorstore, agent_executor, current_hf_token, current_hf_model
+
+     hf_token = (hf_token or HF_API_TOKEN or "").strip()
+     hf_model = (hf_model or HF_MODEL_ID).strip()
+     uploaded_path = _get_uploaded_path(uploaded_file)
+
+     if uploaded_file is not None and uploaded_path is None:
+         return "Could not read the uploaded file. Please try uploading again."
+
+     if uploaded_path is None and not os.path.exists(PDF_PATH):
+         return "Please upload a PDF file."
+
+     try:
+         # 0. Handle the file upload. Recent Gradio versions pass a named temp
+         #    file path (older ones a file object), so copy it into our data directory.
+         if uploaded_path is not None:
+             os.makedirs(os.path.dirname(PDF_PATH), exist_ok=True)
+             shutil.copy(uploaded_path, PDF_PATH)
+             print(f"File saved to {PDF_PATH}")
+
+             # Force re-ingestion since we have a new file.
+             print("Ingesting PDF...")
+             chunks = ingest_file(str(PDF_PATH))
+             vectorstore = create_vectorstore(chunks)
+
+         # 1. Load or create the vector store (if not already created above).
+         if vectorstore is None:
+             vectorstore = load_vectorstore()
+             if vectorstore is None:
+                 # This case should be covered by the upload logic, but just in case.
+                 if os.path.exists(PDF_PATH):
+                     print("Ingesting PDF...")
+                     chunks = ingest_file(str(PDF_PATH))
+                     vectorstore = create_vectorstore(chunks)
+                 else:
+                     return "Source PDF not found. Please upload a file."
+
+         # 2. Create the agent (LangGraph).
+         agent_executor = build_langgraph_agent(vectorstore, hf_api_token=hf_token, hf_model_id=hf_model)
+         current_hf_token = hf_token
+         current_hf_model = hf_model
+         mode = "Hugging Face router" if hf_token else "local transformers (no HF token provided)"
+
+         return f"System Initialized Successfully using {mode}. You can now start chatting."
+     except Exception as e:
+         import traceback
+         traceback.print_exc()
+         return f"Initialization Failed: {str(e)}"
+
+
+ def chat(message, history, hf_token, hf_model, uploaded_file):
+     """
+     Chat function for Gradio.
+     """
+     global agent_executor, current_hf_token, current_hf_model
+
+     # Gradio can pass None for history on the first turn.
+     history = history or []
+     if not message:
+         return "Please enter a message to start chatting."
+
+     hf_token = (hf_token or HF_API_TOKEN or "").strip()
+     hf_model = (hf_model or HF_MODEL_ID).strip()
+
+     # Re-initialize if the token/model changed or the agent is not initialized.
+     if agent_executor is None or hf_token != current_hf_token or hf_model != current_hf_model:
+         init_msg = initialize_system(hf_token, hf_model, uploaded_file)
+         if "Failed" in init_msg or "Please" in init_msg:
+             return init_msg
+
+     # Run the agent. This simple implementation is stateless between calls
+     # (no checkpointer), so we send only the current message; mapping the full
+     # Gradio history to LangChain messages is a possible improvement.
+     try:
+         response = agent_executor.invoke({"messages": [HumanMessage(content=message)]})
+         return response["messages"][-1].content
+     except Exception as e:
+         import traceback
+         traceback.print_exc()
+         hint = (
+             " If you used the Hugging Face router, verify the token/model. "
+             "Otherwise, try re-initializing to refresh the vector store."
+         )
+         return f"Error while generating a reply: {str(e)}{hint}"
+
+
+ # Gradio UI
+ with gr.Blocks(title="RAG Chatbot (LangGraph + HF)") as demo:
+     gr.Markdown("# RAG-Based Chatbot (LangGraph + Hugging Face)")
+     gr.Markdown(
+         "Upload a PDF, build a vector store, retrieve context, and answer with either the Hugging Face router "
+         "(when a token + router model is provided) or a local fallback model."
+     )
+
+     with gr.Row():
+         api_key_input = gr.Textbox(
+             label="Hugging Face API Token (optional)",
+             type="password",
+             placeholder="hf_...",
+             value=os.getenv("HUGGINGFACEHUB_API_TOKEN", ""),
+         )
+         model_input = gr.Textbox(
+             label="Model ID",
+             placeholder="e.g. meta-llama/Meta-Llama-3-8B-Instruct",
+             value=os.getenv("HF_MODEL_ID", HF_MODEL_ID),
+         )
+     file_input = gr.File(label="Upload PDF", file_types=[".pdf"], type="filepath")
+     init_btn = gr.Button("Initialize System")
+
+     status_output = gr.Textbox(label="Status", interactive=False)
+
+     chatbot = gr.ChatInterface(
+         fn=chat,
+         additional_inputs=[api_key_input, model_input, file_input],
+     )

+     init_btn.click(initialize_system, inputs=[api_key_input, model_input, file_input], outputs=[status_output])

  if __name__ == "__main__":
+     # Use local launch by default; share links can fail without network access.
+     demo.queue().launch(share=False)
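The comments in `chat()` note that Gradio's `[user, bot]` history pairs are not yet mapped to LangChain messages. One possible sketch of that mapping, using plain role-tagged dicts to stay dependency-free (they would be converted to `HumanMessage`/`AIMessage` in the real app; the function name is hypothetical):

```python
def history_to_messages(history):
    """Map Gradio's [user, bot] history pairs to role-tagged messages."""
    messages = []
    for user_turn, bot_turn in history or []:
        messages.append({"role": "user", "content": user_turn})
        if bot_turn:  # the bot reply may be None for the in-flight turn
            messages.append({"role": "assistant", "content": bot_turn})
    return messages
```

Passing the result (plus the new user message) into the agent's `invoke` payload would give the graph conversational context without needing a checkpointer.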
data/README.md ADDED
@@ -0,0 +1,2 @@
+ This directory stores uploaded PDFs and the generated FAISS index at runtime.
+ These files are ignored in version control to keep the repo lightweight for GitHub and Hugging Face Spaces.
requirements.txt ADDED
@@ -0,0 +1,13 @@
+ langchain==0.3.7
+ langchain-community==0.3.7
+ langchain-text-splitters==0.3.2
+ langchain-huggingface==0.1.2
+ langgraph==0.2.39
+ gradio==4.44.1
+ python-dotenv==1.0.1
+ sentence-transformers==2.6.1
+ faiss-cpu==1.7.4
+ pypdf==4.2.0
+ pydantic==2.9.2
+ huggingface-hub==0.23.4
+ transformers>=4.37.0
src/__init__.py ADDED
File without changes
src/agent.py ADDED
@@ -0,0 +1,240 @@
+ from typing import List, Optional, TypedDict
+ from types import SimpleNamespace
+ import requests
+ from langgraph.graph import StateGraph, END
+ from langchain_core.messages import AIMessage, BaseMessage, HumanMessage
+ from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
+ from .config import HF_MODEL_ID, HF_API_TOKEN, LOCAL_MODEL_ID, TEMPERATURE
+
+ # Cache the local model/pipeline to avoid repeated downloads.
+ _LOCAL_PIPELINE = None
+ _LOCAL_MODEL_ID = None
+
+
+ def _build_prompt(question: str, docs: List) -> str:
+     """Create a concise prompt that uses retrieved context."""
+     context = "\n\n".join(d.page_content for d in docs[:4])
+     return (
+         "You are a helpful assistant. Use the provided context to answer the question. "
+         "If the context is insufficient, say you do not know.\n\n"
+         f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
+     )
+
+
+ class ChatState(TypedDict):
+     messages: List[BaseMessage]
+     context: str
+
+
+ def _hf_generate(prompt: str, model_id: str, token: Optional[str], temperature: float) -> str:
+     """
+     Minimal text-generation call against the Hugging Face router API.
+     """
+     url = f"https://router.huggingface.co/models/{model_id}"
+     headers = {"Accept": "application/json"}
+     if token:
+         headers["Authorization"] = f"Bearer {token}"
+     payload = {
+         "inputs": prompt,
+         "parameters": {
+             "max_new_tokens": 512,
+             "temperature": temperature,
+             "return_full_text": False,
+         },
+     }
+     try:
+         resp = requests.post(url, headers=headers, json=payload, timeout=60)
+         resp.raise_for_status()
+     except requests.HTTPError as http_err:
+         status = http_err.response.status_code if http_err.response is not None else None
+         if status == 404:
+             raise RuntimeError(
+                 f"Model '{model_id}' not found on Hugging Face router. "
+                 f"Set HF_MODEL_ID to a router-available text-generation model and retry."
+             ) from http_err
+         raise
+     except requests.RequestException as req_err:
+         # Network-layer issues (timeouts, DNS, etc.) should surface cleanly so we can fall back.
+         raise RuntimeError(f"Hugging Face router request failed: {req_err}") from req_err
+     data = resp.json()
+     # The HF router can return a list or a dict; handle both.
+     if isinstance(data, list) and data and isinstance(data[0], dict):
+         if "generated_text" in data[0]:
+             return data[0]["generated_text"]
+         if "error" in data[0]:
+             raise RuntimeError(data[0]["error"])
+     if isinstance(data, dict):
+         if "generated_text" in data:
+             return data["generated_text"]
+         if "error" in data:
+             raise RuntimeError(data["error"])
+     return str(data)
+
+
+ def _local_generate(prompt: str, model_id: str, temperature: float) -> str:
+     """
+     Fallback local generation using a transformers pipeline (no HF API token needed).
+     Truncates the prompt to fit within the model's max position embeddings to avoid index errors.
+     """
+     global _LOCAL_PIPELINE, _LOCAL_MODEL_ID
+
+     if _LOCAL_PIPELINE is None or _LOCAL_MODEL_ID != model_id:
+         tokenizer = AutoTokenizer.from_pretrained(model_id)
+         model = AutoModelForCausalLM.from_pretrained(model_id)
+         _LOCAL_PIPELINE = pipeline(
+             "text-generation",
+             model=model,
+             tokenizer=tokenizer,
+             device_map="cpu",
+         )
+         _LOCAL_MODEL_ID = model_id
+
+     tokenizer = _LOCAL_PIPELINE.tokenizer
+     model = _LOCAL_PIPELINE.model
+     max_new_tokens = 128
+
+     # Determine the max prompt length to prevent IndexError for small context windows (e.g., gpt2 = 1024).
+     max_positions = getattr(getattr(model, "config", None), "max_position_embeddings", None)
+     pad_token_id = tokenizer.eos_token_id or tokenizer.pad_token_id
+     if max_positions and isinstance(max_positions, int):
+         allowed = max_positions - max_new_tokens - 1
+         if allowed > 0:
+             input_ids = tokenizer.encode(prompt, add_special_tokens=False)
+             if len(input_ids) > allowed:
+                 # Keep the tail of the prompt (most recent question + context).
+                 input_ids = input_ids[-allowed:]
+                 prompt = tokenizer.decode(input_ids, skip_special_tokens=True)
+
+     outputs = _LOCAL_PIPELINE(
+         prompt,
+         max_new_tokens=max_new_tokens,
+         do_sample=temperature > 0,
+         temperature=temperature,
+         pad_token_id=pad_token_id,
+     )
+     # The transformers pipeline returns a list of dicts.
+     if outputs and isinstance(outputs[0], dict) and "generated_text" in outputs[0]:
+         return outputs[0]["generated_text"]
+     return str(outputs)
+
+
+ def build_agent(
+     vectorstore,
+     hf_model_id: Optional[str] = None,
+     hf_api_token: Optional[str] = None,
+     temperature: Optional[float] = None,
+ ):
+     """
+     Simple RAG agent using Hugging Face router inference (text generation).
+     """
+     retriever = vectorstore.as_retriever()
+     model_id = (hf_model_id or HF_MODEL_ID).strip()
+     local_model_id = (LOCAL_MODEL_ID or model_id).strip()
+     token = (hf_api_token or HF_API_TOKEN or "").strip() or None
+     temp = TEMPERATURE if temperature is None else temperature
+
+     def invoke(payload):
+         messages = payload.get("messages", [])
+         user_content = messages[-1].content if messages else ""
+
+         # Prefer invoke to avoid deprecation warnings.
+         if hasattr(retriever, "invoke"):
+             docs = retriever.invoke(user_content)
+         else:
+             docs = retriever.get_relevant_documents(user_content)
+         prompt = _build_prompt(user_content, docs)
+         # Use the router if a token is provided; otherwise fall back to local generation.
+         try:
+             if token:
+                 text = _hf_generate(prompt, model_id=model_id, token=token, temperature=temp)
+             else:
+                 text = _local_generate(prompt, model_id=local_model_id, temperature=temp)
+         except Exception as api_err:
+             if token:
+                 # Degrade gracefully to local generation when the router is flaky or the model is blocked.
+                 fallback_note = (
+                     f"[Fallback to local model '{local_model_id}' because HF router failed: {api_err}]"
+                 )
+                 print(fallback_note)
+                 text = _local_generate(prompt, model_id=local_model_id, temperature=temp)
+                 text = f"{text}\n\n{fallback_note}"
+             else:
+                 raise
+         return {"messages": [AIMessage(content=text)]}
+
+     # Return an object with an invoke method to mirror the previous agent_executor shape.
+     return SimpleNamespace(invoke=invoke)
+
+
+ def build_langgraph_agent(
+     vectorstore,
+     hf_model_id: Optional[str] = None,
+     hf_api_token: Optional[str] = None,
+     temperature: Optional[float] = None,
+ ):
+     """
+     LangGraph-based RAG agent with retrieval + generation nodes.
+     """
+     retriever = vectorstore.as_retriever()
+     model_id = (hf_model_id or HF_MODEL_ID).strip()
+     local_model_id = (LOCAL_MODEL_ID or model_id).strip()
+     token = (hf_api_token or HF_API_TOKEN or "").strip() or None
+     temp = TEMPERATURE if temperature is None else temperature
+
+     def retrieve_node(state: ChatState):
+         messages = state.get("messages", [])
+         user_msg = next((m for m in reversed(messages) if isinstance(m, HumanMessage)), None)
+         query = user_msg.content if user_msg else ""
+
+         if hasattr(retriever, "invoke"):
+             docs = retriever.invoke(query)
+         else:
+             docs = retriever.get_relevant_documents(query)
+         context = "\n\n".join(d.page_content for d in docs[:4])
+         return {"context": context}
+
+     def generate_node(state: ChatState):
+         messages = state.get("messages", [])
+         context = state.get("context", "")
+         user_msg = next((m for m in reversed(messages) if isinstance(m, HumanMessage)), None)
+         question = user_msg.content if user_msg else ""
+
+         prompt = (
+             "You are a helpful assistant. Use the provided context to answer the question. "
+             "If the context is insufficient, say you do not know.\n\n"
+             f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
+         )
+
+         try:
+             if token:
+                 text = _hf_generate(prompt, model_id=model_id, token=token, temperature=temp)
+             else:
+                 text = _local_generate(prompt, model_id=local_model_id, temperature=temp)
+         except Exception as api_err:
+             if token:
+                 fallback_note = (
+                     f"[Fallback to local model '{local_model_id}' because HF router failed: {api_err}]"
+                 )
+                 print(fallback_note)
+                 text = _local_generate(prompt, model_id=local_model_id, temperature=temp)
+                 text = f"{text}\n\n{fallback_note}"
+             else:
+                 raise
+         return {"messages": messages + [AIMessage(content=text)]}
+
+     graph = StateGraph(ChatState)
+     graph.add_node("retrieve", retrieve_node)
+     graph.add_node("generate", generate_node)
+     graph.set_entry_point("retrieve")
+     graph.add_edge("retrieve", "generate")
+     graph.add_edge("generate", END)
+
+     app = graph.compile()
+
+     # Wrap to mirror the previous agent_executor interface for Gradio.
+     def invoke(payload):
+         incoming_messages = payload.get("messages", [])
+         initial_state: ChatState = {"messages": incoming_messages, "context": ""}
+         return app.invoke(initial_state)
+
+     return SimpleNamespace(invoke=invoke)
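`_local_generate` above guards against overflowing small context windows by keeping only the tail of the prompt. The arithmetic can be sketched with a stand-in token list (the real code operates on tokenizer ids; the function name here is illustrative):

```python
def truncate_prompt_tail(token_ids, max_positions, max_new_tokens):
    """Keep the most recent tokens so prompt + generation fits the model window."""
    allowed = max_positions - max_new_tokens - 1
    if allowed <= 0 or len(token_ids) <= allowed:
        return token_ids  # nothing to trim (or window too small to help)
    return token_ids[-allowed:]  # keep the tail: most recent question + context
```

For a gpt2-sized window (1024 positions) with 128 new tokens reserved, at most 895 prompt tokens survive, which is exactly the bound the agent computes.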
src/config.py ADDED
@@ -0,0 +1,27 @@
+ import os
+ from pathlib import Path
+ from dotenv import load_dotenv
+
+ load_dotenv()
+
+ # Base paths
+ BASE_DIR = Path(__file__).resolve().parent.parent
+ DATA_DIR = BASE_DIR / "data"
+ SRC_DIR = BASE_DIR / "src"
+
+ # Data paths
+ PDF_PATH = DATA_DIR / "source.pdf"  # The uploaded input PDF is copied to this path
+ VECTORSTORE_PATH = DATA_DIR / "faiss_index"
+
+ # RAG parameters
+ CHUNK_SIZE = 1000
+ CHUNK_OVERLAP = 200
+ EMBEDDING_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
+
+ # LLM parameters (Hugging Face free Inference API)
+ # The default router model should exist on the router. Override via the HF_MODEL_ID env var or UI input.
+ # Meta Llama 3 8B Instruct is widely available on the HF router as of Nov 2024.
+ HF_MODEL_ID = os.getenv("HF_MODEL_ID", "meta-llama/Meta-Llama-3-8B-Instruct")
+ HF_API_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN", "")  # Optional for many free endpoints
+ LOCAL_MODEL_ID = os.getenv("LOCAL_MODEL_ID", "distilgpt2")
+ TEMPERATURE = float(os.getenv("HF_TEMPERATURE", "0.3"))
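The env-driven defaults above follow a simple resolve-with-fallback pattern. One defensive variant worth noting: the `TEMPERATURE` line as written raises `ValueError` on a malformed `HF_TEMPERATURE`, so a sketch with a safe float parser (the helper name is hypothetical, not part of the module):

```python
import os

def env_float(name, default):
    """Read a float from the environment, falling back on missing or bad values."""
    raw = os.getenv(name)
    try:
        return float(raw) if raw is not None else default
    except ValueError:
        return default  # e.g. HF_TEMPERATURE="warm" silently uses the default
```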
src/ingestion.py ADDED
@@ -0,0 +1,33 @@
+ from langchain_community.document_loaders import PyPDFLoader
+ from langchain_text_splitters import RecursiveCharacterTextSplitter
+ from .config import CHUNK_SIZE, CHUNK_OVERLAP
+
+ def load_pdf(file_path):
+     """
+     Loads a PDF file and returns a list of documents.
+     """
+     loader = PyPDFLoader(file_path)
+     documents = loader.load()
+     return documents
+
+ def chunk_documents(documents):
+     """
+     Splits documents into smaller chunks.
+     """
+     text_splitter = RecursiveCharacterTextSplitter(
+         chunk_size=CHUNK_SIZE,
+         chunk_overlap=CHUNK_OVERLAP,
+         length_function=len,
+         is_separator_regex=False,
+     )
+     chunks = text_splitter.split_documents(documents)
+     return chunks
+
+ def ingest_file(file_path):
+     """
+     Orchestrates loading and chunking.
+     """
+     docs = load_pdf(file_path)
+     chunks = chunk_documents(docs)
+     print(f"Loaded {len(docs)} pages and created {len(chunks)} chunks.")
+     return chunks
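`RecursiveCharacterTextSplitter` does smarter, separator-aware splitting, but the effect of `CHUNK_SIZE` and `CHUNK_OVERLAP` can be illustrated with a naive fixed-window sketch (not the actual splitter; defaults mirror the 1000/200 config values):

```python
def naive_chunk(text, size=1000, overlap=200):
    """Fixed-size windows where each chunk shares `overlap` chars with the previous one."""
    step = size - overlap  # advance by size minus overlap each window
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap means a sentence cut at a chunk boundary still appears whole in the neighboring chunk, which is why retrieval rarely loses boundary-spanning facts.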
src/rag_tool.py ADDED
@@ -0,0 +1,20 @@
+ from langchain_core.tools import tool
+
+ def get_retriever_tool(vectorstore):
+     """
+     Creates a LangChain tool from the vector store retriever.
+     """
+     retriever = vectorstore.as_retriever()
+
+     @tool
+     def retrieve_rag_docs(query: str) -> str:
+         """Search and retrieve information about the RAG Chatbot and LangGraph Agent project from the knowledge base."""
+         # Use invoke if available, else get_relevant_documents.
+         if hasattr(retriever, "invoke"):
+             docs = retriever.invoke(query)
+         else:
+             docs = retriever.get_relevant_documents(query)
+
+         return "\n\n".join([d.page_content for d in docs])
+
+     return retrieve_rag_docs
src/vectorstore.py ADDED
@@ -0,0 +1,33 @@
+ import os
+ from langchain_community.vectorstores import FAISS
+ try:
+     # Preferred newer package
+     from langchain_huggingface import HuggingFaceEmbeddings
+ except ImportError:
+     # Fall back to the older location if the extra package is missing
+     from langchain_community.embeddings import HuggingFaceEmbeddings
+ from .config import EMBEDDING_MODEL_NAME, VECTORSTORE_PATH
+
+ def get_embeddings():
+     """
+     Initializes the embedding model.
+     """
+     return HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL_NAME)
+
+ def create_vectorstore(chunks):
+     """
+     Creates a FAISS vector store from chunks and saves it locally.
+     """
+     embeddings = get_embeddings()
+     vectorstore = FAISS.from_documents(chunks, embeddings)
+     vectorstore.save_local(str(VECTORSTORE_PATH))
+     return vectorstore
+
+ def load_vectorstore():
+     """
+     Loads the FAISS vector store from disk.
+     """
+     embeddings = get_embeddings()
+     if os.path.exists(VECTORSTORE_PATH):
+         return FAISS.load_local(str(VECTORSTORE_PATH), embeddings, allow_dangerous_deserialization=True)
+     return None
tests/test_pipeline.py ADDED
@@ -0,0 +1,48 @@
+ import sys
+ import os
+ from pathlib import Path
+
+ # Add the project root to sys.path
+ sys.path.append(str(Path(__file__).resolve().parent.parent))
+
+ from src.ingestion import load_pdf, chunk_documents
+ from src.vectorstore import create_vectorstore, load_vectorstore
+ from src.config import PDF_PATH
+
+ def test_ingestion():
+     print("Testing Ingestion...")
+     if not os.path.exists(PDF_PATH):
+         print(f"Skipping ingestion test: {PDF_PATH} not found.")
+         return
+
+     docs = load_pdf(str(PDF_PATH))
+     assert len(docs) > 0, "No documents loaded"
+     print(f"Loaded {len(docs)} pages.")
+
+     chunks = chunk_documents(docs)
+     assert len(chunks) > 0, "No chunks created"
+     print(f"Created {len(chunks)} chunks.")
+     return chunks
+
+ def test_vectorstore(chunks):
+     print("Testing Vector Store...")
+     if not chunks:
+         print("Skipping vector store test: No chunks.")
+         return
+
+     vs = create_vectorstore(chunks)
+     assert vs is not None, "Vector store creation failed"
+     print("Vector store created and saved.")
+
+     loaded_vs = load_vectorstore()
+     assert loaded_vs is not None, "Vector store loading failed"
+     print("Vector store loaded successfully.")
+
+ if __name__ == "__main__":
+     try:
+         chunks = test_ingestion()
+         test_vectorstore(chunks)
+         print("All tests passed!")
+     except Exception as e:
+         print(f"Test failed: {e}")
+         sys.exit(1)