Spaces:

rashid996958
/

AnyRAG-WebSearch

Sleeping

App Files Files Community

AnyRAG-WebSearch / README.md

Rashid Ali

configure setup

0ac4aed 7 months ago

preview code

raw

history blame contribute delete

6.05 kB

metadata

title: AnyRAG
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: Streamlit template space

🧠 AnyRAG — Intelligent Document & Web-Aware Chatbot

AnyRAG (short for Any Document Retrieval-Augmented Generation) is an advanced RAG-powered conversational system built with LangGraph, Streamlit, and OpenAI / HuggingFace models. It allows users to interact with uploaded documents and optionally fetch live web data for up-to-date, context-rich responses.

🚀 Features

✅ Dual Model Support — Choose between:

OpenAI GPT models (via API)

Open-Source HuggingFace models (like Llama, Mistral, etc.)

✅ Document-Aware Chat — Upload PDFs or text documents, and chat directly with their contents.

✅ Dynamic Web Search Integration — Toggle on/off real-time web search (DuckDuckGo) for the latest information.

✅ System Prompt Control — Customize how the AI behaves (e.g., “Act as a Tax Advisor” or “Explain as a Data Scientist”).

✅ Conversation Memory — Stores multiple chat threads with custom titles, allowing you to switch between them seamlessly.

✅ Vector Store Management — Automatically indexes your documents for efficient semantic retrieval using FAISS/Chroma.

✅ User-Friendly Streamlit Frontend — Clean, responsive UI with sidebars for all controls.

✅ Graph-based Orchestration — Uses LangGraph to manage chat states, document search, and tool invocation.

🧩 Architecture Overview

         ┌────────────────────────┐
         │      Streamlit UI      │
         │ (chatbot_frontend.py)  │
         └──────────┬─────────────┘
                    │
                    ▼
         ┌────────────────────────┐
         │  LangGraph Backend     │
         │ (chatbot_backend.py)   │
         │  • Chat State Engine   │
         │  • Document Retriever  │
         │  • Tool Invoker        │
         └──────────┬─────────────┘
                    │
          ┌─────────┴─────────┐
          ▼                   ▼
   ┌──────────────┐    ┌──────────────┐
   │ Vector Store │    │ Web Search   │
   │ (FAISS/Chroma│    │ (DuckDuckGo) │
   └──────────────┘    └──────────────┘

🏗️ Major Components

1. chatbot_frontend.py

The Streamlit-based interface that enables:

Chat interaction

Model selection (OpenAI or HuggingFace)

Document upload and retrieval

Web search toggle

System prompt customization

Viewing & switching saved chat threads

2. chatbot_backend.py

The main orchestration logic powered by LangGraph:

Maintains conversation state

Handles message routing between human, LLM, and tools

Integrates document retriever, DuckDuckGo search tool, and vector store

Uses Graph state transitions for structured, multi-step reasoning

3. search_tool.py

Implements live web search using DuckDuckGo, returning relevant snippets when web search is enabled.

4. document_tool.py

Handles:

PDF or text ingestion

Chunking and embedding creation

Building or loading a FAISS/Chroma vector index

Performing semantic similarity search

5. models.py

Provides model initialization functions for:

OpenAI GPT models via API key

HuggingFace Transformers (running locally or via pipeline)

6. vector_stores/

Stores prebuilt FAISS/Chroma indexes for quick retrieval of previously uploaded or sample documents.

⚙️ Setup Instructions 1️⃣ Clone the Repository

git clone https://github.com/yourusername/AnyRAG.git
cd AnyRAG

2️⃣ Create and Activate Environment

Using uv (recommended):

uv venv
source .venv/bin/activate

Or using venv/conda:

python -m venv .venv
source .venv/bin/activate

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Run the Streamlit App streamlit run chatbot_frontend.py

🧠 Example Use Cases

Domain Example System Prompt Benefit 🧾 Tax Advisor “You are an expert GST consultant in India.” Answer document + live tax queries.

⚖️ Legal Assistant “You are a contract lawyer explaining clauses.” Summarize long PDFs with context.

📚 Research Helper “You are a literature review assistant.” Summarize papers, add web references.

💬 General Chat (Leave blank) Use model’s default persona.

🧰 Folder Structure

AnyRAG/
├── chatbot_backend.py        # LangGraph backend orchestrator
├── chatbot_frontend.py       # Streamlit-based UI
├── search_tool.py            # DuckDuckGo search integration
├── document_tool.py          # Document ingestion and vector store
├── models.py                 # Model loaders (OpenAI / HF)
├── vector_stores/            # Prebuilt FAISS indexes
├── sample_docs/              # Example documents for testing
├── requirements.txt
└── README.md

🧩 Benefits of AnyRAG

✅ Hybrid Knowledge Access — Combines local document understanding with live internet results.

✅ Flexible AI Backbone — Works with both OpenAI and HuggingFace models.

✅ Explainable Retrieval Flow — LangGraph structure shows every reasoning step.

✅ Customizable Persona — Behavior control via System Prompt.

✅ Lightweight Deployment — Works locally or via Docker / Hugging Face Spaces.

✅ Scalable for Enterprises — Extend easily for multi-agent or multi-doc setups.

🧠 Future Enhancements

🧩 Add Multi-Document RAG

🧠 Integrate Memory Store for Persistent Context

📈 Add Analytics for Query Insights

🪄 Add Voice/Whisper Support for Audio Inputs

🌍 Expose REST API endpoint for external use