Spaces:
Sleeping
title: AnyRAG
emoji: π
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Streamlit template space
π§ AnyRAG β Intelligent Document & Web-Aware Chatbot
AnyRAG (short for Any Document Retrieval-Augmented Generation) is an advanced RAG-powered conversational system built with LangGraph, Streamlit, and OpenAI / HuggingFace models. It allows users to interact with uploaded documents and optionally fetch live web data for up-to-date, context-rich responses.
π Features
β Dual Model Support β Choose between:
OpenAI GPT models (via API)
Open-Source HuggingFace models (like Llama, Mistral, etc.)
β Document-Aware Chat β Upload PDFs or text documents, and chat directly with their contents.
β Dynamic Web Search Integration β Toggle on/off real-time web search (DuckDuckGo) for the latest information.
β System Prompt Control β Customize how the AI behaves (e.g., βAct as a Tax Advisorβ or βExplain as a Data Scientistβ).
β Conversation Memory β Stores multiple chat threads with custom titles, allowing you to switch between them seamlessly.
β Vector Store Management β Automatically indexes your documents for efficient semantic retrieval using FAISS/Chroma.
β User-Friendly Streamlit Frontend β Clean, responsive UI with sidebars for all controls.
β Graph-based Orchestration β Uses LangGraph to manage chat states, document search, and tool invocation.
π§© Architecture Overview
ββββββββββββββββββββββββββ
β Streamlit UI β
β (chatbot_frontend.py) β
ββββββββββββ¬ββββββββββββββ
β
βΌ
ββββββββββββββββββββββββββ
β LangGraph Backend β
β (chatbot_backend.py) β
β β’ Chat State Engine β
β β’ Document Retriever β
β β’ Tool Invoker β
ββββββββββββ¬ββββββββββββββ
β
βββββββββββ΄ββββββββββ
βΌ βΌ
ββββββββββββββββ ββββββββββββββββ
β Vector Store β β Web Search β
β (FAISS/Chromaβ β (DuckDuckGo) β
ββββββββββββββββ ββββββββββββββββ
ποΈ Major Components
1. chatbot_frontend.py
The Streamlit-based interface that enables:
Chat interaction
Model selection (OpenAI or HuggingFace)
Document upload and retrieval
Web search toggle
System prompt customization
Viewing & switching saved chat threads
2. chatbot_backend.py
The main orchestration logic powered by LangGraph:
Maintains conversation state
Handles message routing between human, LLM, and tools
Integrates document retriever, DuckDuckGo search tool, and vector store
Uses Graph state transitions for structured, multi-step reasoning
3. search_tool.py
Implements live web search using DuckDuckGo, returning relevant snippets when web search is enabled.
4. document_tool.py
Handles:
PDF or text ingestion
Chunking and embedding creation
Building or loading a FAISS/Chroma vector index
Performing semantic similarity search
5. models.py
Provides model initialization functions for:
OpenAI GPT models via API key
HuggingFace Transformers (running locally or via pipeline)
6. vector_stores/
Stores prebuilt FAISS/Chroma indexes for quick retrieval of previously uploaded or sample documents.
βοΈ Setup Instructions 1οΈβ£ Clone the Repository
git clone https://github.com/yourusername/AnyRAG.git
cd AnyRAG
2οΈβ£ Create and Activate Environment
Using uv (recommended):
uv venv
source .venv/bin/activate
Or using venv/conda:
python -m venv .venv
source .venv/bin/activate
3οΈβ£ Install Dependencies
pip install -r requirements.txt
4οΈβ£ Run the Streamlit App streamlit run chatbot_frontend.py
π§ Example Use Cases
Domain Example System Prompt Benefit π§Ύ Tax Advisor βYou are an expert GST consultant in India.β Answer document + live tax queries.
βοΈ Legal Assistant βYou are a contract lawyer explaining clauses.β Summarize long PDFs with context.
π Research Helper βYou are a literature review assistant.β Summarize papers, add web references.
π¬ General Chat (Leave blank) Use modelβs default persona.
π§° Folder Structure
AnyRAG/
βββ chatbot_backend.py # LangGraph backend orchestrator
βββ chatbot_frontend.py # Streamlit-based UI
βββ search_tool.py # DuckDuckGo search integration
βββ document_tool.py # Document ingestion and vector store
βββ models.py # Model loaders (OpenAI / HF)
βββ vector_stores/ # Prebuilt FAISS indexes
βββ sample_docs/ # Example documents for testing
βββ requirements.txt
βββ README.md
π§© Benefits of AnyRAG
β Hybrid Knowledge Access β Combines local document understanding with live internet results.
β Flexible AI Backbone β Works with both OpenAI and HuggingFace models.
β Explainable Retrieval Flow β LangGraph structure shows every reasoning step.
β Customizable Persona β Behavior control via System Prompt.
β Lightweight Deployment β Works locally or via Docker / Hugging Face Spaces.
β Scalable for Enterprises β Extend easily for multi-agent or multi-doc setups.
π§ Future Enhancements
π§© Add Multi-Document RAG
π§ Integrate Memory Store for Persistent Context
π Add Analytics for Query Insights
πͺ Add Voice/Whisper Support for Audio Inputs
π Expose REST API endpoint for external use