---
title: AtlasRAG Backend
emoji: 📚
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
license: mit
---

# AtlasRAG

**Hybrid Graph-Augmented Retrieval-Augmented Generation System**

AtlasRAG is a production-ready document summarization and question-answering system that combines vector search, graph-based reasoning, and LLM-based generation to enable grounded, citation-aware responses over uploaded documents.

The system goes beyond naive vector similarity by incorporating concept co-occurrence graphs, which improves contextual coverage for complex, multi-section queries.

![AtlasRAG Web Interface](https://drive.google.com/uc?id=1BIfz53BOlS5W9LmHc66sBGyZLO9tg83j)

**[Live Demo →](https://atlas-rag.vercel.app/)**

---

## ✨ Features

- 📄 **PDF Upload & Ingestion** – Seamless document processing
- 🧠 **Hybrid Retrieval Pipeline**
  - Dense vector similarity search
  - BM25 keyword search
  - Concept co-occurrence graph expansion
- 💬 **Unified Chat Interface** – Question answering and full-document summarization
- 📚 **Citation-Aware Responses** – Grounded answers with source attribution
- 🧩 **Conversation Memory** – Short-term context retention across turns
- ✍️ **Query Rewriting** – Context-aware reformulation using chat history
- ⚡ **Token Limit Protection** – Automatic document size validation to prevent API errors
- 🔍 **Evaluation Framework** – Built-in retrieval quality assessment
- 🧪 **Ablation Studies** – Baseline comparisons and performance validation

---

## 🏗️ System Architecture

```
PDF Document
    ↓
Chunking & Parsing
    ↓
Embeddings Generation → Vector Index
    ↓
Concept Extraction → Co-occurrence Graph
    ↓
Hybrid Graph-RAG Retrieval
    ↓
Context Assembly & Prompt Construction
    ↓
LLM Generation
    ↓
Answer + Citations
```

---

## 🔍 Retrieval Strategy

AtlasRAG employs a three-stage hybrid retrieval pipeline:

### 1. Vector Search

Dense embeddings using sentence transformers for semantic similarity.

### 2. Lexical Search

BM25 scoring for keyword-based anchoring and exact term matching.

### 3. Graph Expansion

- **Nodes:** Extracted concepts from document chunks
- **Edges:** Co-occurrence relationships within the corpus
- **Purpose:** Expand retrieval to conceptually related sections

The graph augments (rather than replaces) traditional vector retrieval, providing structural context for multi-hop queries.

---

## 📊 Evaluation

### Evaluation Corpus

All evaluations were conducted using:

**"Attention Is All You Need"** by Vaswani et al.

**Rationale:**

- Dense conceptual structure with cross-section dependencies
- Well-defined technical terminology
- Requires multi-hop reasoning for comprehensive answers
- Reflects real-world academic document QA scenarios

### Query Types

The evaluation suite includes manually designed queries mapped to expected document pages:

- **Localized queries** – Single-concept retrieval
  *Example: "What is scaled dot-product attention?"*
- **Distributed queries** – Multi-section synthesis
  *Example: "How does self-attention replace recurrence and convolution?"*
- **Comparative queries** – Cross-concept analysis
  *Example: "Compare encoder, decoder, and encoder-decoder architectures"*

### Metrics

- **Recall@5** – Percentage of queries with at least one relevant page retrieved
- **Coverage** – Number of unique relevant pages retrieved
- **Diversity** – Fraction of unique pages in the retrieved set

*Note: Precision was intentionally de-emphasized due to small K values and page-level evaluation granularity.*

---

## 📈 Results

### Baseline Comparison: Vector Search vs. Hybrid Graph-RAG

**Key Findings:**

- **Recall@5 = 1.00** across all evaluated queries for both methods
  - Both approaches reliably retrieve relevant information
- **Coverage & Diversity**
  - Comparable performance between vector-only and hybrid retrieval
  - Hybrid Graph-RAG occasionally surfaces conceptually adjacent sections
  - No degradation introduced by graph expansion

**Interpretation:**
The graph component does not harm retrieval quality and provides a structural foundation for improvements on larger, more fragmented corpora.

### Ablation Study

Isolated evaluation of graph reasoning impact:

- **Vector Only**
- **Vector + Graph Expansion**

**Results:**

- Recall, coverage, and diversity remained stable across configurations
- Graph augmentation introduces no noise or degradation
- Validates the architectural safety of the hybrid approach for production use

---

## 🧠 Conversation Memory & Query Rewriting

- **Short-term memory** maintains recent conversation turns
- **Context-aware rewriting** reformulates follow-up queries using chat history
- Enables natural conversational flow without polluting the retrieval pipeline

---

## 🛠️ Tech Stack

### Backend

- FastAPI
- LangChain (optional integration)
- Qdrant / Vector Store
- NetworkX (graph reasoning)
- Sentence Transformers
- Groq / OpenAI-compatible LLM APIs

### Frontend

- Next.js
- Modern chat-style UI
- PDF upload interface

### Development & Deployment

- Ruff (formatting & linting)
- Pre-commit hooks
- Docker
- Hugging Face Spaces (backend)
- Vercel (frontend)

---

## 🚀 Getting Started

### Prerequisites

- Python 3.9+
- Node.js 18+
- Git

### Clone Repository

```bash
git clone https://github.com/sanskarmodi8/Atlas-RAG
cd Atlas-RAG
```

### Backend Setup

```bash
cd backend
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
pip install -e .
uvicorn app.main:app --reload
```

Backend runs at: **http://127.0.0.1:8000**

### Frontend Setup

```bash
cd frontend
npm install
npm run dev
```

Frontend runs at: **http://localhost:3000**

---

## 🧹 Code Quality

This project enforces strict code quality standards.

### Install Pre-commit Hooks

```bash
pre-commit install
```

### Format & Lint

```bash
ruff check .
ruff format .
```

All code complies with:

- Ruff linting rules
- Black-style formatting
- Pre-commit validation

---

## 🌐 Deployment

### Production Instances

- **Frontend:** [https://atlas-rag.vercel.app/](https://atlas-rag.vercel.app/)
  *Deployed on Vercel*
- **Backend API:** [https://sanskarmodi-atlasrag-backend.hf.space/](https://sanskarmodi-atlasrag-backend.hf.space/)
  *Deployed on Hugging Face Spaces*

Binary document files are excluded from version control and handled at runtime.

---

## 📄 License

This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.

---

## 👤 Author

**Sanskar Modi**
GitHub: [@sanskarmodi8](https://github.com/sanskarmodi8)

---

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

---

## 📧 Contact

For questions or feedback, please open an issue on GitHub.
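## 🧮 Appendix: Evaluation Metrics Sketch

The three metrics listed under Evaluation (Recall@5, Coverage, Diversity) can be written down in a few lines. The sketch below is only an illustration of those definitions, not the project's actual evaluation code: the function names, the data layout (query → list of retrieved page numbers), and the sample page numbers are all hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(
    retrieved: Dict[str, List[int]],
    relevant: Dict[str, Set[int]],
    k: int = 5,
) -> float:
    """Fraction of queries whose top-k pages contain at least one relevant page."""
    if not retrieved:
        return 0.0
    hits = sum(
        1
        for query, pages in retrieved.items()
        if set(pages[:k]) & relevant.get(query, set())
    )
    return hits / len(retrieved)


def coverage(pages: List[int], relevant_pages: Set[int]) -> int:
    """Number of unique relevant pages that were retrieved for one query."""
    return len(set(pages) & relevant_pages)


def diversity(pages: List[int]) -> float:
    """Fraction of unique pages in the retrieved set for one query."""
    return len(set(pages)) / len(pages) if pages else 0.0


# Hypothetical data: page numbers are made up for illustration only.
retrieved = {"q1": [3, 4, 4, 2, 5], "q2": [1, 2, 2, 2, 9]}
relevant = {"q1": {3, 4}, "q2": {5, 6}}

print(recall_at_k(retrieved, relevant))          # q1 hits, q2 misses
print(coverage(retrieved["q1"], relevant["q1"]))
print(diversity(retrieved["q1"]))
```

Because Recall@5 only asks whether at least one relevant page appears, it saturates quickly on a small corpus; Coverage and Diversity are the metrics that can still separate two retrievers once recall reaches 1.00.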