SanskarModi committed
Commit cc246c6 · 1 parent: 0e9b9ae

added workflow
.dockerignore ADDED
@@ -0,0 +1,28 @@
+ __pycache__/
+ *.pyc
+ *.pyo
+ *.pyd
+ .Python
+ *.so
+ *.egg
+ *.egg-info/
+ dist/
+ build/
+ .env
+ .venv
+ env/
+ venv/
+ .git/
+ .gitignore
+ .vscode/
+ .idea/
+ *.log
+ .DS_Store
+ node_modules/
+ frontend/
+ *.md
+ !README.md
+ .pre-commit-config.yaml
+ .editorconfig
+ .ruff.toml
+ backend/storage/
.github/workflows/deploy.yaml ADDED
@@ -0,0 +1,27 @@
+ name: Deploy to HuggingFace Space
+
+ on:
+   push:
+     branches:
+       - main
+
+ jobs:
+   deploy:
+     runs-on: ubuntu-latest
+
+     steps:
+       - name: Checkout Full History
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 0
+
+       - name: Set up Git LFS
+         run: |
+           git lfs install
+
+       - name: Push to HuggingFace Space
+         run: |
+           git config --global user.name "github-actions"
+           git config --global user.email "actions@github.com"
+           git remote add space https://SanskarModi:${{ secrets.HF_TOKEN }}@huggingface.co/spaces/SanskarModi/atlasrag-backend
+           git push space main --force
Dockerfile ADDED
@@ -0,0 +1,39 @@
+ # Use Python 3.10 slim image
+ FROM python:3.10-slim
+
+ # Set working directory
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     build-essential \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements first (for better caching)
+ COPY requirements.txt .
+
+ # Install Python dependencies
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Download spaCy model
+ RUN python -m spacy download en_core_web_sm
+
+ # Copy the entire backend directory
+ COPY backend/ ./backend/
+
+ # Create necessary directories for storage
+ RUN mkdir -p /data/qdrant /data/docs /data/uploads
+
+ # Set environment variables
+ ENV PYTHONPATH=/app
+ ENV QDRANT_PATH=/data/qdrant
+
+ # Expose port 7860 (required by Hugging Face)
+ EXPOSE 7860
+
+ # Health check
+ HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
+     CMD python -c "import requests; requests.get('http://localhost:7860/docs')"
+
+ # Run the application
+ CMD ["uvicorn", "backend.app.main:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,127 +1,10 @@
- # AtlasRAG – Multi-Document Research & Reasoning Engine
-
- *A production-style RAG system powered by Hybrid Retrieval, Graph-RAG, Cross-Encoder Reranking, and structured citations.*
-
---
-
- ## 🚀 Overview
-
- AtlasRAG is an advanced Retrieval-Augmented Generation (RAG) engine designed to answer questions across multiple PDFs with high accuracy and page-level citations. It integrates modern retrieval techniques used by AI search products (Perplexity, Vectara, LlamaIndex) and exposes a clean API + minimal frontend interface.
-
- This README will expand into full documentation once the project is completed.
-
- ---
-
- ## ✨ Planned Feature Set
-
- ### 🔍 Retrieval Engine
-
- * Hierarchical section-aware chunking
- * Hybrid retrieval (BM25 + dense vectors)
- * Cross-encoder reranking
- * Query rewriting for conversational context
- * Graph-RAG reasoning (entity graph traversal)
- * Multi-document support
- * Structured citations (doc, pages, snippet)
-
- ### 📊 Evaluation
-
- * Synthetic QA generation
- * RAG metrics (context precision, answer relevance, faithfulness)
- * Benchmark variants:
-
-   * vector only
-   * hybrid
-   * graph-rag
-   * reranker enabled
-
- ### ⚙️ Architecture
-
- * FastAPI backend
- * Next.js frontend
- * Qdrant vector database or Chroma
- * NetworkX knowledge graph
- * LLM backend abstraction
- * Full modular structure for research + production use
-
- ---
-
- ## 📂 Project Structure (Initial)
-
- ```
- backend/
-   app/
-     main.py       # FastAPI entrypoint
-     config.py     # Settings / env
-     core/         # LLM abstraction, prompts
-     models/       # Pydantic schemas
-     ingestion/    # PDF → text → chunks → entities → index
-     retrieval/    # Vector / BM25 / hybrid / graph-rag / reranker
-     evaluation/   # Ragas / DeepEval evaluation pipeline
-     api/          # HTTP routes
-     utils/        # Helpers, logging
- frontend/
-   pages/          # Upload dashboard, chat UI
-   components/
-   public/
- docs/
-   ARCHITECTURE.md # Detailed system design
-   EVALUATION.md   # Benchmark results
-   diagrams/
- requirements.txt
- LICENSE
- README.md
- ```
-
- ---
-
- ## 🧠 High-Level Architecture (Text Diagram)
-
- ```
- User → Next.js UI → FastAPI backend → Retrieval Engine
-
- Qdrant / Chroma Vector DB
-
- BM25 Keyword Index
-
- Knowledge Graph (NetworkX)
-
- LLM
- ```
-
- ---
-
- ## 🏁 Getting Started (Development)
-
- ### 1. Create Python environment
-
- ```
- python3 -m venv .venv
- source .venv/bin/activate
- pip install -r requirements.txt
- ```
-
- ### 2. Install frontend deps (later)
-
- ```
- cd frontend
- npm install
- ```
-
- ### 3. Run backend
-
- ```
- uvicorn app.main:app --reload
- ```
-
- ### 4. Run frontend
-
- ```
- npm run dev
- ```
-
- ---
-
- ## 📘 License
-
- [MIT License](LICENSE)
+ title: AtlasRAG Backend
+ emoji: 📚
+ colorFrom: blue
+ colorTo: indigo
+ sdk: docker
+ app_port: 7860
+ pinned: false
+ license: mit
+ ---
backend/app/api/routes_chat.py CHANGED
@@ -1,15 +1,16 @@
"""Chat routes for QA and summarization."""

- from app.core.llm import llm_chat
- from app.core.prompts import build_rag_prompt, build_summary_prompt
- from app.memory.conversation import conversation_memory
- from app.memory.query_rewriter import rewrite_query
- from app.models.api import ChatRequest, ChatResponse
- from app.retrieval.chunk_registry import get_chunks
- from app.retrieval.citation_filter import filter_citations
- from app.retrieval.retrieve import hybrid_graph_search
from fastapi import APIRouter

+ from backend.app.core.llm import llm_chat
+ from backend.app.core.prompts import build_rag_prompt, build_summary_prompt
+ from backend.app.memory.conversation import conversation_memory
+ from backend.app.memory.query_rewriter import rewrite_query
+ from backend.app.models.api import ChatRequest, ChatResponse
+ from backend.app.retrieval.chunk_registry import get_chunks
+ from backend.app.retrieval.citation_filter import filter_citations
+ from backend.app.retrieval.retrieve import hybrid_graph_search
+
router = APIRouter()

@@ -28,6 +29,16 @@ def chat(request: ChatRequest) -> ChatResponse:
            citations=[],
        )

+     # Filter chunks by selected doc_ids if provided
+     if request.doc_ids:
+         chunks = [chunk for chunk in chunks if chunk.doc_id in request.doc_ids]
+
+     if not chunks:
+         return ChatResponse(
+             answer="No content found for the selected documents.",
+             citations=[],
+         )
+
    context = "\n\n".join(chunk.text for chunk in chunks)
    messages = build_summary_prompt(context)

@@ -56,6 +67,10 @@ def chat(request: ChatRequest) -> ChatResponse:
    # 3. Retrieve documents
    results = hybrid_graph_search(rewritten_query, request.top_k)

+     # Filter results by selected doc_ids if provided
+     if request.doc_ids:
+         results = [r for r in results if r.chunk.doc_id in request.doc_ids]
+
    if not results:
        return ChatResponse(
            answer="I don't know based on the provided documents.",
@@ -86,3 +101,10 @@ def chat(request: ChatRequest) -> ChatResponse:
        answer=answer,
        citations=citations,
    )
+
+
+ @router.post("/clear")
+ def clear_conversation(session_id: str = "default") -> dict:
+     """Clear conversation history for a session."""
+     conversation_memory.clear(session_id)
+     return {"status": "success", "message": "Conversation cleared"}
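The `doc_ids` filtering added to the chat route can be exercised in isolation. A minimal sketch, with a dataclass `Chunk` standing in for the project's ingestion model (the real route applies the same comprehension to registered chunks):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    doc_id: str
    text: str


def filter_by_doc_ids(chunks: List[Chunk], doc_ids: Optional[List[str]]) -> List[Chunk]:
    """Keep only chunks from the selected documents; no selection keeps everything."""
    if not doc_ids:
        return chunks
    return [chunk for chunk in chunks if chunk.doc_id in doc_ids]


chunks = [Chunk("a", "alpha"), Chunk("b", "beta"), Chunk("a", "gamma")]
print(len(filter_by_doc_ids(chunks, ["a"])))  # 2
print(len(filter_by_doc_ids(chunks, None)))   # 3
```

The route then falls through to its "No content found" response when the filtered list is empty, which is why the filter runs before the context is built.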
backend/app/api/routes_chat_langchain.py CHANGED
@@ -1,14 +1,15 @@
"""Chat routes using LangChain retriever."""

- from app.config import settings
- from app.models.api import ChatRequest, ChatResponse
- from app.models.retrieval import ScoredChunk
- from app.retrieval.citation_filter import filter_citations
- from app.retrieval.langchain_retriever import AtlasGraphRetriever
from fastapi import APIRouter
from langchain.chains import RetrievalQA
from langchain_groq import ChatGroq

+ from backend.app.config import settings
+ from backend.app.models.api import ChatRequest, ChatResponse
+ from backend.app.models.retrieval import ScoredChunk
+ from backend.app.retrieval.citation_filter import filter_citations
+ from backend.app.retrieval.langchain_retriever import AtlasGraphRetriever
+
router = APIRouter()
backend/app/api/routes_docs.py CHANGED
@@ -4,10 +4,12 @@ import uuid
from pathlib import Path
from typing import Dict, List

- from app.ingestion.pipeline import ingest_pdf
- from app.models.ingestion import Chunk
from fastapi import APIRouter, File, HTTPException, UploadFile

+ from backend.app.ingestion.pipeline import ingest_pdf
+ from backend.app.models.ingestion import Chunk
+ from backend.app.retrieval.chunk_registry import get_chunks
+
router = APIRouter()

DOC_STORAGE = Path("backend/storage/docs")
@@ -44,3 +46,62 @@ def upload_documents(
        results[doc_id] = chunks

    return results
+
+
+ @router.delete("/remove/{doc_id}")
+ def remove_document(doc_id: str) -> dict:
+     """Remove a document and its chunks from the system.
+
+     Args:
+         doc_id: Document ID to remove
+
+     Returns:
+         Status message
+     """
+     from backend.app.ingestion.indexing import COLLECTION_NAME, get_qdrant_client
+     from backend.app.retrieval.chunk_registry import _CHUNKS
+
+     # Remove chunks from registry
+     chunks_to_remove = [cid for cid, chunk in _CHUNKS.items() if chunk.doc_id == doc_id]
+     for chunk_id in chunks_to_remove:
+         _CHUNKS.pop(chunk_id, None)
+
+     # Remove from Qdrant
+     if chunks_to_remove:
+         try:
+             client = get_qdrant_client()
+             if client.collection_exists(COLLECTION_NAME):
+                 client.delete(
+                     collection_name=COLLECTION_NAME,
+                     points_selector=chunks_to_remove,
+                 )
+         except Exception as e:
+             print(f"Error removing from Qdrant: {e}")
+
+     # Remove PDF file
+     pdf_path = DOC_STORAGE / f"{doc_id}.pdf"
+     if pdf_path.exists():
+         pdf_path.unlink()
+
+     return {
+         "status": "success",
+         "message": f"Removed document {doc_id}",
+         "chunks_removed": len(chunks_to_remove),
+     }
+
+
+ @router.get("/list")
+ def list_documents() -> dict:
+     """List all currently loaded documents.
+
+     Returns:
+         Dictionary with document information
+     """
+     chunks = get_chunks()
+     doc_ids = list(set(chunk.doc_id for chunk in chunks))
+
+     return {
+         "total_documents": len(doc_ids),
+         "total_chunks": len(chunks),
+         "doc_ids": doc_ids,
+     }
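The `/list` endpoint reduces the chunk registry to a per-document summary. A self-contained sketch with a stand-in `Chunk` dataclass; unlike the route's `list(set(...))`, the ids here are sorted so the output is deterministic:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Chunk:
    doc_id: str
    text: str


def summarize_documents(chunks: List[Chunk]) -> Dict[str, object]:
    """Mirror the route's response shape: document count, chunk count, ids."""
    doc_ids = sorted({chunk.doc_id for chunk in chunks})  # dedupe, then sort
    return {
        "total_documents": len(doc_ids),
        "total_chunks": len(chunks),
        "doc_ids": doc_ids,
    }


summary = summarize_documents([Chunk("a", "x"), Chunk("a", "y"), Chunk("b", "z")])
print(summary["total_documents"])  # 2
print(summary["total_chunks"])     # 3
```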
backend/app/config.py CHANGED
@@ -8,7 +8,7 @@ class Settings(BaseSettings):
    groq_api_key: str = ""
    default_model: str = "openai/gpt-oss-120b"
-     qdrant_path: str = "./backend/storage/qdrant"
+     qdrant_path: str = "/data/qdrant"

    class Config:
        """Pydantic Settings configuration."""
backend/app/core/llm.py CHANGED
@@ -2,9 +2,10 @@
from typing import Dict, List

- from app.config import settings
from groq import Groq

+ from backend.app.config import settings
+

def _get_groq_client() -> Groq:
    """Return a Groq API client instance."""
backend/app/evaluation/ablation.py CHANGED
@@ -1,10 +1,10 @@
"""Ablation study for AtlasRAG retrieval."""

- from app.evaluation.metrics import coverage, diversity, recall_at_k
- from app.evaluation.test_queries import TEST_QUERIES
- from app.evaluation.utils import extract_pages
- from app.retrieval.retrieve import hybrid_graph_search
- from app.retrieval.vector_store import vector_search
+ from backend.app.evaluation.metrics import coverage, diversity, recall_at_k
+ from backend.app.evaluation.test_queries import TEST_QUERIES
+ from backend.app.evaluation.utils import extract_pages
+ from backend.app.retrieval.retrieve import hybrid_graph_search
+ from backend.app.retrieval.vector_store import vector_search


def run_ablation() -> None:
backend/app/evaluation/compare_baseline.py CHANGED
@@ -1,10 +1,10 @@
"""Compare Vector Search vs Hybrid Graph-RAG."""

- from app.evaluation.metrics import coverage, diversity, recall_at_k
- from app.evaluation.test_queries import TEST_QUERIES
- from app.evaluation.utils import extract_pages
- from app.retrieval.retrieve import hybrid_graph_search
- from app.retrieval.vector_store import vector_search
+ from backend.app.evaluation.metrics import coverage, diversity, recall_at_k
+ from backend.app.evaluation.test_queries import TEST_QUERIES
+ from backend.app.evaluation.utils import extract_pages
+ from backend.app.retrieval.retrieve import hybrid_graph_search
+ from backend.app.retrieval.vector_store import vector_search


def _print_block(
backend/app/evaluation/retrievers.py CHANGED
@@ -2,8 +2,8 @@
from typing import List

- from app.models.retrieval import ScoredChunk
- from app.retrieval.vector_store import vector_search
+ from backend.app.models.retrieval import ScoredChunk
+ from backend.app.retrieval.vector_store import vector_search


def vector_only_search(query: str, top_k: int) -> List[ScoredChunk]:
backend/app/evaluation/utils.py CHANGED
@@ -2,7 +2,7 @@
from typing import Iterable

- from app.models.retrieval import ScoredChunk
+ from backend.app.models.retrieval import ScoredChunk


def extract_pages(results: Iterable[ScoredChunk]) -> list[int]:
backend/app/ingestion/chunking.py CHANGED
@@ -3,7 +3,7 @@
import uuid
from typing import List

- from app.models.ingestion import Chunk, RawSegment
+ from backend.app.models.ingestion import Chunk, RawSegment

MAX_CHARS = 1500
OVERLAP_CHARS = 200
backend/app/ingestion/indexing.py CHANGED
@@ -2,12 +2,13 @@
from typing import List

- from app.config import settings
- from app.core.embeddings import embed_texts
- from app.models.ingestion import Chunk
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

+ from backend.app.config import settings
+ from backend.app.core.embeddings import embed_texts
+ from backend.app.models.ingestion import Chunk
+
COLLECTION_NAME = "atlasrag_chunks"
backend/app/ingestion/pdf_loader.py CHANGED
@@ -5,7 +5,8 @@ from pathlib import Path
from typing import List

import fitz  # PyMuPDF
- from app.models.ingestion import RawSegment
+
+ from backend.app.models.ingestion import RawSegment

HEADING_REGEX = re.compile(r"^\d+\.\s+[A-Z].+")
backend/app/ingestion/pipeline.py CHANGED
@@ -3,15 +3,15 @@
from pathlib import Path
from typing import List

- from app.ingestion.chunking import chunk_segments
- from app.ingestion.cleaning import clean_text
- from app.ingestion.entities import extract_entities
- from app.ingestion.indexing import index_chunks
- from app.ingestion.pdf_loader import extract_pages
- from app.models.ingestion import Chunk, RawSegment
- from app.retrieval.chunk_registry import register_chunks
- from app.retrieval.graph_utils import index_entities
- from app.retrieval.keyword_index import build_bm25_index
+ from backend.app.ingestion.chunking import chunk_segments
+ from backend.app.ingestion.cleaning import clean_text
+ from backend.app.ingestion.entities import extract_entities
+ from backend.app.ingestion.indexing import index_chunks
+ from backend.app.ingestion.pdf_loader import extract_pages
+ from backend.app.models.ingestion import Chunk, RawSegment
+ from backend.app.retrieval.chunk_registry import register_chunks
+ from backend.app.retrieval.graph_utils import index_entities
+ from backend.app.retrieval.keyword_index import build_bm25_index


def ingest_pdf(file_path: Path, doc_id: str) -> List[Chunk]:
backend/app/main.py CHANGED
@@ -1,11 +1,12 @@
"""Main FastAPI application for AtlasRAG backend."""

- from app.api.routes_chat import router as chat_router
- from app.api.routes_chat_langchain import router as chat_langchain_router
- from app.api.routes_docs import router as docs_router
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

+ from backend.app.api.routes_chat import router as chat_router
+ from backend.app.api.routes_chat_langchain import router as chat_langchain_router
+ from backend.app.api.routes_docs import router as docs_router
+
app = FastAPI(
    title="AtlasRAG Backend",
    version="0.0.0",
backend/app/memory/query_rewriter.py CHANGED
@@ -2,7 +2,7 @@
from typing import List, Tuple

- from app.core.llm import llm_chat
+ from backend.app.core.llm import llm_chat

Message = Tuple[str, str]
backend/app/models/api.py CHANGED
@@ -1,6 +1,6 @@
"""Pydantic models for API request and response bodies."""

- from typing import Literal
+ from typing import List, Literal, Optional

from pydantic import BaseModel

@@ -12,6 +12,7 @@ class ChatRequest(BaseModel):
    top_k: int = 5
    mode: Literal["qa", "summarize"] = "qa"
    session_id: str = "default"
+     doc_ids: Optional[List[str]]


class Citation(BaseModel):
backend/app/models/retrieval.py CHANGED
@@ -1,8 +1,9 @@
"""Pydantic models for API request and response bodies."""

- from app.models.ingestion import Chunk
from pydantic import BaseModel

+ from backend.app.models.ingestion import Chunk
+

class KeywordSearchRequest(BaseModel):
    """Schema for BM25 search request body."""
backend/app/retrieval/chunk_registry.py CHANGED
@@ -10,7 +10,7 @@ Note:
from typing import Dict, List

- from app.models.ingestion import Chunk
+ from backend.app.models.ingestion import Chunk

_CHUNKS: Dict[str, Chunk] = {}
backend/app/retrieval/citation_filter.py CHANGED
@@ -7,10 +7,11 @@ directly support the generated answer.
import re
from typing import List

- from app.models.api import Citation
- from app.models.retrieval import ScoredChunk
from sentence_transformers import SentenceTransformer, util

+ from backend.app.models.api import Citation
+ from backend.app.models.retrieval import ScoredChunk
+
# Lightweight sentence embedder
_SENTENCE_MODEL = SentenceTransformer("all-MiniLM-L6-v2")
backend/app/retrieval/graph_utils.py CHANGED
@@ -4,7 +4,8 @@ from collections import defaultdict
from typing import Dict, Iterable, List, Set

import networkx as nx
- from app.models.ingestion import Chunk
+
+ from backend.app.models.ingestion import Chunk

_ENTITY_TO_CHUNKS: Dict[str, Set[str]] = defaultdict(set)
backend/app/retrieval/keyword_index.py CHANGED
@@ -2,10 +2,11 @@
from typing import List

- from app.models.ingestion import Chunk
- from app.models.retrieval import ScoredChunk
from rank_bm25 import BM25Okapi

+ from backend.app.models.ingestion import Chunk
+ from backend.app.models.retrieval import ScoredChunk
+
_bm25: BM25Okapi | None = None
_chunks: List[Chunk] = []
backend/app/retrieval/langchain_retriever.py CHANGED
@@ -2,10 +2,11 @@
from typing import List

- from app.retrieval.retrieve import hybrid_graph_search
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever

+ from backend.app.retrieval.retrieve import hybrid_graph_search
+

class AtlasGraphRetriever(BaseRetriever):
    """LangChain-compatible retriever wrapping hybrid Graph-RAG."""
backend/app/retrieval/reranker.py CHANGED
@@ -2,9 +2,10 @@
from typing import List

- from app.models.retrieval import ScoredChunk
from sentence_transformers import CrossEncoder

+ from backend.app.models.retrieval import ScoredChunk
+

class CrossEncoderReranker:
    """Cross-encoder reranker for precise relevance scoring.
backend/app/retrieval/retrieve.py CHANGED
@@ -2,19 +2,19 @@
from typing import Dict, List, Set

- from app.ingestion.entities import NLP
- from app.models.retrieval import ScoredChunk
- from app.retrieval.chunk_registry import get_chunks
- from app.retrieval.graph_utils import (
+ from backend.app.ingestion.entities import NLP
+ from backend.app.models.retrieval import ScoredChunk
+ from backend.app.retrieval.chunk_registry import get_chunks
+ from backend.app.retrieval.graph_utils import (
    adaptive_hops,
    build_graph,
    chunks_from_entities,
    expand_entities,
    extract_query_entities,
)
- from app.retrieval.keyword_index import bm25_search
- from app.retrieval.reranker import CrossEncoderReranker
- from app.retrieval.vector_store import vector_search
+ from backend.app.retrieval.keyword_index import bm25_search
+ from backend.app.retrieval.reranker import CrossEncoderReranker
+ from backend.app.retrieval.vector_store import vector_search

# Keywords that indicate comparison-style queries
_COMPARISON_KEYWORDS = {
backend/app/retrieval/vector_store.py CHANGED
@@ -2,12 +2,13 @@
from typing import List

- from app.core.embeddings import embed_texts
- from app.ingestion.indexing import COLLECTION_NAME, get_qdrant_client
- from app.models.ingestion import Chunk
- from app.models.retrieval import ScoredChunk
from qdrant_client.models import ScoredPoint

+ from backend.app.core.embeddings import embed_texts
+ from backend.app.ingestion.indexing import COLLECTION_NAME, get_qdrant_client
+ from backend.app.models.ingestion import Chunk
+ from backend.app.models.retrieval import ScoredChunk
+

def vector_search(query: str, top_k: int = 5) -> List[ScoredChunk]:
    """Search for semantically similar chunks."""
requirements.txt CHANGED
@@ -50,6 +50,4 @@ pre-commit==4.5.0
black==24.4.2
dotenv==0.9.9
ruff==0.4.10
- isort==5.13.2
-
- -e .
+ isort==5.13.2