Anurag Shirke committed on
Commit 7ed2970 · 1 Parent(s): eefb354

Refactoring Qdrant and Backend Startup

Dockerfile CHANGED
@@ -2,6 +2,9 @@
 # Use an official Python runtime as a parent image
 FROM python:3.11-slim
 
+# Install curl for the wait script
+RUN apt-get update && apt-get install -y curl
+
 # Set the working directory in the container
 WORKDIR /app
 
SUMMARY.md ADDED
@@ -0,0 +1,60 @@
+# Project Summary: Phases 1 & 2
+
+This document summarizes the work completed in the first two phases of the RAG Knowledge Assistant project.
+
+---
+
+## Phase 1: Research & Setup
+
+Phase 1 focused on establishing a fully containerized and automated local development environment.
+
+### Key Achievements:
+
+1. **Project Structure:**
+   - `src/`: Contains all the Python source code for the backend API.
+   - `uploads/`: A directory for temporarily storing uploaded files during processing.
+   - `scripts/`: Holds utility scripts, such as the automated model puller for Ollama.
+
+2. **Dependency Management:**
+   - A `requirements.txt` file was created to manage all Python dependencies, including FastAPI, LangChain, Qdrant, and Sentence-Transformers.
+
+3. **Containerization with Docker:**
+   - A `Dockerfile` was written to create a container image for our FastAPI application.
+   - A `docker-compose.yml` file orchestrates all the necessary services:
+     - `backend`: Our FastAPI application.
+     - `qdrant`: The vector database for storing document embeddings.
+     - `ollama`: The service for running the open-source LLM.
+
+4. **Automated Model Pulling:**
+   - An entrypoint script (`scripts/ollama_entrypoint.sh`) was created to automatically pull the `llama3` model when the Ollama container starts. This ensures the LLM is ready without manual intervention.
+
+---
+
+## Phase 2: Backend API MVP
+
+Phase 2 focused on building the core functionality of the knowledge assistant, resulting in a functional RAG pipeline accessible via a REST API.
+
+### Key Achievements:
+
+1. **Modular Codebase:**
+   - The `src/core/` directory was created to organize the application's business logic into separate, manageable modules:
+     - `processing.py`: Handles PDF parsing, text chunking, and embedding model loading.
+     - `vector_store.py`: Manages all interactions with the Qdrant database (creation, upserting, searching).
+     - `llm.py`: Handles all interactions with the Ollama LLM service (prompt formatting, response generation).
+     - `models.py`: Defines the Pydantic models for API request and response data structures.
+
+2. **API Endpoints Implemented:**
+   - **`GET /health`**: A simple endpoint to confirm that the API is running.
+   - **`POST /upload`**: Implements the full document ingestion pipeline:
+     1. Receives and validates a PDF file.
+     2. Extracts text using `PyMuPDF`.
+     3. Splits the text into smaller, overlapping chunks using `LangChain`.
+     4. Generates vector embeddings for each chunk using `sentence-transformers`.
+     5. Upserts the chunks and their embeddings into the Qdrant database.
+   - **`POST /query`**: Implements the complete RAG pipeline to answer questions:
+     1. Receives a JSON object with a `query` string.
+     2. Generates an embedding for the query.
+     3. Searches Qdrant to retrieve the most relevant document chunks (Retrieval).
+     4. Constructs a detailed prompt containing the user's query and the retrieved context.
+     5. Sends the prompt to the `llama3` model via Ollama to get an answer (Augmented Generation).
+     6. Returns the generated answer along with the source documents used for context.
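The `/upload` pipeline summarized above maps onto a handful of library calls. Below is a minimal sketch of that ingestion flow; the collection name `documents` and the `all-MiniLM-L6-v2` embedding model (384-dimensional vectors) are illustrative assumptions, since the actual choices live in `src/core/processing.py` and `src/core/vector_store.py`, which are not part of this diff.

```python
# Sketch of the /upload ingestion flow: extract, chunk, embed, upsert.
# Collection name and model choice here are assumptions for illustration.
import fitz  # PyMuPDF
from langchain.text_splitter import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer


def ingest_pdf(path: str) -> None:
    # 2. Extract text from every page of the PDF
    doc = fitz.open(path)
    text = "".join(page.get_text() for page in doc)

    # 3. Split into smaller, overlapping chunks
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_text(text)

    # 4. Generate a vector embedding for each chunk
    model = SentenceTransformer("all-MiniLM-L6-v2")
    vectors = model.encode(chunks)

    # 5. Upsert chunks and embeddings into Qdrant
    client = QdrantClient(host="qdrant", port=6333)
    client.recreate_collection(
        collection_name="documents",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )
    client.upsert(
        collection_name="documents",
        points=[
            PointStruct(id=i, vector=vec.tolist(), payload={"text": chunk})
            for i, (chunk, vec) in enumerate(zip(chunks, vectors))
        ],
    )
```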
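The `/query` RAG flow can be sketched under the same assumptions. Ollama's standard `/api/generate` endpoint is used here; the prompt wording and `limit=3` are placeholders, as the real prompt construction lives in `llm.py`.

```python
# Sketch of the /query RAG flow: embed, retrieve, prompt, generate.
import requests
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

client = QdrantClient(host="qdrant", port=6333)
model = SentenceTransformer("all-MiniLM-L6-v2")


def answer(query: str) -> dict:
    # 1-2. Embed the incoming query string
    query_vector = model.encode(query).tolist()

    # 3. Retrieval: fetch the most relevant chunks from Qdrant
    hits = client.search(collection_name="documents", query_vector=query_vector, limit=3)
    context = "\n\n".join(hit.payload["text"] for hit in hits)

    # 4. Construct a prompt containing the query and the retrieved context
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

    # 5. Augmented generation: send the prompt to llama3 via Ollama
    resp = requests.post(
        "http://ollama:11434/api/generate",
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=120,
    )

    # 6. Return the answer along with the source chunks used as context
    return {
        "answer": resp.json()["response"],
        "sources": [hit.payload["text"] for hit in hits],
    }
```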
docker-compose.yml CHANGED
@@ -8,12 +8,14 @@ services:
       - "8000:8000"
     volumes:
       - ./src:/app/src
+      - ./scripts:/app/scripts
     depends_on:
       - qdrant
       - ollama
     environment:
       - QDRANT_HOST=qdrant
       - OLLAMA_HOST=ollama
+    entrypoint: ["/app/scripts/wait-for-qdrant.sh", "qdrant:6333", "--", "uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
 
   qdrant:
     image: qdrant/qdrant:latest
scripts/ollama_entrypoint.sh CHANGED
@@ -1,5 +1,9 @@
 #!/bin/bash
 
+# The base ollama image doesn't have curl, so we install it.
+# This requires root privileges, which the container runs with by default.
+apt-get update && apt-get install -y curl
+
 # Start Ollama in the background
 /bin/ollama serve &
 
scripts/wait-for-qdrant.sh ADDED
@@ -0,0 +1,20 @@
+#!/bin/sh
+# wait-for-qdrant.sh
+
+set -e
+
+host="$1"
+shift
+# Drop the optional "--" separator between the host and the command
+if [ "$1" = "--" ]; then
+  shift
+fi
+
+# Loop until the Qdrant health check endpoint is reachable
+until curl -s -f "$host/healthz" > /dev/null; do
+  >&2 echo "Qdrant is unavailable - sleeping"
+  sleep 1
+done
+
+>&2 echo "Qdrant is up - executing command"
+exec "$@"
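The shell wrapper above is one common readiness pattern. Purely as an illustration, the same gate could be expressed in Python inside the backend itself, polling the same `/healthz` endpoint before startup continues; the function name, URL default, and timeout here are hypothetical, not part of this commit.

```python
# Hypothetical Python equivalent of wait-for-qdrant.sh: poll Qdrant's
# /healthz endpoint until it responds, then let startup proceed.
import sys
import time

import requests


def wait_for_qdrant(url: str = "http://qdrant:6333/healthz", timeout_s: int = 60) -> None:
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if requests.get(url, timeout=2).ok:
                print("Qdrant is up", file=sys.stderr)
                return
        except requests.ConnectionError:
            pass  # Qdrant not accepting connections yet
        print("Qdrant is unavailable - sleeping", file=sys.stderr)
        time.sleep(1)
    raise TimeoutError(f"Qdrant not reachable at {url}")
```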
src/__pycache__/main.cpython-311.pyc CHANGED
Binary files a/src/__pycache__/main.cpython-311.pyc and b/src/__pycache__/main.cpython-311.pyc differ
 
src/core/__pycache__/llm.cpython-311.pyc ADDED
Binary file (1.96 kB)
 
src/core/__pycache__/models.cpython-311.pyc ADDED
Binary file (791 Bytes)
 
src/core/__pycache__/processing.cpython-311.pyc ADDED
Binary file (1.49 kB)
 
src/core/__pycache__/vector_store.cpython-311.pyc ADDED
Binary file (2.25 kB)