Anurag Shirke committed on
Commit · 7ed2970
Parent(s): eefb354
Refactoring Qdrant and Backend Startup
Files changed:
- Dockerfile +3 -0
- SUMMARY.md +60 -0
- docker-compose.yml +2 -0
- scripts/ollama_entrypoint.sh +4 -0
- scripts/wait-for-qdrant.sh +17 -0
- src/__pycache__/main.cpython-311.pyc +0 -0
- src/core/__pycache__/llm.cpython-311.pyc +0 -0
- src/core/__pycache__/models.cpython-311.pyc +0 -0
- src/core/__pycache__/processing.cpython-311.pyc +0 -0
- src/core/__pycache__/vector_store.cpython-311.pyc +0 -0
Dockerfile
CHANGED
@@ -2,6 +2,9 @@
 # Use an official Python runtime as a parent image
 FROM python:3.11-slim
 
+# Install curl for the wait script
+RUN apt-get update && apt-get install -y curl
+
 # Set the working directory in the container
 WORKDIR /app
 
SUMMARY.md
ADDED
@@ -0,0 +1,60 @@
+# Project Summary: Phases 1 & 2
+
+This document summarizes the work completed in the first two phases of the RAG Knowledge Assistant project.
+
+---
+
+## Phase 1: Research & Setup
+
+Phase 1 focused on establishing a fully containerized and automated local development environment.
+
+### Key Achievements:
+
+1. **Project Structure:**
+   - `src/`: Contains all the Python source code for the backend API.
+   - `uploads/`: A directory for temporarily storing uploaded files during processing.
+   - `scripts/`: Holds utility scripts, such as the automated model puller for Ollama.
+
+2. **Dependency Management:**
+   - A `requirements.txt` file was created to manage all Python dependencies, including FastAPI, LangChain, Qdrant, and Sentence-Transformers.
+
+3. **Containerization with Docker:**
+   - A `Dockerfile` was written to create a container image for our FastAPI application.
+   - A `docker-compose.yml` file orchestrates all the necessary services:
+     - `backend`: Our FastAPI application.
+     - `qdrant`: The vector database for storing document embeddings.
+     - `ollama`: The service for running the open-source LLM.
+
+4. **Automated Model Pulling:**
+   - An entrypoint script (`scripts/ollama_entrypoint.sh`) was created to automatically pull the `llama3` model when the Ollama container starts. This ensures the LLM is ready without manual intervention.
+
+---
+
+## Phase 2: Backend API MVP
+
+Phase 2 focused on building the core functionality of the knowledge assistant, resulting in a functional RAG pipeline accessible via a REST API.
+
+### Key Achievements:
+
+1. **Modular Codebase:**
+   - The `src/core/` directory was created to organize the application's business logic into separate, manageable modules:
+     - `processing.py`: Handles PDF parsing, text chunking, and embedding model loading.
+     - `vector_store.py`: Manages all interactions with the Qdrant database (creation, upserting, searching).
+     - `llm.py`: Handles all interactions with the Ollama LLM service (prompt formatting, response generation).
+     - `models.py`: Defines the Pydantic models for API request and response data structures.
+
+2. **API Endpoints Implemented:**
+   - **`GET /health`**: A simple endpoint to confirm that the API is running.
+   - **`POST /upload`**: Implements the full document ingestion pipeline:
+     1. Receives and validates a PDF file.
+     2. Extracts text using `PyMuPDF`.
+     3. Splits the text into smaller, overlapping chunks using `LangChain`.
+     4. Generates vector embeddings for each chunk using `sentence-transformers`.
+     5. Upserts the chunks and their embeddings into the Qdrant database.
+   - **`POST /query`**: Implements the complete RAG pipeline to answer questions:
+     1. Receives a JSON object with a `query` string.
+     2. Generates an embedding for the query.
+     3. Searches Qdrant to retrieve the most relevant document chunks (Retrieval).
+     4. Constructs a detailed prompt containing the user's query and the retrieved context.
+     5. Sends the prompt to the `llama3` model via Ollama to get an answer (Augmented Generation).
+     6. Returns the generated answer along with the source documents used for context.
docker-compose.yml
CHANGED
@@ -8,12 +8,14 @@ services:
       - "8000:8000"
     volumes:
       - ./src:/app/src
+      - ./scripts:/app/scripts
     depends_on:
       - qdrant
       - ollama
     environment:
       - QDRANT_HOST=qdrant
       - OLLAMA_HOST=ollama
+    entrypoint: ["/app/scripts/wait-for-qdrant.sh", "qdrant:6333", "--", "uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
 
   qdrant:
     image: qdrant/qdrant:latest
scripts/ollama_entrypoint.sh
CHANGED
@@ -1,5 +1,9 @@
 #!/bin/bash
 
+# The base ollama image doesn't have curl, so we install it.
+# We'll attempt to update and install curl. This requires root privileges.
+apt-get update && apt-get install -y curl
+
 # Start Ollama in the background
 /bin/ollama serve &
 
scripts/wait-for-qdrant.sh
ADDED
@@ -0,0 +1,17 @@
+#!/bin/sh
+# wait-for-qdrant.sh
+
+set -e
+
+host="$1"
+shift
+cmd="$@"
+
+# Loop until the Qdrant health check endpoint is reachable
+until curl -s -f "$host/healthz" > /dev/null; do
+  >&2 echo "Qdrant is unavailable - sleeping"
+  sleep 1
+done
+
+>&2 echo "Qdrant is up - executing command"
+exec $cmd
src/__pycache__/main.cpython-311.pyc
CHANGED
Binary files a/src/__pycache__/main.cpython-311.pyc and b/src/__pycache__/main.cpython-311.pyc differ

src/core/__pycache__/llm.cpython-311.pyc
ADDED
Binary file (1.96 kB)

src/core/__pycache__/models.cpython-311.pyc
ADDED
Binary file (791 Bytes)

src/core/__pycache__/processing.cpython-311.pyc
ADDED
Binary file (1.49 kB)

src/core/__pycache__/vector_store.cpython-311.pyc
ADDED
Binary file (2.25 kB)