# RAG Portfolio Project

A state-of-the-art Retrieval-Augmented Generation (RAG) system leveraging modern generative AI and vector search technologies. This project demonstrates how to build a production-grade system that enables advanced question answering, document search, and contextual generation on your own infrastructure: private, scalable, and fast.
## Table of Contents

- Project Overview
- Features
- Tech Stack
- Getting Started
- Architecture
- API Endpoints
- Usage Examples
- Testing
- Project Structure
- Troubleshooting
- Contributing
- License
## Project Overview

This project showcases how to combine large language models (LLMs), local vector databases, and a modern Python web API for secure, high-performance knowledge and document retrieval. All LLM operations run locally; no data leaves your machine.

It is ideal for applications in internal research, enterprise QA, knowledge management, or compliance-sensitive AI tasks.
## Features

- **Local LLM Inference:** Runs entirely on your machine using Ollama and open-source models (e.g., Llama 3.1).
- **Vector Database Search:** Uses Qdrant for fast, scalable semantic retrieval.
- **Flexible Document Ingestion:** Upload PDF, DOCX, or TXT files for indexing and search.
- **FastAPI Back End:** High-concurrency, type-safe REST API with automatic docs.
- **Modern Python Package Management:** Built with uv for fast dependency resolution.
- **Modular, Extensible Codebase:** Clean architecture that is easy to extend and maintain.
- **Privacy and Security:** No cloud calls; ideal for regulated sectors.
- **Fully Containerizable:** Easily deployed with Docker.
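Flexible document ingestion hinges on splitting files into overlapping chunks before embedding. A minimal sketch of that step, where the function name, chunk size, and overlap are illustrative assumptions rather than the repo's actual configuration:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks ready for embedding.

    chunk_size and overlap are illustrative defaults, not the project's
    actual settings.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, sharing some context between chunks
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side.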
## Tech Stack

- **LLM:** Ollama (local inference engine), Llama 3.1
- **Vector DB:** Qdrant
- **Embeddings:** Sentence Transformers
- **API:** FastAPI + Uvicorn
- **Package Manager:** uv
- **Code Editor:** Cursor (recommended)
- **Testing & Quality:** Pytest, Black, Ruff
- **DevOps:** Docker-ready
## Getting Started

### 1. Prerequisites

- Python 3.10+
- uv package manager
- Ollama installed locally
- Qdrant (Docker recommended)
### 2. Setup

```bash
# Clone the repository
git clone https://github.com/YOUR_USERNAME/rag-portfolio-project.git
cd rag-portfolio-project

# Install dependencies
uv sync

# Copy and configure environment variables
cp .env.example .env
# (Update .env if needed)
```
### 3. Start Qdrant (Vector DB)

```bash
docker run -p 6333:6333 qdrant/qdrant
```
### 4. Pull the Ollama LLM Model

```bash
ollama pull llama3.1
```
### 5. Run the FastAPI Application

```bash
uv run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
### 6. Open the API Documentation

Access the interactive docs at http://localhost:8000/docs.
## Architecture

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚    User    β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
       β”‚
β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”
β”‚ FastAPI REST β”‚
β”‚   Backend    β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚
  β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚                         β”‚
β”Œβ”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Document  β”‚   β”‚ Query, RAG Chain  β”‚
β”‚ Ingestion  β”‚   β”‚  & Generation     β”‚
β””β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  β”‚
β”Œβ”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Embedding  β”‚
β”‚  Generation β”‚
β””β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  β”‚
β”Œβ”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Qdrant Vector β”‚
β”‚ Database (DB) β”‚
β””β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  β”‚
β”Œβ”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Ollama LLM  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
- **Document Ingestion:** Files are split into semantic chunks and indexed as vectors.
- **Embedding Generation:** Semantic vectors are produced with Sentence Transformers.
- **Vector Search:** Qdrant returns the most relevant contexts for an input query.
- **Generative Augmentation:** Ollama answers using the retrieved context (true RAG).
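The query path through those stages can be sketched roughly as below. The collection name, embedding model, and payload key are assumptions for illustration; the repo's `app/services/` code is the authoritative version. Third-party imports are kept inside the function so the pure prompt builder works without them installed.

```python
def build_prompt(question: str, contexts: list[str]) -> str:
    """Combine retrieved chunks and the user question into one grounded prompt."""
    joined = "\n\n".join(contexts)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {question}"

def answer(question: str, top_k: int = 5) -> str:
    """Embed the question, retrieve context from Qdrant, and generate with Ollama."""
    import requests
    from qdrant_client import QdrantClient
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")     # assumed embedding model
    client = QdrantClient(host="localhost", port=6333)  # local Qdrant from step 3
    vector = model.encode(question).tolist()
    hits = client.search(collection_name="documents", query_vector=vector, limit=top_k)
    contexts = [hit.payload["text"] for hit in hits]    # assumes chunks stored under "text"
    resp = requests.post(
        "http://localhost:11434/api/generate",          # Ollama's local HTTP API
        json={"model": "llama3.1", "prompt": build_prompt(question, contexts), "stream": False},
    )
    return resp.json()["response"]
```

Because generation sees only the retrieved chunks, answers stay grounded in the indexed documents.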
## API Endpoints

| Method | Path         | Description                          |
| ------ | ------------ | ------------------------------------ |
| GET    | /            | Root endpoint                        |
| GET    | /health      | Check system status                  |
| POST   | /ingest/file | Upload and index a document          |
| POST   | /query       | Query the system for an answer       |
| DELETE | /reset       | Reset the vector database (danger!)  |

Automated docs: http://localhost:8000/docs
## Usage Examples

### 1. Upload a Document (.pdf/.docx/.txt)

```bash
curl -X POST "http://localhost:8000/ingest/file" \
  -H "accept: application/json" \
  -F "file=@your_document.pdf"
```

### 2. Query the System

```bash
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the key insight in the uploaded document?", "top_k": 5}'
```

### 3. Reset the Collection

```bash
curl -X DELETE "http://localhost:8000/reset"
```
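The same /query call can be made from Python with only the standard library. A sketch whose field names mirror the curl example above (the helper names are illustrative):

```python
import json
from urllib import request

def build_query_request(question: str, top_k: int = 5,
                        base_url: str = "http://localhost:8000") -> request.Request:
    """Prepare a POST request matching the curl example for /query."""
    payload = json.dumps({"question": question, "top_k": top_k}).encode()
    return request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def query_rag(question: str, top_k: int = 5) -> dict:
    """Send the request and return the parsed JSON response."""
    with request.urlopen(build_query_request(question, top_k)) as resp:
        return json.load(resp)
```

For example, `query_rag("What is the key insight in the uploaded document?")` returns whatever JSON body the API sends back.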
## Testing

Unit tests are provided in /tests using Pytest.

Run all tests:

```bash
uv run pytest
```

Ensure code quality:

```bash
uv run black app/ tests/
uv run ruff check app/ tests/
```
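Pytest collects plain functions named `test_*` and reports each failing assert with the values involved. A self-contained sketch of what such a unit test looks like (`normalize_question` is a hypothetical helper for illustration, not part of the repo's code):

```python
def normalize_question(q: str) -> str:
    """Hypothetical helper: trim whitespace and ensure a trailing question mark."""
    q = q.strip()
    return q if q.endswith("?") else q + "?"

def test_normalize_question():
    # Plain asserts are enough; Pytest rewrites them to show values on failure.
    assert normalize_question("  What is RAG  ") == "What is RAG?"
    assert normalize_question("Already fine?") == "Already fine?"
```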
## Project Structure

```
rag-portfolio-project/
β”œβ”€β”€ .env                    # Environment config
β”œβ”€β”€ pyproject.toml          # Dependencies config
β”œβ”€β”€ README.md               # This documentation
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ main.py             # FastAPI app
β”‚   β”œβ”€β”€ config.py           # Config loader
β”‚   β”œβ”€β”€ models/             # Pydantic schemas
β”‚   β”œβ”€β”€ core/               # LLM, embeddings, vector DB
β”‚   β”œβ”€β”€ services/           # Document ingestion, RAG chain
β”‚   └── api/                # API routes and dependencies
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ documents/          # Raw document storage
β”‚   └── processed/          # Chunked files
β”œβ”€β”€ tests/
β”‚   └── test_rag.py         # Unit tests
└── scripts/
    β”œβ”€β”€ setup_qdrant.py     # DB utils
    └── ingest_documents.py # Bulk ingest
```
## Troubleshooting

- **Missing modules?** Run `uv add <module-name>` for any missing Python packages.
- **Ollama model not found?** Double-check the model name with `ollama list` and update `.env`.
- **Qdrant not running?** Ensure the container is up (`docker ps`).
- **File upload errors?** Check that `python-multipart` is installed.
## Contributing

Contributions are welcome! Please fork the repository, open issues, or submit pull requests for bug fixes, documentation improvements, or new features.

## License

Open source under the MIT License.

## Questions?

Contact the repository owner or open an issue. Happy to help!