Jatin Mehra committed on
Commit ·
535ca47
1
Parent(s): f255569
Add comprehensive documentation including API reference, development guide, and index
- README.md +14 -0
- docs/API.md +231 -0
- docs/DEVELOPMENT.md +374 -0
- docs/README.md +169 -0
- docs/index.md +86 -0
README.md
CHANGED
|
@@ -578,6 +578,20 @@ docker-compose up -d
|
|
| 578 |
- **WebSocket Support**: Real-time chat updates and live document processing
|
| 579 |
- **Model Upgrades**: Integration with latest embedding and LLM models
|
| 580 |
|
| 581 |
## License
|
| 582 |
|
| 583 |
This project is licensed under the **Apache License 2.0** - see the [LICENSE](LICENSE) file for complete details.
|
|
|
|
| 578 |
- **WebSocket Support**: Real-time chat updates and live document processing
|
| 579 |
- **Model Upgrades**: Integration with latest embedding and LLM models
|
| 580 |
|
| 581 |
+
## Documentation
|
| 582 |
+
|
| 583 |
+
Comprehensive documentation is available in the `docs/` directory:
|
| 584 |
+
|
| 585 |
+
- **[Documentation Index](docs/index.md)** - Complete documentation overview
|
| 586 |
+
- **[Architecture & Quick Start](docs/README.md)** - Project architecture with mermaid diagram
|
| 587 |
+
- **[API Reference](docs/API.md)** - REST API endpoints and examples
|
| 588 |
+
- **[Development Guide](docs/DEVELOPMENT.md)** - Contributing and development setup
|
| 589 |
+
|
| 590 |
+
### Interactive API Documentation
|
| 591 |
+
When the server is running, visit:
|
| 592 |
+
- **Swagger UI**: http://localhost:8000/docs
|
| 593 |
+
- **ReDoc**: http://localhost:8000/redoc
|
| 594 |
+
|
| 595 |
## License
|
| 596 |
|
| 597 |
This project is licensed under the **Apache License 2.0** - see the [LICENSE](LICENSE) file for complete details.
|
docs/API.md
ADDED
|
@@ -0,0 +1,231 @@
|
|
| 1 |
+
# API Documentation
|
| 2 |
+
|
| 3 |
+
This document provides a quick reference for the RAG Chat Application REST API endpoints.
|
| 4 |
+
|
| 5 |
+
## Base URL
|
| 6 |
+
```
|
| 7 |
+
http://localhost:8000
|
| 8 |
+
```
|
| 9 |
+
|
| 10 |
+
## Authentication
|
| 11 |
+
Most endpoints require a GROQ API key to be configured:
|
| 12 |
+
|
| 13 |
+
```bash
|
| 14 |
+
POST /set-api-key
|
| 15 |
+
Content-Type: application/json
|
| 16 |
+
|
| 17 |
+
{
|
| 18 |
+
"api_key": "your_groq_api_key_here"
|
| 19 |
+
}
|
| 20 |
+
```
|
| 21 |
+
|
| 22 |
+
## Core Endpoints
|
| 23 |
+
|
| 24 |
+
### Document Processing
|
| 25 |
+
|
| 26 |
+
#### Upload Files
|
| 27 |
+
```bash
|
| 28 |
+
POST /upload-files
|
| 29 |
+
Content-Type: multipart/form-data
|
| 30 |
+
|
| 31 |
+
# Form data with file uploads
|
| 32 |
+
files: [file1.pdf, file2.txt, ...]
|
| 33 |
+
```
|
| 34 |
+
|
| 35 |
+
**Response:**
|
| 36 |
+
```json
|
| 37 |
+
{
|
| 38 |
+
"total_files": 5,
|
| 39 |
+
"total_documents": 12,
|
| 40 |
+
"total_chunks": 87,
|
| 41 |
+
"file_types": ["pdf", "txt", "py"],
|
| 42 |
+
"type_counts": {"pdf": 3, "txt": 1, "py": 1}
|
| 43 |
+
}
|
| 44 |
+
```
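As a quick sanity check on the response shape above, a client could summarize the upload result and confirm that the per-type counts add up to `total_files`. This is only a sketch: the field names come from the sample response, the idea that `type_counts` always sums to `total_files` is inferred from the sample values rather than documented, and `summarize_upload` is a hypothetical helper.

```python
from typing import Any, Dict


def summarize_upload(payload: Dict[str, Any]) -> str:
    """Build a one-line summary of an /upload-files response body.

    Assumes the field names shown in the sample response.
    """
    counts = payload.get("type_counts", {})
    # Sort by file type so the summary is stable across runs.
    parts = [f"{n} {ext}" for ext, n in sorted(counts.items())]
    # Assumption: per-type counts should add up to total_files.
    consistent = sum(counts.values()) == payload.get("total_files")
    note = "" if consistent else " (counts do not match total_files)"
    return (
        f"{payload.get('total_files')} files, "
        f"{payload.get('total_chunks')} chunks: {', '.join(parts)}{note}"
    )


sample = {
    "total_files": 5,
    "total_documents": 12,
    "total_chunks": 87,
    "file_types": ["pdf", "txt", "py"],
    "type_counts": {"pdf": 3, "txt": 1, "py": 1},
}
print(summarize_upload(sample))
```

In a real client the `sample` dictionary would be replaced by `response.json()` from the upload call.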
|
| 45 |
+
|
| 46 |
+
#### Process Directory
|
| 47 |
+
```bash
|
| 48 |
+
POST /process-directory
|
| 49 |
+
Content-Type: application/x-www-form-urlencoded
|
| 50 |
+
|
| 51 |
+
directory_path=/path/to/documents
|
| 52 |
+
```
|
| 53 |
+
|
| 54 |
+
### Chat Interface
|
| 55 |
+
|
| 56 |
+
#### Send Chat Message
|
| 57 |
+
```bash
|
| 58 |
+
POST /chat
|
| 59 |
+
Content-Type: application/json
|
| 60 |
+
|
| 61 |
+
{
|
| 62 |
+
"message": "What is the main topic of the documents?"
|
| 63 |
+
}
|
| 64 |
+
```
|
| 65 |
+
|
| 66 |
+
**Response:**
|
| 67 |
+
```json
{
  "response": "Based on the documents, the main topics include...",
  "citations": [
    {
      "content": "relevant excerpt from document",
      "citation": "/path/to/source/file.pdf",
      "type": "pdf",
      "score": 0.85
    }
  ],
  "themes": {
    "key_themes": ["AI", "Machine Learning", "RAG"],
    "analysis": "The documents focus on AI and ML concepts..."
  },
  "timestamp": "2025-06-11T10:30:00.123456"
}
```
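A client will often want to list the sources behind an answer. The sketch below filters and orders the `citations` array from a `/chat` response. It assumes that a higher `score` means a more relevant excerpt, which the sample does not state (some vector stores report distances, where lower is better), and `top_citations` and the sample paths are hypothetical.

```python
from typing import Any, Dict, List


def top_citations(chat_response: Dict[str, Any], min_score: float = 0.5) -> List[str]:
    """Return source paths of citations at or above min_score, best first.

    Assumption: a higher "score" means a more relevant excerpt.
    """
    citations = chat_response.get("citations", [])
    kept = [c for c in citations if c.get("score", 0.0) >= min_score]
    kept.sort(key=lambda c: c.get("score", 0.0), reverse=True)
    return [c["citation"] for c in kept]


example = {
    "response": "Based on the documents, the main topics include...",
    "citations": [
        {"content": "excerpt A", "citation": "/docs/a.pdf", "type": "pdf", "score": 0.62},
        {"content": "excerpt B", "citation": "/docs/b.txt", "type": "txt", "score": 0.85},
        {"content": "excerpt C", "citation": "/docs/c.md", "type": "md", "score": 0.31},
    ],
}
print(top_citations(example))
```

If the backend turns out to report distances instead of similarities, the sort order and the threshold comparison would both need to be flipped.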
|
| 85 |
+
|
| 86 |
+
### Data Management
|
| 87 |
+
|
| 88 |
+
#### Get Statistics
|
| 89 |
+
```bash
|
| 90 |
+
GET /stats
|
| 91 |
+
```
|
| 92 |
+
|
| 93 |
+
**Response:**
|
| 94 |
+
```json
|
| 95 |
+
{
|
| 96 |
+
"total_files": 10,
|
| 97 |
+
"total_documents": 25,
|
| 98 |
+
"total_chunks": 150,
|
| 99 |
+
"file_types": ["pdf", "txt", "py", "md"],
|
| 100 |
+
"type_counts": {"pdf": 5, "txt": 3, "py": 1, "md": 1},
|
| 101 |
+
"processed_at": "2025-06-11 10:30:00"
|
| 102 |
+
}
|
| 103 |
+
```
|
| 104 |
+
|
| 105 |
+
#### Get Chat History
|
| 106 |
+
```bash
|
| 107 |
+
GET /chat-history
|
| 108 |
+
```
|
| 109 |
+
|
| 110 |
+
**Response:**
|
| 111 |
+
```json
|
| 112 |
+
[
|
| 113 |
+
{
|
| 114 |
+
"user_message": "What is RAG?",
|
| 115 |
+
"assistant_response": "RAG stands for Retrieval-Augmented Generation...",
|
| 116 |
+
"timestamp": "2025-06-11T10:30:00.123456",
|
| 117 |
+
"citations": [...]
|
| 118 |
+
}
|
| 119 |
+
]
|
| 120 |
+
```
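The history entries can be turned into a readable transcript on the client side. This sketch assumes the field names and the ISO 8601 timestamp format shown in the sample; `format_history` is a hypothetical helper, not part of the application.

```python
from datetime import datetime
from typing import Any, Dict, List


def format_history(history: List[Dict[str, Any]]) -> str:
    """Render /chat-history entries as a plain-text transcript."""
    lines = []
    for turn in history:
        # Timestamps in the sample are ISO 8601, which fromisoformat parses.
        stamp = datetime.fromisoformat(turn["timestamp"]).strftime("%H:%M")
        sources = len(turn.get("citations", []))
        lines.append(f"{stamp} user: {turn['user_message']}")
        lines.append(f"{stamp} assistant: {turn['assistant_response']} [{sources} sources]")
    return "\n".join(lines)


history = [
    {
        "user_message": "What is RAG?",
        "assistant_response": "RAG stands for Retrieval-Augmented Generation...",
        "timestamp": "2025-06-11T10:30:00.123456",
        "citations": [],
    }
]
print(format_history(history))
```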
|
| 121 |
+
|
| 122 |
+
#### Clear Chat History
|
| 123 |
+
```bash
|
| 124 |
+
DELETE /clear-chat
|
| 125 |
+
```
|
| 126 |
+
|
| 127 |
+
### Vector Store Management
|
| 128 |
+
|
| 129 |
+
#### Save Vector Store
|
| 130 |
+
```bash
|
| 131 |
+
POST /save-vector-store
|
| 132 |
+
```
|
| 133 |
+
|
| 134 |
+
**Response:**
|
| 135 |
+
```json
|
| 136 |
+
{
|
| 137 |
+
"message": "Vector store saved successfully"
|
| 138 |
+
}
|
| 139 |
+
```
|
| 140 |
+
|
| 141 |
+
#### Load Vector Store
|
| 142 |
+
```bash
|
| 143 |
+
POST /load-vector-store
|
| 144 |
+
```
|
| 145 |
+
|
| 146 |
+
**Response:**
|
| 147 |
+
```json
|
| 148 |
+
{
|
| 149 |
+
"message": "Vector store loaded successfully",
|
| 150 |
+
"stats": {
|
| 151 |
+
"total_files": 10,
|
| 152 |
+
"total_documents": 25,
|
| 153 |
+
"total_chunks": 150
|
| 154 |
+
}
|
| 155 |
+
}
|
| 156 |
+
```
|
| 157 |
+
|
| 158 |
+
## Frontend Serving
|
| 159 |
+
|
| 160 |
+
### Main Application
|
| 161 |
+
```bash
|
| 162 |
+
GET /
|
| 163 |
+
```
|
| 164 |
+
Returns the HTML frontend application.
|
| 165 |
+
|
| 166 |
+
## Error Responses
|
| 167 |
+
|
| 168 |
+
All endpoints return errors in this format:
|
| 169 |
+
```json
|
| 170 |
+
{
|
| 171 |
+
"detail": "Error description message"
|
| 172 |
+
}
|
| 173 |
+
```
|
| 174 |
+
|
| 175 |
+
Common HTTP status codes:
|
| 176 |
+
- `200` - Success
|
| 177 |
+
- `400` - Bad Request (invalid input)
|
| 178 |
+
- `422` - Validation Error
|
| 179 |
+
- `500` - Internal Server Error
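A client can map these status codes and the `detail` field to a single readable message. The sketch below is an illustration, not the application's own code: `describe_result` is a hypothetical helper, and it also accepts a list in `detail` because FastAPI's default `422` validation responses carry a list of error objects rather than a plain string.

```python
from typing import Any, Dict

# Labels copied from the status-code list above.
STATUS_LABELS = {
    200: "Success",
    400: "Bad Request",
    422: "Validation Error",
    500: "Internal Server Error",
}


def describe_result(status_code: int, body: Dict[str, Any]) -> str:
    """Turn a status code and JSON body into a short human-readable message."""
    label = STATUS_LABELS.get(status_code, "Unexpected status")
    if status_code == 200:
        return label
    detail = body.get("detail", "no detail provided")
    if isinstance(detail, list):
        # Validation errors: keep only the "msg" of each error object.
        detail = "; ".join(
            str(item.get("msg", item)) if isinstance(item, dict) else str(item)
            for item in detail
        )
    return f"{status_code} {label}: {detail}"


print(describe_result(400, {"detail": "Error description message"}))
```

With `requests`, this would typically be called as `describe_result(response.status_code, response.json())`.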
|
| 180 |
+
|
| 181 |
+
## Interactive Documentation
|
| 182 |
+
|
| 183 |
+
When the server is running, visit:
|
| 184 |
+
- **Swagger UI**: http://localhost:8000/docs
|
| 185 |
+
- **ReDoc**: http://localhost:8000/redoc
|
| 186 |
+
|
| 187 |
+
## Examples
|
| 188 |
+
|
| 189 |
+
### Complete Workflow
|
| 190 |
+
```bash
# 1. Set API key
curl -X POST "http://localhost:8000/set-api-key" \
  -H "Content-Type: application/json" \
  -d '{"api_key": "your_groq_key"}'

# 2. Upload files
curl -X POST "http://localhost:8000/upload-files" \
  -F "files=@document1.pdf" \
  -F "files=@document2.txt"

# 3. Chat with documents
curl -X POST "http://localhost:8000/chat" \
  -H "Content-Type: application/json" \
  -d '{"message": "Summarize the key points"}'

# 4. Get statistics
curl -X GET "http://localhost:8000/stats"

# 5. Save vector store
curl -X POST "http://localhost:8000/save-vector-store"
```
|
| 212 |
+
|
| 213 |
+
### Python Client Example
|
| 214 |
+
```python
import requests

base_url = "http://localhost:8000"

# Set API key
response = requests.post(f"{base_url}/set-api-key",
                         json={"api_key": "your_groq_key"})

# Upload files (the context manager closes the file handle afterwards)
with open('document.pdf', 'rb') as f:
    response = requests.post(f"{base_url}/upload-files", files={'files': f})

# Chat
response = requests.post(f"{base_url}/chat",
                         json={"message": "What is this document about?"})
print(response.json())
```
|
docs/DEVELOPMENT.md
ADDED
|
@@ -0,0 +1,374 @@
|
|
| 1 |
+
# Development Guide
|
| 2 |
+
|
| 3 |
+
This guide helps developers understand the codebase and contribute to the RAG Chat Application.
|
| 4 |
+
|
| 5 |
+
## Project Structure
|
| 6 |
+
|
| 7 |
+
```
wasserstoff-AiInternTask/
├── rag_elements/               # Core RAG Engine
│   ├── enhanced_vectordb.py    # Main RAG implementation
│   └── config.py               # Configuration management
├── backend/                    # FastAPI Production Server
│   ├── main.py                 # App entry point
│   ├── models.py               # Pydantic schemas
│   ├── utils.py                # Utilities and state
│   └── routes/                 # API endpoints
├── frontend/                   # Web Interface
│   ├── index.html              # Main UI
│   ├── style.css               # Styling
│   └── script.js               # Frontend logic
├── tests/                      # Test Suite
└── docs/                       # Documentation
```
|
| 24 |
+
|
| 25 |
+
## Development Setup
|
| 26 |
+
|
| 27 |
+
### Prerequisites
|
| 28 |
+
- Python 3.8+
|
| 29 |
+
- Git
|
| 30 |
+
- Text editor/IDE (VS Code recommended)
|
| 31 |
+
|
| 32 |
+
### Environment Setup
|
| 33 |
+
```bash
|
| 34 |
+
# Clone repository
|
| 35 |
+
git clone https://github.com/Jatin-Mehra119/wasserstoff-AiInternTask.git
|
| 36 |
+
cd wasserstoff-AiInternTask
|
| 37 |
+
|
| 38 |
+
# Create virtual environment (recommended)
|
| 39 |
+
python -m venv venv
|
| 40 |
+
source venv/bin/activate # Linux/macOS
|
| 41 |
+
# or venv\Scripts\activate # Windows
|
| 42 |
+
|
| 43 |
+
# Install dependencies
|
| 44 |
+
pip install -r requirements.txt
|
| 45 |
+
|
| 46 |
+
# Install development dependencies
|
| 47 |
+
pip install -r tests/requirements-test.txt
|
| 48 |
+
|
| 49 |
+
# Set up environment variables
|
| 50 |
+
cp .env.example .env # only if .env.example exists; otherwise create .env manually
|
| 51 |
+
# Add your GROQ_API_KEY to .env
|
| 52 |
+
```
|
| 53 |
+
|
| 54 |
+
### Running in Development Mode
|
| 55 |
+
```bash
|
| 56 |
+
# Start FastAPI with hot reload
|
| 57 |
+
cd backend
|
| 58 |
+
python -m uvicorn main:app --reload --host 0.0.0.0 --port 8000
|
| 59 |
+
|
| 60 |
+
# Or run Streamlit version
|
| 61 |
+
streamlit run streamlit_rag_app.py
|
| 62 |
+
```
|
| 63 |
+
|
| 64 |
+
## Core Components
|
| 65 |
+
|
| 66 |
+
### 1. RAG Engine (`rag_elements/enhanced_vectordb.py`)
|
| 67 |
+
|
| 68 |
+
The heart of the application. Key classes and methods:
|
| 69 |
+
|
| 70 |
+
```python
class EnhancedDocumentProcessor:
    # Method outline only; bodies are elided with `...`.
    def process_files(self, file_paths): ...                 # Multi-format processing
    def create_enhanced_vector_store(self, documents): ...   # FAISS index creation
    def search_with_citations(self, query, k=5): ...         # Semantic search
    def get_chat_response(self, query): ...                  # End-to-end chat
    def save_vector_store(self, path): ...                   # Persistence
    def load_vector_store(self, path): ...                   # Restore data
```
|
| 79 |
+
|
| 80 |
+
### 2. FastAPI Backend (`backend/`)
|
| 81 |
+
|
| 82 |
+
**Entry Point (`main.py`)**:
|
| 83 |
+
- FastAPI app initialization
|
| 84 |
+
- CORS configuration
|
| 85 |
+
- Route registration
|
| 86 |
+
|
| 87 |
+
**Data Models (`models.py`)**:
|
| 88 |
+
- Pydantic schemas for API requests/responses
|
| 89 |
+
- Type validation and serialization
|
| 90 |
+
|
| 91 |
+
**Routes (`routes/`)**:
|
| 92 |
+
- `main_routes.py` - Frontend serving, health checks
|
| 93 |
+
- `upload_routes.py` - File upload and processing
|
| 94 |
+
- `chat_routes.py` - Chat interface and AI responses
|
| 95 |
+
- `store_routes.py` - Vector store management
|
| 96 |
+
|
| 97 |
+
**Utilities (`utils.py`)**:
|
| 98 |
+
- Global state management
|
| 99 |
+
- Helper functions
|
| 100 |
+
- Error handling utilities
|
| 101 |
+
|
| 102 |
+
### 3. Frontend (`frontend/`)
|
| 103 |
+
|
| 104 |
+
Modern web interface with:
|
| 105 |
+
- **HTML**: Semantic structure with responsive layout
|
| 106 |
+
- **CSS**: Modern styling with CSS Grid/Flexbox
|
| 107 |
+
- **JavaScript**: Async API calls, real-time updates, file handling
|
| 108 |
+
|
| 109 |
+
## Data Flow
|
| 110 |
+
|
| 111 |
+
### Document Processing Pipeline
|
| 112 |
+
1. **File Upload** → `upload_routes.py`
|
| 113 |
+
2. **Text Extraction** → `enhanced_vectordb.py`
|
| 114 |
+
3. **Chunking** → LangChain text splitters
|
| 115 |
+
4. **Embeddings** → Sentence Transformers
|
| 116 |
+
5. **Indexing** → FAISS vector store
|
| 117 |
+
6. **Metadata Storage** → JSON persistence
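To make the chunking step of this pipeline concrete, here is a minimal sketch of overlapping fixed-size chunking in plain Python. It is a stand-in for illustration only: the project uses LangChain text splitters, whose splitting rules are more elaborate, and `chunk_text`, `chunk_size`, and `overlap` are hypothetical names and values.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Illustrative stand-in for a real text splitter.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


document = "abcdefghij" * 10  # 100 characters of sample text
chunks = chunk_text(document)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means the end of one chunk reappears at the start of the next, so a sentence that straddles a boundary is still seen whole by at least one chunk.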
|
| 118 |
+
|
| 119 |
+
### Chat Pipeline
|
| 120 |
+
1. **User Query** → `chat_routes.py`
|
| 121 |
+
2. **Semantic Search** → FAISS similarity search
|
| 122 |
+
3. **Context Retrieval** → Top-K document chunks
|
| 123 |
+
4. **AI Response** → GROQ API integration
|
| 124 |
+
5. **Citation Generation** → Source attribution
|
| 125 |
+
6. **Response Formatting** → Markdown output
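The semantic search and top-K retrieval steps can be pictured with a brute-force sketch. It is not the project's implementation: FAISS performs this search over an index, the real vectors come from a Sentence Transformers model rather than the toy 3-dimensional vectors used here, and whether the project ranks by cosine similarity or by distance is an assumption. `cosine` and `top_k` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Score every chunk against the query and keep the k most similar."""
    scored = [(text, cosine(query, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index: (chunk text, embedding) pairs with made-up vectors.
toy_index = [
    ("chunk about FAISS", [1.0, 0.0, 0.0]),
    ("chunk about GROQ", [0.0, 1.0, 0.0]),
    ("chunk about both", [1.0, 1.0, 0.0]),
]
results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print([text for text, _ in results])
```

The retrieved chunk texts are what would then be passed to the language model as context, with their sources kept for citation.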
|
| 126 |
+
|
| 127 |
+
## Testing
|
| 128 |
+
|
| 129 |
+
### Running Tests
|
| 130 |
+
```bash
|
| 131 |
+
cd tests
|
| 132 |
+
|
| 133 |
+
# Run all tests
|
| 134 |
+
bash run_tests.sh
|
| 135 |
+
|
| 136 |
+
# Run specific test files
|
| 137 |
+
python -m pytest test_endpoints_pytest.py -v
|
| 138 |
+
python test_api_endpoints.py
|
| 139 |
+
```
|
| 140 |
+
|
| 141 |
+
### Test Structure
|
| 142 |
+
- `test_api_endpoints.py` - Basic API endpoint testing
|
| 143 |
+
- `test_endpoints_pytest.py` - Comprehensive pytest suite
|
| 144 |
+
- `run_tests.sh` - Test runner script
|
| 145 |
+
|
| 146 |
+
### Writing Tests
|
| 147 |
+
Follow these patterns:
|
| 148 |
+
|
| 149 |
+
```python
import httpx
import pytest
import requests

BASE_URL = "http://localhost:8000"  # assumed local dev server

# API endpoint test
def test_upload_endpoint():
    # In-memory sample file; replace with a real document as needed
    files = {"files": ("sample.txt", b"hello world", "text/plain")}
    response = requests.post(f"{BASE_URL}/upload-files", files=files)
    assert response.status_code == 200
    assert "total_files" in response.json()

# Pytest test (requires the pytest-asyncio plugin)
@pytest.mark.asyncio
async def test_chat_endpoint():
    async with httpx.AsyncClient() as client:
        response = await client.post(f"{BASE_URL}/chat",
                                     json={"message": "test"})
        assert response.status_code == 200
```
|
| 164 |
+
|
| 165 |
+
## Adding New Features
|
| 166 |
+
|
| 167 |
+
### Adding a New API Endpoint
|
| 168 |
+
|
| 169 |
+
1. **Define Pydantic Model** (`models.py`):
|
| 170 |
+
```python
from typing import Optional

from pydantic import BaseModel


class NewFeatureRequest(BaseModel):
    parameter: str
    optional_param: Optional[int] = None


class NewFeatureResponse(BaseModel):
    result: str
    success: bool
```
|
| 179 |
+
|
| 180 |
+
2. **Create Route Handler** (`routes/new_routes.py`):
|
| 181 |
+
```python
from fastapi import APIRouter, HTTPException
from ..models import NewFeatureRequest, NewFeatureResponse

router = APIRouter()

@router.post("/new-feature", response_model=NewFeatureResponse)
async def new_feature_endpoint(request: NewFeatureRequest):
    try:
        # Implementation here
        return NewFeatureResponse(result="success", success=True)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
|
| 195 |
+
|
| 196 |
+
3. **Register Router** (`main.py`):
|
| 197 |
+
```python
from .routes.new_routes import router as new_router
app.include_router(new_router)
```
|
| 201 |
+
|
| 202 |
+
4. **Add Frontend Integration** (`frontend/script.js`):
|
| 203 |
+
```javascript
async function callNewFeature(data) {
  const response = await fetch('/new-feature', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify(data)
  });
  return response.json();
}
```
|
| 213 |
+
|
| 214 |
+
### Extending the RAG Engine
|
| 215 |
+
|
| 216 |
+
To add new document types or processing capabilities:
|
| 217 |
+
|
| 218 |
+
1. **Add File Type Support** (`enhanced_vectordb.py`):
|
| 219 |
+
```python
def extract_text_from_new_format(self, file_path):
    # Implement extraction logic
    return extracted_text

def process_files(self, file_paths):
    for file_path in file_paths:
        if file_path.endswith('.new_format'):
            text = self.extract_text_from_new_format(file_path)
            # Process text...
```
|
| 230 |
+
|
| 231 |
+
2. **Update Frontend File Acceptance** (`index.html`):
|
| 232 |
+
```html
|
| 233 |
+
<input type="file" accept=".pdf,.txt,.new_format" multiple>
|
| 234 |
+
```
|
| 235 |
+
|
| 236 |
+
## Frontend Development
|
| 237 |
+
|
| 238 |
+
### Key JavaScript Functions
|
| 239 |
+
- `uploadFiles()` - Handle file uploads with progress
|
| 240 |
+
- `sendMessage()` - Send chat messages and display responses
|
| 241 |
+
- `updateStats()` - Refresh processing statistics
|
| 242 |
+
- `displayCitations()` - Show document sources
|
| 243 |
+
|
| 244 |
+
### CSS Architecture
|
| 245 |
+
- Mobile-first responsive design
|
| 246 |
+
- CSS custom properties for theming
|
| 247 |
+
- Flexbox/Grid layouts
|
| 248 |
+
- Component-based styling
|
| 249 |
+
|
| 250 |
+
### Adding UI Components
|
| 251 |
+
1. Add HTML structure
|
| 252 |
+
2. Style with CSS classes
|
| 253 |
+
3. Add JavaScript event handlers
|
| 254 |
+
4. Connect to backend APIs
|
| 255 |
+
|
| 256 |
+
## Debugging
|
| 257 |
+
|
| 258 |
+
### Common Issues
|
| 259 |
+
|
| 260 |
+
**CORS Errors**:
|
| 261 |
+
- Check `main.py` CORS configuration
|
| 262 |
+
- Ensure frontend runs on allowed origins
|
| 263 |
+
|
| 264 |
+
**Import Errors**:
|
| 265 |
+
- Verify Python path and virtual environment
|
| 266 |
+
- Check `requirements.txt` dependencies
|
| 267 |
+
|
| 268 |
+
**API Key Issues**:
|
| 269 |
+
- Confirm GROQ API key is set
|
| 270 |
+
- Check environment variable loading
|
| 271 |
+
|
| 272 |
+
### Logging
|
| 273 |
+
|
| 274 |
+
Add logging to your code:
|
| 275 |
+
```python
import logging

from fastapi import APIRouter

logger = logging.getLogger(__name__)
router = APIRouter()

@router.post("/endpoint")
async def endpoint():
    logger.info("Processing request")
    try:
        # Logic here
        logger.debug("Success")
    except Exception as e:
        logger.error(f"Error: {e}")
        raise
```
|
| 290 |
+
|
| 291 |
+
## Code Style Guidelines
|
| 292 |
+
|
| 293 |
+
### Python
|
| 294 |
+
- Follow PEP 8
|
| 295 |
+
- Use type hints
|
| 296 |
+
- Add docstrings
|
| 297 |
+
- Maximum line length: 88 characters
|
| 298 |
+
|
| 299 |
+
```python
from typing import Any, Dict

def process_document(file_path: str, options: Dict[str, Any]) -> "ProcessResult":
    """
    Process a document and extract text content.

    Args:
        file_path: Path to the document file
        options: Processing configuration options

    Returns:
        ProcessResult containing extracted text and metadata

    Raises:
        ProcessingError: If document cannot be processed
    """
    # Implementation...
```
|
| 316 |
+
|
| 317 |
+
### JavaScript
|
| 318 |
+
- Use modern ES6+ syntax
|
| 319 |
+
- Prefer `const`/`let` over `var`
|
| 320 |
+
- Use async/await for promises
|
| 321 |
+
- Add JSDoc comments
|
| 322 |
+
|
| 323 |
+
```javascript
/**
 * Upload files to the server
 * @param {FileList} files - Files to upload
 * @returns {Promise<Object>} Upload result
 */
async function uploadFiles(files) {
  // Implementation...
}
```
|
| 333 |
+
|
| 334 |
+
## Deployment
|
| 335 |
+
|
| 336 |
+
### Development
|
| 337 |
+
```bash
|
| 338 |
+
python -m uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
|
| 339 |
+
```
|
| 340 |
+
|
| 341 |
+
### Production
|
| 342 |
+
```bash
|
| 343 |
+
python -m uvicorn backend.main:app --host 0.0.0.0 --port 8000 --workers 4
|
| 344 |
+
```
|
| 345 |
+
|
| 346 |
+
### Docker (if configured)
|
| 347 |
+
```bash
|
| 348 |
+
docker build -t rag-chat-app .
|
| 349 |
+
docker run -p 8000:8000 -e GROQ_API_KEY=your_key rag-chat-app
|
| 350 |
+
```
|
| 351 |
+
|
| 352 |
+
## Contributing
|
| 353 |
+
|
| 354 |
+
1. Fork the repository
|
| 355 |
+
2. Create feature branch: `git checkout -b feature/amazing-feature`
|
| 356 |
+
3. Make changes and add tests
|
| 357 |
+
4. Ensure tests pass: `bash tests/run_tests.sh`
|
| 358 |
+
5. Commit: `git commit -m 'Add amazing feature'`
|
| 359 |
+
6. Push: `git push origin feature/amazing-feature`
|
| 360 |
+
7. Open Pull Request
|
| 361 |
+
|
| 362 |
+
### Pull Request Checklist
|
| 363 |
+
- [ ] Code follows style guidelines
|
| 364 |
+
- [ ] Tests added for new functionality
|
| 365 |
+
- [ ] All tests pass
|
| 366 |
+
- [ ] Documentation updated
|
| 367 |
+
- [ ] No breaking changes (or clearly documented)
|
| 368 |
+
|
| 369 |
+
## Additional Resources
|
| 370 |
+
|
| 371 |
+
- [FastAPI Documentation](https://fastapi.tiangolo.com/)
|
| 372 |
+
- [FAISS Documentation](https://faiss.ai/)
|
| 373 |
+
- [LangChain Documentation](https://python.langchain.com/)
|
| 374 |
+
- [GROQ API Documentation](https://console.groq.com/docs)
|
docs/README.md
ADDED
|
@@ -0,0 +1,169 @@
|
|
| 1 |
+
# RAG Chat Application - Documentation
|
| 2 |
+
|
| 3 |
+
A sophisticated Retrieval-Augmented Generation (RAG) chat application that enables intelligent conversations with your documents.
|
| 4 |
+
|
| 5 |
+
## Architecture Overview
|
| 6 |
+
|
| 7 |
+
```mermaid
|
| 8 |
+
flowchart TD
|
| 9 |
+
%% Client Layer
|
| 10 |
+
subgraph "Client Layer"
|
| 11 |
+
direction TB
|
| 12 |
+
WebClient["Web Client (HTML/JS/CSS)"]:::ui
|
| 13 |
+
StreamlitUI["MVP Streamlit UI"]:::ui
|
| 14 |
+
end
|
| 15 |
+
|
| 16 |
+
%% Backend Layer
|
| 17 |
+
subgraph "Backend Layer"
|
| 18 |
+
direction TB
|
| 19 |
+
FastAPI["FastAPI Backend"]:::api
|
| 20 |
+
subgraph routes["Routes"]
|
| 21 |
+
direction TB
|
| 22 |
+
MainRoutes["main_routes.py"]:::api
|
| 23 |
+
UploadRoutes["upload_routes.py"]:::api
|
| 24 |
+
ChatRoutes["chat_routes.py"]:::api
|
| 25 |
+
StoreRoutes["store_routes.py"]:::api
|
| 26 |
+
end
|
| 27 |
+
Models["models.py"]:::api
|
| 28 |
+
Utils["utils.py"]:::api
|
| 29 |
+
end
|
| 30 |
+
|
| 31 |
+
%% RAG Engine Layer
|
| 32 |
+
subgraph "RAG Engine Layer"
|
| 33 |
+
direction TB
|
| 34 |
+
Config["config.py"]:::core
|
| 35 |
+
CoreEngine["enhanced_vectordb.py"]:::core
|
| 36 |
+
end
|
| 37 |
+
|
| 38 |
+
%% Persistence Layer
|
| 39 |
+
VectorStore[(Vector Store<br/>FAISS Index + Metadata)]:::store
|
| 40 |
+
|
| 41 |
+
%% External Services
|
| 42 |
+
subgraph "External Services"
|
| 43 |
+
direction TB
|
| 44 |
+
GROQ["GROQ Vision API"]:::external
|
| 45 |
+
SentenceModel["SentenceTransformer Model"]:::external
|
| 46 |
+
end
|
| 47 |
+
|
| 48 |
+
%% Tests
|
| 49 |
+
subgraph "Automated Tests"
|
| 50 |
+
direction TB
|
| 51 |
+
Tests1["test_api_endpoints.py"]:::tests
|
| 52 |
+
Tests2["test_endpoints_pytest.py"]:::tests
|
| 53 |
+
end
|
| 54 |
+
|
| 55 |
+
%% Connections
|
| 56 |
+
WebClient -->|"/api/*" fetch| FastAPI
|
| 57 |
+
MainRoutes -->|serve static| WebClient
|
| 58 |
+
StreamlitUI -->|in-process calls| CoreEngine
|
| 59 |
+
FastAPI -->|calls RAG Engine| CoreEngine
|
| 60 |
+
CoreEngine -->|read/write| VectorStore
|
| 61 |
+
CoreEngine -->|OCR & LLM requests| GROQ
|
| 62 |
+
CoreEngine -->|embedding requests| SentenceModel
|
| 63 |
+
StoreRoutes -->|disk read/write| VectorStore
|
| 64 |
+
|
| 65 |
+
%% Click Events
|
| 66 |
+
click WebClient "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/frontend/index.html"
|
| 67 |
+
click WebClient "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/frontend/script.js"
|
| 68 |
+
click WebClient "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/frontend/style.css"
|
| 69 |
+
click StreamlitUI "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/streamlit_rag_app.py"
|
| 70 |
+
click FastAPI "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/backend/main.py"
|
| 71 |
+
click Models "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/backend/models.py"
|
| 72 |
+
click Utils "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/backend/utils.py"
|
| 73 |
+
click MainRoutes "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/backend/routes/main_routes.py"
|
| 74 |
+
click UploadRoutes "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/backend/routes/upload_routes.py"
|
| 75 |
+
click ChatRoutes "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/backend/routes/chat_routes.py"
|
| 76 |
+
click StoreRoutes "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/backend/routes/store_routes.py"
|
| 77 |
+
click Config "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/rag_elements/config.py"
|
| 78 |
+
click CoreEngine "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/rag_elements/enhanced_vectordb.py"
|
| 79 |
+
click Tests1 "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/test/test_api_endpoints.py"
|
| 80 |
+
click Tests2 "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/test/test_endpoints_pytest.py"
|
| 81 |
+
|
| 82 |
+
%% Styles
|
| 83 |
+
classDef ui fill:#E3F2FD,stroke:#1976D2,color:#0D47A1;
|
| 84 |
+
classDef api fill:#E8F5E9,stroke:#388E3C,color:#1B5E20;
|
| 85 |
+
classDef core fill:#FFF3E0,stroke:#FB8C00,color:#E65100;
|
| 86 |
+
classDef store fill:#FFF9C4,stroke:#FBC02D,color:#F57F17;
|
| 87 |
+
classDef external fill:#ECEFF1,stroke:#607D8B,color:#37474F;
|
| 88 |
+
classDef tests fill:#F3E5F5,stroke:#8E24AA,color:#4A148C;
|
| 89 |
+
```
|
| 90 |
+
|
| 91 |
+
## Quick Start
|
| 92 |
+
|
| 93 |
+
### Prerequisites
|
| 94 |
+
- Python 3.8+
|
| 95 |
+
- GROQ API key (for OCR and chat)
|
| 96 |
+
|
| 97 |
+
### Installation & Running
|
| 98 |
+
```bash
|
| 99 |
+
# Clone repository
|
| 100 |
+
git clone https://github.com/Jatin-Mehra119/wasserstoff-AiInternTask.git
|
| 101 |
+
cd wasserstoff-AiInternTask
|
| 102 |
+
|
| 103 |
+
# Install dependencies
|
| 104 |
+
pip install -r requirements.txt
|
| 105 |
+
|
| 106 |
+
# Run FastAPI backend (Production)
|
| 107 |
+
cd backend
|
| 108 |
+
python -m uvicorn main:app --host 0.0.0.0 --port 8000 --reload
|
| 109 |
+
|
| 110 |
+
# Open http://localhost:8000 in browser
|
| 111 |
+
|
| 112 |
+
# Alternative: Run Streamlit MVP
|
| 113 |
+
streamlit run streamlit_rag_app.py
|
| 114 |
+
```
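
Once the backend is running, a quick reachability check can confirm it before opening the browser. The sketch below is a minimal, hypothetical helper (the name `is_server_up` is not part of the project). It assumes only that the interactive API docs are served at `/docs` on port 8000, as this documentation states.

```python
import urllib.error
import urllib.request


def is_server_up(url: str = "http://localhost:8000/docs", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, HTTP error, or malformed URL.
        return False


if __name__ == "__main__":
    print("backend reachable:", is_server_up())
```

The helper swallows every connection error on purpose, so it can be used in a retry loop while uvicorn is still starting.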
|
| 115 |
+
|
| 116 |
+
## Architecture Components
|
| 117 |
+
|
| 118 |
+
### Core RAG Engine (`rag_elements/`)
|
| 119 |
+
- **`enhanced_vectordb.py`** - Main RAG implementation with document processing, vector search, and AI integration
|
| 120 |
+
- **`config.py`** - Configuration management and settings
|
| 121 |
+
|
| 122 |
+
### FastAPI Backend (`backend/`)
|
| 123 |
+
- **`main.py`** - Application entry point and server configuration
|
| 124 |
+
- **`models.py`** - Pydantic data models and API schemas
|
| 125 |
+
- **`utils.py`** - Utilities, state management, and helpers
|
| 126 |
+
- **`routes/`** - Modular API endpoints:
|
| 127 |
+
- `main_routes.py` - Frontend serving and health
|
| 128 |
+
- `upload_routes.py` - Document upload and processing
|
| 129 |
+
- `chat_routes.py` - Chat interface and AI responses
|
| 130 |
+
- `store_routes.py` - Vector store persistence
|
| 131 |
+
|
| 132 |
+
### Frontend (`frontend/`)
|
| 133 |
+
- **`index.html`** - Main application UI
|
| 134 |
+
- **`style.css`** - Responsive design and styling
|
| 135 |
+
- **`script.js`** - Frontend logic and API integration
|
| 136 |
+
|
| 137 |
+
### Legacy MVP
|
| 138 |
+
- **`streamlit_rag_app.py`** - Original Streamlit implementation
|
| 139 |
+
|
| 140 |
+
## Data Flow
|
| 141 |
+
|
| 142 |
+
1. **Document Upload** → Text extraction → Chunking → Vector embeddings → FAISS index
|
| 143 |
+
2. **Chat Query** → Semantic search → Context retrieval → AI response generation → Citations
|
| 144 |
+
3. **Persistence** → Save/load vector stores with metadata
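
The upload and query steps above can be sketched with the standard library alone. This is a toy illustration, not the project's implementation: the bag-of-words `embed` function stands in for Sentence Transformers embeddings, the `ToyIndex` list stands in for the FAISS index, and every name here (`chunk_text`, `embed`, `cosine`, `ToyIndex`) is hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into overlapping word windows (stand-in for the real chunker)."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: replaces a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """In-memory list of (vector, chunk, source); stands in for a FAISS index."""

    def __init__(self) -> None:
        self.entries: List[Tuple[Counter, str, str]] = []

    def add_document(self, source: str, text: str) -> None:
        # Upload path: chunk the text, embed each chunk, store it with its source.
        for chunk in chunk_text(text):
            self.entries.append((embed(chunk), chunk, source))

    def search(self, query: str, k: int = 2) -> List[Tuple[float, str, str]]:
        # Query path: embed the query and rank stored chunks by similarity.
        q = embed(query)
        scored = [(cosine(q, vec), chunk, source) for vec, chunk, source in self.entries]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]


index = ToyIndex()
index.add_document("faiss.txt", "FAISS builds an index for fast vector similarity search")
index.add_document("fastapi.txt", "FastAPI serves the upload and chat routes of the backend")
score, chunk, source = index.search("vector similarity search", k=1)[0]
print(source, "->", chunk)  # the source name is what a citation would point to
```

Keeping the source name next to each chunk is what makes citations possible: the retrieved context carries its origin through to the response step.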
|
| 145 |
+
|
| 146 |
+
## Key APIs
|
| 147 |
+
|
| 148 |
+
- `POST /upload-files` - Process documents
|
| 149 |
+
- `POST /chat` - Chat with documents
|
| 150 |
+
- `GET /stats` - Processing statistics
|
| 151 |
+
- `POST /save-vector-store` - Persist data
|
| 152 |
+
- `POST /load-vector-store` - Restore data
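
As a rough sketch of how a client might call these endpoints, the snippet below prepares requests without sending them. The endpoint paths come from the list above; the base URL, the helper names, and especially the JSON field `message` are assumptions, so check the interactive API docs for the actual request schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: host and port from the quick start


def build_chat_request(message: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST /chat request.

    The JSON field name "message" is hypothetical; confirm the real schema
    in the interactive API docs before relying on it.
    """
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_stats_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a GET /stats request (no body needed)."""
    return urllib.request.Request(url=f"{base_url}/stats", method="GET")


request = build_chat_request("What does the uploaded report conclude?")
print(request.get_method(), request.full_url)

# To send it once the server is running:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Building the request separately from sending it makes the payload easy to inspect and adjust once the real schema is known.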
|
| 153 |
+
|
| 154 |
+
## Testing
|
| 155 |
+
|
| 156 |
+
```bash
|
| 157 |
+
cd tests
|
| 158 |
+
bash run_tests.sh
|
| 159 |
+
```
|
| 160 |
+
|
| 161 |
+
## External Dependencies
|
| 162 |
+
|
| 163 |
+
- **FAISS** - Vector similarity search
|
| 164 |
+
- **GROQ** - Vision OCR and conversational AI
|
| 165 |
+
- **LangChain** - Document processing
|
| 166 |
+
- **FastAPI** - Web framework
|
| 167 |
+
- **Sentence Transformers** - Text embeddings
|
| 168 |
+
|
| 169 |
+
For detailed information, see the main [README.md](../README.md).
|
docs/index.md
ADDED
|
@@ -0,0 +1,86 @@
|
| 1 |
+
# Documentation Index
|
| 2 |
+
|
| 3 |
+
Welcome to the RAG Chat Application documentation! This directory contains comprehensive guides to help you understand, use, and contribute to the project.
|
| 4 |
+
|
| 5 |
+
## Documentation Structure
|
| 6 |
+
|
| 7 |
+
### Quick Start & Overview
|
| 8 |
+
- **[README.md](README.md)** - Project overview, architecture diagram, and quick start guide
|
| 9 |
+
- **[Main README](../README.md)** - Comprehensive project documentation with detailed features and usage
|
| 10 |
+
|
| 11 |
+
### API Reference
|
| 12 |
+
- **[API.md](API.md)** - Complete REST API documentation with examples and curl commands
|
| 13 |
+
|
| 14 |
+
### Development
|
| 15 |
+
- **[DEVELOPMENT.md](DEVELOPMENT.md)** - Developer guide for contributing to the project
|
| 16 |
+
|
| 17 |
+
## Getting Started
|
| 18 |
+
|
| 19 |
+
### For Users
|
| 20 |
+
1. Read the [Quick Start](README.md#-quick-start) section
|
| 21 |
+
2. Follow the [Installation & Running](README.md#installation--running) instructions
|
| 22 |
+
3. Review the [API Reference](API.md) for integration details
|
| 23 |
+
|
| 24 |
+
### For Developers
|
| 25 |
+
1. Start with the [Development Setup](DEVELOPMENT.md#-development-setup)
|
| 26 |
+
2. Understand the [Project Structure](DEVELOPMENT.md#-project-structure)
|
| 27 |
+
3. Review [Core Components](DEVELOPMENT.md#-core-components)
|
| 28 |
+
4. Check the [Contributing Guidelines](DEVELOPMENT.md#-contributing)
|
| 29 |
+
|
| 30 |
+
## Architecture Quick Reference
|
| 31 |
+
|
| 32 |
+
The application follows a layered architecture:
|
| 33 |
+
|
| 34 |
+
- **Client Layer**: Web frontend + Streamlit MVP
|
| 35 |
+
- **Backend Layer**: FastAPI with modular routes
|
| 36 |
+
- **RAG Engine Layer**: Core document processing and vector search
|
| 37 |
+
- **Persistence Layer**: FAISS vector store with metadata
|
| 38 |
+
- **External Services**: GROQ API and Sentence Transformers
|
| 39 |
+
|
| 40 |
+
See the [architecture diagram](README.md#οΈ-architecture-overview) for a visual representation.
|
| 41 |
+
|
| 42 |
+
## Quick Links
|
| 43 |
+
|
| 44 |
+
| Topic | Document | Description |
|
| 45 |
+
|-------|----------|-------------|
|
| 46 |
+
| **Overview** | [README.md](README.md) | Architecture and quick start |
|
| 47 |
+
| **API Endpoints** | [API.md](API.md) | REST API reference |
|
| 48 |
+
| **Development** | [DEVELOPMENT.md](DEVELOPMENT.md) | Contributing guidelines |
|
| 49 |
+
| **Main README** | [../README.md](../README.md) | Detailed project documentation |
|
| 50 |
+
| **Tests** | [../tests/README.md](../tests/README.md) | Testing documentation |
|
| 51 |
+
|
| 52 |
+
## Core Features
|
| 53 |
+
|
| 54 |
+
- **Multi-format Document Processing**: PDF, text, images, code files
|
| 55 |
+
- **Intelligent Chat Interface**: AI-powered responses with citations
|
| 56 |
+
- **Vector Search**: FAISS-powered semantic similarity search
|
| 57 |
+
- **Persistence**: Save and load processed document collections
|
| 58 |
+
- **Modern Web UI**: Responsive design with real-time updates
|
| 59 |
+
- **Comprehensive API**: RESTful endpoints with interactive documentation
|
| 60 |
+
|
| 61 |
+
## Tech Stack
|
| 62 |
+
|
| 63 |
+
- **Backend**: FastAPI, Python 3.8+
|
| 64 |
+
- **Frontend**: HTML5, CSS3, JavaScript (ES6+)
|
| 65 |
+
- **AI/ML**: GROQ API, Sentence Transformers, LangChain
|
| 66 |
+
- **Search**: FAISS vector index (similarity search library)
|
| 67 |
+
- **Testing**: pytest, requests
|
| 68 |
+
|
| 69 |
+
## Support
|
| 70 |
+
|
| 71 |
+
- **Issues**: [GitHub Issues](https://github.com/Jatin-Mehra119/wasserstoff-AiInternTask/issues)
|
| 72 |
+
- **Discussions**: [GitHub Discussions](https://github.com/Jatin-Mehra119/wasserstoff-AiInternTask/discussions)
|
| 73 |
+
- **API Docs**: http://localhost:8000/docs (when server is running)
|
| 74 |
+
|
| 75 |
+
## Contributing
|
| 76 |
+
|
| 77 |
+
We welcome contributions! Please read the [Development Guide](DEVELOPMENT.md#-contributing) for guidelines on:
|
| 78 |
+
|
| 79 |
+
- Code style and standards
|
| 80 |
+
- Testing requirements
|
| 81 |
+
- Pull request process
|
| 82 |
+
- Adding new features
|
| 83 |
+
|
| 84 |
+
---
|
| 85 |
+
|
| 86 |
+
*This documentation is maintained alongside the codebase. For the most up-to-date information, always refer to the latest version in the repository.*
|