Jatin Mehra committed on
Commit
535ca47
·
1 Parent(s): f255569

Add comprehensive documentation including API reference, development guide, and index

Files changed (5)
  1. README.md +14 -0
  2. docs/API.md +231 -0
  3. docs/DEVELOPMENT.md +374 -0
  4. docs/README.md +169 -0
  5. docs/index.md +86 -0
README.md CHANGED
@@ -578,6 +578,20 @@ docker-compose up -d
578
  - **🌐 WebSocket Support**: Real-time chat updates and live document processing
579
  - **🧠 Model Upgrades**: Integration with latest embedding and LLM models
580
 
581
+ ## 📚 Documentation
582
+
583
+ Comprehensive documentation is available in the `docs/` directory:
584
+
585
+ - **[📖 Documentation Index](docs/index.md)** - Complete documentation overview
+ - **[🏗️ Architecture & Quick Start](docs/README.md)** - Project architecture with Mermaid diagram
+ - **[🔌 API Reference](docs/API.md)** - REST API endpoints and examples
+ - **[💻 Development Guide](docs/DEVELOPMENT.md)** - Contributing and development setup
589
+
590
+ ### Interactive API Documentation
591
+ When the server is running, visit:
592
+ - **Swagger UI**: http://localhost:8000/docs
593
+ - **ReDoc**: http://localhost:8000/redoc
594
+
595
  ## 📄 License
596
 
597
  This project is licensed under the **Apache License 2.0** - see the [LICENSE](LICENSE) file for complete details.
docs/API.md ADDED
@@ -0,0 +1,231 @@
1
+ # API Documentation
2
+
3
+ This document provides a quick reference for the RAG Chat Application REST API endpoints.
4
+
5
+ ## Base URL
6
+ ```
7
+ http://localhost:8000
8
+ ```
9
+
10
+ ## Authentication
11
+ Most endpoints require a GROQ API key to be configured:
12
+
13
+ ```bash
14
+ POST /set-api-key
15
+ Content-Type: application/json
16
+
17
+ {
18
+ "api_key": "your_groq_api_key_here"
19
+ }
20
+ ```
21
+
22
+ ## Core Endpoints
23
+
24
+ ### Document Processing
25
+
26
+ #### Upload Files
27
+ ```bash
28
+ POST /upload-files
29
+ Content-Type: multipart/form-data
30
+
31
+ # Form data with file uploads
32
+ files: [file1.pdf, file2.txt, ...]
33
+ ```
34
+
35
+ **Response:**
36
+ ```json
37
+ {
38
+ "total_files": 5,
39
+ "total_documents": 12,
40
+ "total_chunks": 87,
41
+ "file_types": ["pdf", "txt", "py"],
42
+ "type_counts": {"pdf": 3, "txt": 1, "py": 1}
43
+ }
44
+ ```
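As a rough illustration of how a client might sanity-check this payload, the sketch below parses the sample response and verifies that the per-type counts add up to `total_files`. The helper name `summarize_upload` is hypothetical and not part of the API, and whether the server guarantees this invariant is an assumption drawn only from the sample above.

```python
import json

# Sample payload copied from the response shown above.
SAMPLE = """
{
  "total_files": 5,
  "total_documents": 12,
  "total_chunks": 87,
  "file_types": ["pdf", "txt", "py"],
  "type_counts": {"pdf": 3, "txt": 1, "py": 1}
}
"""


def summarize_upload(payload: dict) -> str:
    """Return a one-line summary; raise ValueError if the counts disagree."""
    counted = sum(payload["type_counts"].values())
    if counted != payload["total_files"]:
        raise ValueError(
            f"type_counts sum to {counted}, expected {payload['total_files']}"
        )
    types = ", ".join(sorted(payload["file_types"]))
    return (
        f"{payload['total_files']} files ({types}) -> "
        f"{payload['total_chunks']} chunks"
    )


summary = summarize_upload(json.loads(SAMPLE))
print(summary)
```

A check like this is cheap to run after every upload and flags a partially processed batch before any chat request is sent.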
45
+
46
+ #### Process Directory
47
+ ```bash
48
+ POST /process-directory
49
+ Content-Type: application/x-www-form-urlencoded
50
+
51
+ directory_path=/path/to/documents
52
+ ```
53
+
54
+ ### Chat Interface
55
+
56
+ #### Send Chat Message
57
+ ```bash
58
+ POST /chat
59
+ Content-Type: application/json
60
+
61
+ {
62
+ "message": "What is the main topic of the documents?"
63
+ }
64
+ ```
65
+
66
+ **Response:**
67
+ ```json
68
+ {
69
+ "response": "Based on the documents, the main topics include...",
70
+ "citations": [
71
+ {
72
+ "content": "relevant excerpt from document",
73
+ "citation": "/path/to/source/file.pdf",
74
+ "type": "pdf",
75
+ "score": 0.85
76
+ }
77
+ ],
78
+ "themes": {
79
+ "key_themes": ["AI", "Machine Learning", "RAG"],
80
+ "analysis": "The documents focus on AI and ML concepts..."
81
+ },
82
+ "timestamp": "2025-06-11T10:30:00.123456"
83
+ }
84
+ ```
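To make the citation fields concrete, here is a minimal sketch that turns the `citations` array into numbered reference lines. It assumes a higher `score` means a closer match, which this reference does not state. `format_citations`, `min_score`, and the second sample citation are illustrative and not part of the API.

```python
from typing import Dict, List

# Trimmed version of the /chat response shown above, plus one made-up
# citation so that the ordering is visible.
chat_response = {
    "response": "Based on the documents, the main topics include...",
    "citations": [
        {"content": "relevant excerpt from document",
         "citation": "/path/to/source/file.pdf", "type": "pdf", "score": 0.85},
        {"content": "another excerpt",
         "citation": "/path/to/notes.txt", "type": "txt", "score": 0.91},
    ],
}


def format_citations(citations: List[Dict], min_score: float = 0.0) -> List[str]:
    """Return numbered citation lines, highest score first."""
    kept = [c for c in citations if c["score"] >= min_score]
    kept.sort(key=lambda c: c["score"], reverse=True)
    return [
        f"[{i}] {c['citation']} ({c['type']}, score {c['score']:.2f})"
        for i, c in enumerate(kept, start=1)
    ]


lines = format_citations(chat_response["citations"])
print("\n".join(lines))
```

If the scores turn out to be distances rather than similarities, flip `reverse=True` and the `min_score` comparison accordingly.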
85
+
86
+ ### Data Management
87
+
88
+ #### Get Statistics
89
+ ```bash
90
+ GET /stats
91
+ ```
92
+
93
+ **Response:**
94
+ ```json
95
+ {
96
+ "total_files": 10,
97
+ "total_documents": 25,
98
+ "total_chunks": 150,
99
+ "file_types": ["pdf", "txt", "py", "md"],
100
+ "type_counts": {"pdf": 5, "txt": 3, "py": 1, "md": 1},
101
+ "processed_at": "2025-06-11 10:30:00"
102
+ }
103
+ ```
104
+
105
+ #### Get Chat History
106
+ ```bash
107
+ GET /chat-history
108
+ ```
109
+
110
+ **Response:**
111
+ ```json
112
+ [
113
+ {
114
+ "user_message": "What is RAG?",
115
+ "assistant_response": "RAG stands for Retrieval-Augmented Generation...",
116
+ "timestamp": "2025-06-11T10:30:00.123456",
117
+ "citations": [...]
118
+ }
119
+ ]
120
+ ```
121
+
122
+ #### Clear Chat History
123
+ ```bash
124
+ DELETE /clear-chat
125
+ ```
126
+
127
+ ### Vector Store Management
128
+
129
+ #### Save Vector Store
130
+ ```bash
131
+ POST /save-vector-store
132
+ ```
133
+
134
+ **Response:**
135
+ ```json
136
+ {
137
+ "message": "Vector store saved successfully"
138
+ }
139
+ ```
140
+
141
+ #### Load Vector Store
142
+ ```bash
143
+ POST /load-vector-store
144
+ ```
145
+
146
+ **Response:**
147
+ ```json
148
+ {
149
+ "message": "Vector store loaded successfully",
150
+ "stats": {
151
+ "total_files": 10,
152
+ "total_documents": 25,
153
+ "total_chunks": 150
154
+ }
155
+ }
156
+ ```
157
+
158
+ ## Frontend Serving
159
+
160
+ #### Main Application
161
+ ```bash
162
+ GET /
163
+ ```
164
+ Returns the HTML frontend application.
165
+
166
+ ## Error Responses
167
+
168
+ All endpoints return errors in this format:
169
+ ```json
170
+ {
171
+ "detail": "Error description message"
172
+ }
173
+ ```
174
+
175
+ Common HTTP status codes:
176
+ - `200` - Success
177
+ - `400` - Bad Request (invalid input)
178
+ - `422` - Validation Error
179
+ - `500` - Internal Server Error
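One way a client could act on these codes is sketched below: it maps a status code and decoded JSON body to an `(ok, message)` pair. The function name `describe_result` is hypothetical. Note that FastAPI's default `422` body carries `detail` as a list of error objects rather than a plain string, so the sketch formats whatever it finds instead of assuming a string.

```python
from typing import Optional, Tuple

# Status codes listed above, with the labels used in this document.
KNOWN_STATUSES = {
    200: "Success",
    400: "Bad Request",
    422: "Validation Error",
    500: "Internal Server Error",
}


def describe_result(status_code: int, body: dict) -> Tuple[bool, Optional[str]]:
    """Return (ok, error_message) for a decoded JSON response body."""
    if status_code == 200:
        return True, None
    label = KNOWN_STATUSES.get(status_code, "Unexpected status")
    detail = body.get("detail", "no detail provided")
    return False, f"{status_code} {label}: {detail}"


ok, message = describe_result(400, {"detail": "Error description message"})
print(ok, message)
```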
180
+
181
+ ## Interactive Documentation
182
+
183
+ When the server is running, visit:
184
+ - **Swagger UI**: http://localhost:8000/docs
185
+ - **ReDoc**: http://localhost:8000/redoc
186
+
187
+ ## Examples
188
+
189
+ ### Complete Workflow
190
+ ```bash
191
+ # 1. Set API key
192
+ curl -X POST "http://localhost:8000/set-api-key" \
193
+ -H "Content-Type: application/json" \
194
+ -d '{"api_key": "your_groq_key"}'
195
+
196
+ # 2. Upload files
197
+ curl -X POST "http://localhost:8000/upload-files" \
198
+ -F "files=@document1.pdf" \
199
+ -F "files=@document2.txt"
200
+
201
+ # 3. Chat with documents
202
+ curl -X POST "http://localhost:8000/chat" \
203
+ -H "Content-Type: application/json" \
204
+ -d '{"message": "Summarize the key points"}'
205
+
206
+ # 4. Get statistics
207
+ curl -X GET "http://localhost:8000/stats"
208
+
209
+ # 5. Save vector store
210
+ curl -X POST "http://localhost:8000/save-vector-store"
211
+ ```
212
+
213
+ ### Python Client Example
214
+ ```python
215
+ import requests
216
+
217
+ base_url = "http://localhost:8000"
218
+
219
+ # Set API key
220
+ response = requests.post(f"{base_url}/set-api-key",
221
+ json={"api_key": "your_groq_key"})
222
+
223
+ # Upload files (the context manager closes the file handle afterwards)
+ with open('document.pdf', 'rb') as f:
+     response = requests.post(f"{base_url}/upload-files", files={'files': f})
226
+
227
+ # Chat
228
+ response = requests.post(f"{base_url}/chat",
229
+ json={"message": "What is this document about?"})
230
+ print(response.json())
231
+ ```
docs/DEVELOPMENT.md ADDED
@@ -0,0 +1,374 @@
1
+ # Development Guide
2
+
3
+ This guide helps developers understand the codebase and contribute to the RAG Chat Application.
4
+
5
+ ## 🏗️ Project Structure
6
+
7
+ ```
8
+ wasserstoff-AiInternTask/
+ ├── rag_elements/              # 🧠 Core RAG Engine
+ │   ├── enhanced_vectordb.py   # Main RAG implementation
+ │   └── config.py              # Configuration management
+ ├── backend/                   # 🚀 FastAPI Production Server
+ │   ├── main.py                # App entry point
+ │   ├── models.py              # Pydantic schemas
+ │   ├── utils.py               # Utilities and state
+ │   └── routes/                # API endpoints
+ ├── frontend/                  # 🎨 Web Interface
+ │   ├── index.html             # Main UI
+ │   ├── style.css              # Styling
+ │   └── script.js              # Frontend logic
+ ├── tests/                     # 🧪 Test Suite
+ └── docs/                      # 📚 Documentation
23
+ ```
24
+
25
+ ## 🔧 Development Setup
26
+
27
+ ### Prerequisites
28
+ - Python 3.8+
29
+ - Git
30
+ - Text editor/IDE (VS Code recommended)
31
+
32
+ ### Environment Setup
33
+ ```bash
34
+ # Clone repository
35
+ git clone https://github.com/Jatin-Mehra119/wasserstoff-AiInternTask.git
36
+ cd wasserstoff-AiInternTask
37
+
38
+ # Create virtual environment (recommended)
39
+ python -m venv venv
40
+ source venv/bin/activate # Linux/macOS
41
+ # or venv\Scripts\activate # Windows
42
+
43
+ # Install dependencies
44
+ pip install -r requirements.txt
45
+
46
+ # Install development dependencies
47
+ pip install -r tests/requirements-test.txt
48
+
49
+ # Set up environment variables
50
+ cp .env.example .env  # If .env.example exists; otherwise create .env manually
51
+ # Add your GROQ_API_KEY to .env
52
+ ```
53
+
54
+ ### Running in Development Mode
55
+ ```bash
56
+ # Start FastAPI with hot reload
57
+ cd backend
58
+ python -m uvicorn main:app --reload --host 0.0.0.0 --port 8000
59
+
60
+ # Or run the Streamlit version (from the repository root)
61
+ streamlit run streamlit_rag_app.py
62
+ ```
63
+
64
+ ## 🧱 Core Components
65
+
66
+ ### 1. RAG Engine (`rag_elements/enhanced_vectordb.py`)
67
+
68
+ The heart of the application. Key classes and methods:
69
+
70
+ ```python
71
+ class EnhancedDocumentProcessor:
+     def process_files(self, file_paths): ...  # Multi-format processing
+     def create_enhanced_vector_store(self, documents): ...  # FAISS index creation
+     def search_with_citations(self, query, k=5): ...  # Semantic search
+     def get_chat_response(self, query): ...  # End-to-end chat
+     def save_vector_store(self, path): ...  # Persistence
+     def load_vector_store(self, path): ...  # Restore data
78
+ ```
79
+
80
+ ### 2. FastAPI Backend (`backend/`)
81
+
82
+ **Entry Point (`main.py`)**:
83
+ - FastAPI app initialization
84
+ - CORS configuration
85
+ - Route registration
86
+
87
+ **Data Models (`models.py`)**:
88
+ - Pydantic schemas for API requests/responses
89
+ - Type validation and serialization
90
+
91
+ **Routes (`routes/`)**:
92
+ - `main_routes.py` - Frontend serving, health checks
93
+ - `upload_routes.py` - File upload and processing
94
+ - `chat_routes.py` - Chat interface and AI responses
95
+ - `store_routes.py` - Vector store management
96
+
97
+ **Utilities (`utils.py`)**:
98
+ - Global state management
99
+ - Helper functions
100
+ - Error handling utilities
101
+
102
+ ### 3. Frontend (`frontend/`)
103
+
104
+ Modern web interface with:
105
+ - **HTML**: Semantic structure with responsive layout
106
+ - **CSS**: Modern styling with CSS Grid/Flexbox
107
+ - **JavaScript**: Async API calls, real-time updates, file handling
108
+
109
+ ## 🔄 Data Flow
110
+
111
+ ### Document Processing Pipeline
112
+ 1. **File Upload** → `upload_routes.py`
+ 2. **Text Extraction** → `enhanced_vectordb.py`
+ 3. **Chunking** → LangChain text splitters
+ 4. **Embeddings** → Sentence Transformers
+ 5. **Indexing** → FAISS vector store
+ 6. **Metadata Storage** → JSON persistence
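The chunk, embed, and index steps above can be sketched with the standard library alone. This is a toy stand-in, not the project's implementation: word-window chunking replaces the LangChain splitters, a bag-of-words `Counter` replaces Sentence Transformers embeddings, and a linear cosine scan replaces the FAISS index. All function names, parameters, and sample documents here are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def chunk_text(text: str, size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs: Dict[str, str]) -> List[Tuple[str, str, Counter]]:
    """Return (source, chunk, vector) triples; the list plays the index's role."""
    return [
        (src, chunk, embed(chunk))
        for src, text in docs.items()
        for chunk in chunk_text(text)
    ]


def search(index, query: str, k: int = 2):
    """Return the k best (score, source, chunk) triples, best first."""
    q = embed(query)
    scored = [(cosine(q, vec), src, chunk) for src, chunk, vec in index]
    return sorted(scored, key=lambda item: item[0], reverse=True)[:k]


docs = {
    "a.txt": "FAISS stores dense vectors and answers nearest neighbour queries quickly",
    "b.txt": "FastAPI serves the chat endpoint and the upload endpoint",
}
index = build_index(docs)
hits = search(index, "nearest neighbour vectors", k=1)
print(hits[0][1], round(hits[0][0], 3))
```

Keeping the source path next to every chunk is what later makes citation generation possible, since each search hit already knows which file it came from.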
118
+
119
+ ### Chat Pipeline
120
+ 1. **User Query** → `chat_routes.py`
+ 2. **Semantic Search** → FAISS similarity search
+ 3. **Context Retrieval** → Top-K document chunks
+ 4. **AI Response** → GROQ API integration
+ 5. **Citation Generation** → Source attribution
+ 6. **Response Formatting** → Markdown output
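The hand-off between context retrieval and the LLM call might look roughly like the sketch below, which numbers the retrieved chunks so the model can cite them. The prompt wording, the `build_prompt` name, and the `source`/`text` keys are assumptions made for illustration; the actual prompt used by `enhanced_vectordb.py` is not shown in this guide.

```python
from typing import Dict, List


def build_prompt(query: str, chunks: List[Dict[str, str]], max_chunks: int = 3) -> str:
    """Assemble an LLM prompt whose context entries are numbered for citation."""
    selected = chunks[:max_chunks]  # chunks are assumed to arrive best-first
    context = "\n".join(
        f"[{i}] ({c['source']}) {c['text']}"
        for i, c in enumerate(selected, start=1)
    )
    return (
        "Answer the question using only the numbered context below. "
        "Cite sources as [n].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


chunks = [
    {"source": "a.pdf", "text": "RAG combines retrieval with generation."},
    {"source": "b.txt", "text": "FAISS performs similarity search."},
]
prompt = build_prompt("What is RAG?", chunks)
print(prompt)
```

Numbering the context entries gives the citation step something stable to map back to: a `[2]` in the model's answer corresponds to the second retrieved chunk and its source path.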
126
+
127
+ ## 🧪 Testing
128
+
129
+ ### Running Tests
130
+ ```bash
131
+ cd tests
132
+
133
+ # Run all tests
134
+ bash run_tests.sh
135
+
136
+ # Run specific test files
137
+ python -m pytest test_endpoints_pytest.py -v
138
+ python test_api_endpoints.py
139
+ ```
140
+
141
+ ### Test Structure
142
+ - `test_api_endpoints.py` - Basic API endpoint testing
143
+ - `test_endpoints_pytest.py` - Comprehensive pytest suite
144
+ - `run_tests.sh` - Test runner script
145
+
146
+ ### Writing Tests
147
+ Follow these patterns:
148
+
149
+ ```python
150
+ import httpx
+ import pytest
+ import requests
+
+ BASE_URL = "http://localhost:8000"  # assumes a locally running server
+
+ # API endpoint test
+ def test_upload_endpoint():
+     # Assumes document.pdf exists in the working directory
+     with open("document.pdf", "rb") as f:
+         response = requests.post(f"{BASE_URL}/upload-files", files={"files": f})
+     assert response.status_code == 200
+     assert "total_files" in response.json()
+
+ # Pytest test (needs the pytest-asyncio plugin for @pytest.mark.asyncio)
+ @pytest.mark.asyncio
+ async def test_chat_endpoint():
+     async with httpx.AsyncClient() as client:
+         response = await client.post(f"{BASE_URL}/chat",
+                                      json={"message": "test"})
+     assert response.status_code == 200
163
+ ```
164
+
165
+ ## 🔌 Adding New Features
166
+
167
+ ### Adding a New API Endpoint
168
+
169
+ 1. **Define Pydantic Model** (`models.py`):
170
+ ```python
171
+ from typing import Optional
+
+ from pydantic import BaseModel
+
+ class NewFeatureRequest(BaseModel):
+     parameter: str
+     optional_param: Optional[int] = None
+
+ class NewFeatureResponse(BaseModel):
+     result: str
+     success: bool
178
+ ```
179
+
180
+ 2. **Create Route Handler** (`routes/new_routes.py`):
181
+ ```python
182
+ from fastapi import APIRouter, HTTPException
+ from ..models import NewFeatureRequest, NewFeatureResponse
+
+ router = APIRouter()
+
+ @router.post("/new-feature", response_model=NewFeatureResponse)
+ async def new_feature_endpoint(request: NewFeatureRequest):
+     try:
+         # Implementation here
+         return NewFeatureResponse(result="success", success=True)
+     except Exception as e:
+         # Surface unexpected failures as a 500 with the error text in `detail`
+         raise HTTPException(status_code=500, detail=str(e)) from e
194
+ ```
195
+
196
+ 3. **Register Router** (`main.py`):
197
+ ```python
198
+ from .routes.new_routes import router as new_router
199
+ app.include_router(new_router)
200
+ ```
201
+
202
+ 4. **Add Frontend Integration** (`frontend/script.js`):
203
+ ```javascript
204
+ async function callNewFeature(data) {
+   const response = await fetch('/new-feature', {
+     method: 'POST',
+     headers: {'Content-Type': 'application/json'},
+     body: JSON.stringify(data)
+   });
+   return response.json();
+ }
212
+ ```
213
+
214
+ ### Extending the RAG Engine
215
+
216
+ To add new document types or processing capabilities:
217
+
218
+ 1. **Add File Type Support** (`enhanced_vectordb.py`):
219
+ ```python
220
+ def extract_text_from_new_format(self, file_path):
+     extracted_text = ""  # Placeholder: implement extraction logic here
+     return extracted_text
+
+ def process_files(self, file_paths):
+     for file_path in file_paths:
+         if file_path.endswith('.new_format'):
+             text = self.extract_text_from_new_format(file_path)
+             # Process text...
229
+ ```
230
+
231
+ 2. **Update Frontend File Acceptance** (`index.html`):
232
+ ```html
233
+ <input type="file" accept=".pdf,.txt,.new_format" multiple>
234
+ ```
235
+
236
+ ## 🎨 Frontend Development
237
+
238
+ ### Key JavaScript Functions
239
+ - `uploadFiles()` - Handle file uploads with progress
240
+ - `sendMessage()` - Send chat messages and display responses
241
+ - `updateStats()` - Refresh processing statistics
242
+ - `displayCitations()` - Show document sources
243
+
244
+ ### CSS Architecture
245
+ - Mobile-first responsive design
246
+ - CSS custom properties for theming
247
+ - Flexbox/Grid layouts
248
+ - Component-based styling
249
+
250
+ ### Adding UI Components
251
+ 1. Add HTML structure
252
+ 2. Style with CSS classes
253
+ 3. Add JavaScript event handlers
254
+ 4. Connect to backend APIs
255
+
256
+ ## 🐛 Debugging
257
+
258
+ ### Common Issues
259
+
260
+ **CORS Errors**:
261
+ - Check `main.py` CORS configuration
262
+ - Ensure frontend runs on allowed origins
263
+
264
+ **Import Errors**:
265
+ - Verify Python path and virtual environment
266
+ - Check `requirements.txt` dependencies
267
+
268
+ **API Key Issues**:
269
+ - Confirm GROQ API key is set
270
+ - Check environment variable loading
271
+
272
+ ### Logging
273
+
274
+ Add logging to your code:
275
+ ```python
276
+ import logging
+
+ from fastapi import APIRouter
+
+ logger = logging.getLogger(__name__)
+ router = APIRouter()
+
+ @router.post("/endpoint")
+ async def endpoint():
+     logger.info("Processing request")
+     try:
+         # Logic here
+         logger.debug("Success")
+     except Exception:
+         logger.exception("Error while processing request")  # logs the traceback too
+         raise
289
+ ```
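Python's default effective log level is `WARNING`, so the `info` and `debug` calls above produce no output until a level and a handler are configured. The sketch below shows one minimal way to do that. The logger name `rag_app.example` and the format string are hypothetical, and the in-memory stream is used only so the example can check its own output.

```python
import io
import logging

# Send records to an in-memory stream so the example is self-checking;
# a real server would more likely log to stderr or a file.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

logger = logging.getLogger("rag_app.example")  # hypothetical logger name
logger.setLevel(logging.DEBUG)                 # let DEBUG records through
logger.addHandler(handler)
logger.propagate = False                       # avoid duplicates via the root logger

logger.info("Processing request")
logger.debug("Success")

output = stream.getvalue()
print(output, end="")
```

During development, `logging.basicConfig(level=logging.DEBUG)` at application start-up is a shorter alternative when per-logger control is not needed.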
290
+
291
+ ## 📝 Code Style Guidelines
292
+
293
+ ### Python
294
+ - Follow PEP 8
295
+ - Use type hints
296
+ - Add docstrings
297
+ - Maximum line length: 88 characters
298
+
299
+ ```python
300
+ def process_document(file_path: str, options: Dict[str, Any]) -> ProcessResult:
+     """
+     Process a document and extract text content.
+
+     Args:
+         file_path: Path to the document file
+         options: Processing configuration options
+
+     Returns:
+         ProcessResult containing extracted text and metadata
+
+     Raises:
+         ProcessingError: If document cannot be processed
+     """
+     # Implementation...
315
+ ```
316
+
317
+ ### JavaScript
318
+ - Use modern ES6+ syntax
319
+ - Prefer `const`/`let` over `var`
320
+ - Use async/await for promises
321
+ - Add JSDoc comments
322
+
323
+ ```javascript
324
+ /**
+  * Upload files to the server
+  * @param {FileList} files - Files to upload
+  * @returns {Promise<Object>} Upload result
+  */
+ async function uploadFiles(files) {
+   // Implementation...
+ }
332
+ ```
333
+
334
+ ## 🚀 Deployment
335
+
336
+ ### Development
337
+ ```bash
338
+ python -m uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
339
+ ```
340
+
341
+ ### Production
342
+ ```bash
343
+ python -m uvicorn backend.main:app --host 0.0.0.0 --port 8000 --workers 4
344
+ ```
345
+
346
+ ### Docker (if configured)
347
+ ```bash
348
+ docker build -t rag-chat-app .
349
+ docker run -p 8000:8000 -e GROQ_API_KEY=your_key rag-chat-app
350
+ ```
351
+
352
+ ## 🀝 Contributing
353
+
354
+ 1. Fork the repository
355
+ 2. Create feature branch: `git checkout -b feature/amazing-feature`
356
+ 3. Make changes and add tests
357
+ 4. Ensure tests pass: `bash tests/run_tests.sh`
358
+ 5. Commit: `git commit -m 'Add amazing feature'`
359
+ 6. Push: `git push origin feature/amazing-feature`
360
+ 7. Open Pull Request
361
+
362
+ ### Pull Request Checklist
363
+ - [ ] Code follows style guidelines
364
+ - [ ] Tests added for new functionality
365
+ - [ ] All tests pass
366
+ - [ ] Documentation updated
367
+ - [ ] No breaking changes (or clearly documented)
368
+
369
+ ## 📚 Additional Resources
370
+
371
+ - [FastAPI Documentation](https://fastapi.tiangolo.com/)
372
+ - [FAISS Documentation](https://faiss.ai/)
373
+ - [LangChain Documentation](https://python.langchain.com/)
374
+ - [GROQ API Documentation](https://console.groq.com/docs)
docs/README.md ADDED
@@ -0,0 +1,169 @@
1
+ # RAG Chat Application - Documentation
2
+
3
+ A sophisticated Retrieval-Augmented Generation (RAG) chat application that enables intelligent conversations with your documents.
4
+
5
+ ## 🏗️ Architecture Overview
6
+
7
+ ```mermaid
8
+ flowchart TD
9
+ %% Client Layer
10
+ subgraph "Client Layer"
11
+ direction TB
12
+ WebClient["Web Client (HTML/JS/CSS)"]:::ui
13
+ StreamlitUI["MVP Streamlit UI"]:::ui
14
+ end
15
+
16
+ %% Backend Layer
17
+ subgraph "Backend Layer"
18
+ direction TB
19
+ FastAPI["FastAPI Backend"]:::api
20
+ subgraph routes["Routes"]
21
+ direction TB
22
+ MainRoutes["main_routes.py"]:::api
23
+ UploadRoutes["upload_routes.py"]:::api
24
+ ChatRoutes["chat_routes.py"]:::api
25
+ StoreRoutes["store_routes.py"]:::api
26
+ end
27
+ Models["models.py"]:::api
28
+ Utils["utils.py"]:::api
29
+ end
30
+
31
+ %% RAG Engine Layer
32
+ subgraph "RAG Engine Layer"
33
+ direction TB
34
+ Config["config.py"]:::core
35
+ CoreEngine["enhanced_vectordb.py"]:::core
36
+ end
37
+
38
+ %% Persistence Layer
39
+ VectorStore[(Vector Store<br/>FAISS Index + Metadata)]:::store
40
+
41
+ %% External Services
42
+ subgraph "External Services"
43
+ direction TB
44
+ GROQ["GROQ Vision API"]:::external
45
+ SentenceModel["SentenceTransformer Model"]:::external
46
+ end
47
+
48
+ %% Tests
49
+ subgraph "Automated Tests"
50
+ direction TB
51
+ Tests1["test_api_endpoints.py"]:::tests
52
+ Tests2["test_endpoints_pytest.py"]:::tests
53
+ end
54
+
55
+ %% Connections
56
+ WebClient -->|"/api/*" fetch| FastAPI
57
+ MainRoutes -->|serve static| WebClient
58
+ StreamlitUI -->|in-process calls| CoreEngine
59
+ FastAPI -->|calls RAG Engine| CoreEngine
60
+ CoreEngine -->|read/write| VectorStore
61
+ CoreEngine -->|OCR & LLM requests| GROQ
62
+ CoreEngine -->|embedding requests| SentenceModel
63
+ StoreRoutes -->|disk read/write| VectorStore
64
+
65
+ %% Click Events
66
+ click WebClient "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/frontend/index.html"
67
+ click WebClient "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/frontend/script.js"
68
+ click WebClient "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/frontend/style.css"
69
+ click StreamlitUI "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/streamlit_rag_app.py"
70
+ click FastAPI "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/backend/main.py"
71
+ click Models "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/backend/models.py"
72
+ click Utils "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/backend/utils.py"
73
+ click MainRoutes "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/backend/routes/main_routes.py"
74
+ click UploadRoutes "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/backend/routes/upload_routes.py"
75
+ click ChatRoutes "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/backend/routes/chat_routes.py"
76
+ click StoreRoutes "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/backend/routes/store_routes.py"
77
+ click Config "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/rag_elements/config.py"
78
+ click CoreEngine "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/rag_elements/enhanced_vectordb.py"
79
+ click Tests1 "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/tests/test_api_endpoints.py"
+ click Tests2 "https://github.com/jatin-mehra119/wasserstoff-aiinterntask/blob/main/tests/test_endpoints_pytest.py"
81
+
82
+ %% Styles
83
+ classDef ui fill:#E3F2FD,stroke:#1976D2,color:#0D47A1;
84
+ classDef api fill:#E8F5E9,stroke:#388E3C,color:#1B5E20;
85
+ classDef core fill:#FFF3E0,stroke:#FB8C00,color:#E65100;
86
+ classDef store fill:#FFF9C4,stroke:#FBC02D,color:#F57F17;
87
+ classDef external fill:#ECEFF1,stroke:#607D8B,color:#37474F;
88
+ classDef tests fill:#F3E5F5,stroke:#8E24AA,color:#4A148C;
89
+ ```
90
+
91
+ ## 📋 Quick Start
92
+
93
+ ### Prerequisites
94
+ - Python 3.8+
95
+ - GROQ API key (for OCR and chat)
96
+
97
+ ### Installation & Running
98
+ ```bash
99
+ # Clone repository
100
+ git clone https://github.com/Jatin-Mehra119/wasserstoff-AiInternTask.git
101
+ cd wasserstoff-AiInternTask
102
+
103
+ # Install dependencies
104
+ pip install -r requirements.txt
105
+
106
+ # Run the FastAPI backend (--reload enables auto-reload for development; omit it in production)
107
+ cd backend
108
+ python -m uvicorn main:app --host 0.0.0.0 --port 8000 --reload
109
+
110
+ # Open http://localhost:8000 in browser
111
+
112
+ # Alternative: run the Streamlit MVP (from the repository root)
113
+ streamlit run streamlit_rag_app.py
114
+ ```
115
+
116
+ ## 🔧 Architecture Components
117
+
118
+ ### Core RAG Engine (`rag_elements/`)
119
+ - **`enhanced_vectordb.py`** - Main RAG implementation with document processing, vector search, and AI integration
120
+ - **`config.py`** - Configuration management and settings
121
+
122
+ ### FastAPI Backend (`backend/`)
123
+ - **`main.py`** - Application entry point and server configuration
124
+ - **`models.py`** - Pydantic data models and API schemas
125
+ - **`utils.py`** - Utilities, state management, and helpers
126
+ - **`routes/`** - Modular API endpoints:
127
+   - `main_routes.py` - Frontend serving and health
+   - `upload_routes.py` - Document upload and processing
+   - `chat_routes.py` - Chat interface and AI responses
+   - `store_routes.py` - Vector store persistence
131
+
132
+ ### Frontend (`frontend/`)
133
+ - **`index.html`** - Main application UI
134
+ - **`style.css`** - Responsive design and styling
135
+ - **`script.js`** - Frontend logic and API integration
136
+
137
+ ### Legacy MVP
138
+ - **`streamlit_rag_app.py`** - Original Streamlit implementation
139
+
140
+ ## 📊 Data Flow
141
+
142
+ 1. **Document Upload** → Text extraction → Chunking → Vector embeddings → FAISS index
+ 2. **Chat Query** → Semantic search → Context retrieval → AI response generation → Citations
+ 3. **Persistence** → Save/load vector stores with metadata
145
+
146
+ ## 🔌 Key APIs
147
+
148
+ - `POST /upload-files` - Process documents
149
+ - `POST /chat` - Chat with documents
150
+ - `GET /stats` - Processing statistics
151
+ - `POST /save-vector-store` - Persist data
152
+ - `POST /load-vector-store` - Restore data
153
+
154
+ ## 🧪 Testing
155
+
156
+ ```bash
157
+ cd tests
158
+ bash run_tests.sh
159
+ ```
160
+
161
+ ## 📚 External Dependencies
162
+
163
+ - **FAISS** - Vector similarity search
164
+ - **GROQ** - Vision OCR and conversational AI
165
+ - **LangChain** - Document processing
166
+ - **FastAPI** - Web framework
167
+ - **Sentence Transformers** - Text embeddings
168
+
169
+ For detailed information, see the main [README.md](../README.md).
docs/index.md ADDED
@@ -0,0 +1,86 @@
1
+ # Documentation Index
2
+
3
+ Welcome to the RAG Chat Application documentation! This directory contains comprehensive guides to help you understand, use, and contribute to the project.
4
+
5
+ ## 📚 Documentation Structure
6
+
7
+ ### Quick Start & Overview
8
+ - **[README.md](README.md)** - Project overview, architecture diagram, and quick start guide
9
+ - **[Main README](../README.md)** - Comprehensive project documentation with detailed features and usage
10
+
11
+ ### API Reference
12
+ - **[API.md](API.md)** - Complete REST API documentation with examples and curl commands
13
+
14
+ ### Development
15
+ - **[DEVELOPMENT.md](DEVELOPMENT.md)** - Developer guide for contributing to the project
16
+
17
+ ## 🎯 Getting Started
18
+
19
+ ### For Users
20
+ 1. Read the [Quick Start](README.md#-quick-start) section
21
+ 2. Follow the [Installation & Running](README.md#installation--running) instructions
22
+ 3. Review the [API Reference](API.md) for integration details
23
+
24
+ ### For Developers
25
+ 1. Start with the [Development Setup](DEVELOPMENT.md#-development-setup)
26
+ 2. Understand the [Project Structure](DEVELOPMENT.md#-project-structure)
27
+ 3. Review [Core Components](DEVELOPMENT.md#-core-components)
28
+ 4. Check the [Contributing Guidelines](DEVELOPMENT.md#-contributing)
29
+
30
+ ## 🏗️ Architecture Quick Reference
31
+
32
+ The application follows a layered architecture:
33
+
34
+ - **Client Layer**: Web frontend + Streamlit MVP
35
+ - **Backend Layer**: FastAPI with modular routes
36
+ - **RAG Engine Layer**: Core document processing and vector search
37
+ - **Persistence Layer**: FAISS vector store with metadata
38
+ - **External Services**: GROQ API and Sentence Transformers
39
+
40
+ See the [architecture diagram](README.md#️-architecture-overview) for a visual representation.
41
+
42
+ ## 🔗 Quick Links
43
+
44
+ | Topic | Document | Description |
45
+ |-------|----------|-------------|
46
+ | **Overview** | [README.md](README.md) | Architecture and quick start |
47
+ | **API Endpoints** | [API.md](API.md) | REST API reference |
48
+ | **Development** | [DEVELOPMENT.md](DEVELOPMENT.md) | Contributing guidelines |
49
+ | **Main README** | [../README.md](../README.md) | Detailed project documentation |
50
+ | **Tests** | [../tests/README.md](../tests/README.md) | Testing documentation |
51
+
52
+ ## 🚀 Core Features
53
+
54
+ - **Multi-format Document Processing**: PDF, text, images, code files
55
+ - **Intelligent Chat Interface**: AI-powered responses with citations
56
+ - **Vector Search**: FAISS-powered semantic similarity search
57
+ - **Persistence**: Save and load processed document collections
58
+ - **Modern Web UI**: Responsive design with real-time updates
59
+ - **Comprehensive API**: RESTful endpoints with interactive documentation
60
+
61
+ ## 🛠️ Tech Stack
62
+
63
+ - **Backend**: FastAPI, Python 3.8+
64
+ - **Frontend**: HTML5, CSS3, JavaScript (ES6+)
65
+ - **AI/ML**: GROQ API, Sentence Transformers, LangChain
66
+ - **Search**: FAISS vector similarity search library
67
+ - **Testing**: pytest, requests
68
+
69
+ ## 📞 Support
70
+
71
+ - **Issues**: [GitHub Issues](https://github.com/Jatin-Mehra119/wasserstoff-AiInternTask/issues)
72
+ - **Discussions**: [GitHub Discussions](https://github.com/Jatin-Mehra119/wasserstoff-AiInternTask/discussions)
73
+ - **API Docs**: http://localhost:8000/docs (when server is running)
74
+
75
+ ## 📝 Contributing
76
+
77
+ We welcome contributions! Please read the [Development Guide](DEVELOPMENT.md#-contributing) for guidelines on:
78
+
79
+ - Code style and standards
80
+ - Testing requirements
81
+ - Pull request process
82
+ - Adding new features
83
+
84
+ ---
85
+
86
+ *This documentation is maintained alongside the codebase. For the most up-to-date information, always refer to the latest version in the repository.*