Sohan Kshirsagar commited on
Commit
9fabeb7
·
1 Parent(s): 4f0dfc7

Backend Documentation Addition

Browse files
multi_llm_chatbot_backend/README.md ADDED
@@ -0,0 +1,178 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Multi-LLM Chatbot Backend
2
+
3
+ A modular, extensible FastAPI backend for building an AI-powered research advisor chatbot that supports:
4
+ - Multiple AI personas with configurable tone and behavior
5
+ - Dynamic switching between Gemini (cloud) and Ollama (local) LLMs
6
+ - Chat session persistence and context memory
7
+ - Document upload, chunking, and retrieval using RAG
8
+ - Rich export features (PDF, DOCX, TXT)
9
+ - User authentication and JWT-based access control
10
+
11
+ ---
12
+
13
+ ## Backend Architecture
14
+
15
+ ```text
16
+ User Input
17
+
18
+ /chat-sequential → Orchestrator
19
+ ↓ ↙ ↘
20
+ SessionManager ContextManager RAGManager
21
+ ↓ ↓ ↓
22
+ MongoDB Token Trimming ChromaDB
23
+ ↓ ↓ ↓
24
+ Persisted Chat & Doc Context → LLM (Gemini/Ollama)
25
+ ```
26
+
27
+ ---
28
+
29
+ ## Features
30
+
31
+ - Persona-based multi-agent conversation (`Theorist`, `Pragmatist`, etc.)
32
+ - Provider switching (Gemini ↔ Ollama)
33
+ - Context-aware response routing + top-K advisor selection
34
+ - PDF, DOCX, and TXT file upload and semantic retrieval
35
+ - Developer tools: debug personas, test RAG, export sessions
36
+ - Secure authentication and session scoping
37
+
38
+ ---
39
+
40
+ ## Setup Instructions
41
+
42
+ ### 1. Clone and Configure Environment
43
+
44
+ ```bash
45
+ git clone https://github.com/yourorg/multi-llm-chatbot-backend
46
+ cd multi-llm-chatbot-backend
47
+ cp .env.example .env # already provided
48
+ ```
49
+
50
+ ### 2. Python Environment Setup
51
+
52
+ ```bash
53
+ python -m venv venv
54
+ source venv/bin/activate # or venv\Scripts\activate on Windows
55
+
56
+ pip install -r requirements.txt
57
+ ```
58
+
59
+ ### 3. Run the Server
60
+
61
+ ```bash
62
+ uvicorn app.main:app --reload
63
+ ```
64
+
65
+ > Server will be available at: `http://localhost:8000`
66
+
67
+ ---
68
+
69
+ ## FastAPI Routing & Modules
70
+
71
+ | Folder | Description |
72
+ |--------|-------------|
73
+ | [`app/api`](./api_README.md) | REST API endpoints for chat, auth, RAG, exports |
74
+ | [`app/core`](./core_README.md) | Main orchestration, context windows, database logic |
75
+ | [`app/llm`](./llm_README.md) | Gemini + Ollama LLM wrappers |
76
+ | [`app/models`](./models_README.md) | Persona and user schemas |
77
+ | [`app/utils`](./utils_README.md) | File parsing, summaries, exports, vector helpers |
78
+
79
+ ---
80
+
81
+ ## Key Files
82
+
83
+ ### `main.py`
84
+
85
+ - Loads env vars, sets up FastAPI instance with CORS and routers
86
+ - Calls `connect_to_mongo()` on startup and `close_mongo_connection()` on shutdown
87
+ - Imports and registers all routers (`auth`, `chat_sessions`, etc.)
88
+
89
+ ### `.env` (Sample Vars)
90
+
91
+ ```ini
92
+ # MongoDB
93
+ MONGODB_CONNECTION_STRING=mongodb://localhost:27017
94
+ MONGODB_DATABASE_NAME=neon_ai_backend
95
+
96
+ # Gemini API Key and model
97
+ GEMINI_API_KEY=... # Replace with real key
98
+ GEMINI_MODEL=gemini-2.0-flash
99
+
100
+ # Default provider
101
+ DEFAULT_PROVIDER=gemini
102
+ ```
103
+
104
+ ### `requirements.txt`
105
+
106
+ Includes:
107
+ - **FastAPI**, **Uvicorn**: API framework and server
108
+ - **httpx**: Async LLM request handler
109
+ - **motor**, **pymongo**: MongoDB async access
110
+ - **chromadb**, **sentence-transformers**: Vector database + embeddings
111
+ - **PyPDF2**, **docx2txt**, **reportlab**: Document parsing and PDF generation
112
+ - **passlib**, **python-jose**: Auth and security
113
+
114
+ ---
115
+
116
+ ## Persona Design & Context Handling
117
+
118
+ - Personas defined in `app/models/default_personas.py`
119
+ - Rich system prompts, styles, and epistemologies
120
+ - Responses routed through `ImprovedChatOrchestrator`
121
+ - Context trimmed and weighted via `ContextManager`
122
+
123
+ ---
124
+
125
+ ## Switching LLM Providers
126
+
127
+ You can hot-swap models via API:
128
+
129
+ ```http
130
+ POST /switch-provider
131
+ { "provider": "gemini" } | { "provider": "ollama" }
132
+ ```
133
+
134
+ > Also supported: `/switch-model`, `/current-model`, `/current-provider`
135
+
136
+ ---
137
+
138
+ ## Document Upload + RAG
139
+
140
+ - Upload PDFs, DOCX, or TXT to sessions
141
+ - Text is extracted → chunked → embedded → stored in ChromaDB
142
+ - Queried during conversation by persona-aware `EnhancedRAGManager`
143
+
144
+ ---
145
+
146
+ ## Export Options
147
+
148
+ | Format | Export Endpoint |
149
+ |--------|------------------|
150
+ | PDF | `/export-chat?format=pdf` |
151
+ | DOCX | `/export-chat?format=docx` |
152
+ | TXT | `/export-chat?format=txt` |
153
+ | Summary | `/chat-summary?format=pdf` |
154
+
155
+ ---
156
+
157
+ ## Developer & Debug Endpoints
158
+
159
+ | Endpoint | Purpose |
160
+ |----------|---------|
161
+ | `/debug/personas` | See registered advisors and prompts |
162
+ | `/debug/ranked-personas` | View top-K advisors for context |
163
+ | `/debug/rag-status` | Run sample search to test document index |
164
+
165
+ ---
166
+
167
+ ## Status & Roadmap
168
+
169
+ - [x] Multi-LLM backend ready (Gemini + Ollama)
170
+ - [x] Document RAG + export system
171
+ - [x] Session-aware persona routing
172
+ - [x] JWT Auth + MongoDB user handling
173
+ - [ ] UI enhancements and persona memory
174
+ - [ ] Persona fine-tuning support (future)
175
+
176
+ ---
177
+
178
+ For questions, contributions, or deployment help — feel free to reach out!
multi_llm_chatbot_backend/app/api/README.md ADDED
@@ -0,0 +1,175 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # `app/api` – REST API Layer for Multi-LLM Chatbot
2
+
3
+ This module defines the complete FastAPI-based HTTP interface for all backend features, including chat, session management, RAG operations, provider switching, and document interaction.
4
+
5
+ Each file in this directory defines route groups (`APIRouter`) to modularize functionality.
6
+
7
+ ---
8
+
9
+ ## API Directory Layout
10
+
11
+ | File | Purpose |
12
+ |------|---------|
13
+ | `auth.py` | Handles user authentication (login, signup, token validation) |
14
+ | `chat.py` | Core routes for LLM-backed chat, reply-to-advisor, and multi-turn flow |
15
+ | `chat_sessions.py` | Stores user conversations and provides access to saved history |
16
+ | `debug.py` | Developer tools: debug personas, RAG tests, ranking advisor responses |
17
+ | `documents.py` | Upload, parse, index, and query documents via RAG |
18
+ | `provider.py` | Switch between Gemini and Ollama providers |
19
+ | `root.py` | Root `/` endpoint for heartbeat and versioning |
20
+ | `sessions.py` | Tracks and resets session-specific in-memory context |
21
+ | `utils.py` | Helpers used by multiple routers (e.g. session ID management) |
22
+
23
+ ---
24
+
25
+ ## `auth.py` – User Authentication API
26
+
27
+ | Endpoint | Method | Description |
28
+ |----------|--------|-------------|
29
+ | `/signup` | `POST` | Register a new user |
30
+ | `/login` | `POST` | Authenticate user and return access token |
31
+ | `/me` | `GET` | Return current logged-in user |
32
+ | `/healthcheck` | `GET` | Ping endpoint to check login status |
33
+
34
+ Uses JWT-based Bearer token auth via FastAPI dependencies.
35
+
36
+ ---
37
+
38
+ ## `chat.py` – Chat Interaction
39
+
40
+ | Endpoint | Method | Description |
41
+ |----------|--------|-------------|
42
+ | `/chat-sequential` | `POST` | Run a full advisor loop and return all persona responses |
43
+ | `/reply-to-advisor` | `POST` | Ask a question to a specific advisor/persona |
44
+
45
+ These routes handle:
46
+ - Message routing via `ImprovedChatOrchestrator`
47
+ - Persona-wise response generation
48
+ - Embedding document-aware context
49
+ - Returning consistent message structure
50
+
51
+ ---
52
+
53
+ ## `chat_sessions.py` – Persistent Storage of Conversations
54
+
55
+ | Endpoint | Method | Description |
56
+ |----------|--------|-------------|
57
+ | `/chat-sessions` | `GET` | List all saved chat sessions |
58
+ | `/chat-sessions/{id}` | `GET` | Retrieve specific chat session |
59
+ | `/chat-sessions/{id}` | `DELETE` | Soft-delete a chat session |
60
+ | `/chat-sessions/save` | `POST` | Save in-memory session to MongoDB |
61
+
62
+ Saves message history, metadata, and uploaded files.
63
+
64
+ ---
65
+
66
+ ## `debug.py` – Developer Tools
67
+
68
+ | Endpoint | Method | Description |
69
+ |----------|--------|-------------|
70
+ | `/debug/personas` | `GET` | List current personas, prompts, keywords |
71
+ | `/debug/ranked-personas` | `GET` | Return top advisors for current session |
72
+ | `/debug/rag-status` | `GET` | Run sample RAG query + return health info |
73
+
74
+ Provides insight into:
75
+ - Persona prompt preview
76
+ - RAG test queries and indexed documents
77
+ - Session size + truncation status
78
+
79
+ ---
80
+
81
+ ## `documents.py` – Document Upload and RAG
82
+
83
+ | Endpoint | Method | Description |
84
+ |----------|--------|-------------|
85
+ | `/upload-document` | `POST` | Upload and parse a document for semantic search |
86
+ | `/search-documents` | `POST` | RAG search using text query and persona context |
87
+ | `/document-stats` | `GET` | Overview of documents uploaded to session |
88
+ | `/uploaded-files` | `GET` | Return list of uploaded file names |
89
+ | `/document-insights/{filename}` | `GET` | Get detailed metadata for a document |
90
+ | `/export-chat` | `GET` | Export current or stored chat session (PDF, TXT, DOCX) |
91
+ | `/chat-summary` | `GET` | Export summary generated by LLM (multi-format) |
92
+
93
+ Supports file parsing (`PDF`, `DOCX`, `TXT`), chunking, embedding, and export.
94
+
95
+ ---
96
+
97
+ ## `provider.py` – LLM Provider Control
98
+
99
+ | Endpoint | Method | Description |
100
+ |----------|--------|-------------|
101
+ | `/current-provider` | `GET` | Return currently active provider and model |
102
+ | `/switch-provider` | `POST` | Dynamically switch between `gemini` and `ollama` |
103
+ | `/current-model` | `GET` | Get currently loaded model name |
104
+ | `/switch-model` | `POST` | Alias for switching based on model name |
105
+
106
+ Changes are propagated by:
107
+ - Creating new LLM client
108
+ - Re-registering all personas
109
+
110
+ ---
111
+
112
+ ## `sessions.py` – In-Memory Session Management
113
+
114
+ | Endpoint | Method | Description |
115
+ |----------|--------|-------------|
116
+ | `/context` | `GET` | Return current session context (messages, documents, stats) |
117
+ | `/reset-session` | `POST` | Reset in-memory session or specific chat context |
118
+ | `/session-stats` | `GET` | Return stats like message count, file size, timestamps |
119
+ | `/active-sessions` | `GET` | Return list of all active in-memory sessions |
120
+ | `/cleanup-sessions` | `POST` | Manually trigger expired session cleanup |
121
+
122
+ Supports ephemeral sessions and reusable chat contexts (e.g. for documents).
123
+
124
+ ---
125
+
126
+ ## `utils.py` – Route-Level Utilities
127
+
128
+ Defines shared helper:
129
+
130
+ - `get_or_create_session_for_request(request)`
131
+ - `get_or_create_session_for_request_async(request)`
132
+
133
+ These parse session cookies or generate new session IDs, crucial for maintaining separation across:
134
+ - In-memory ephemeral sessions
135
+ - Document-linked long-term sessions
136
+
137
+ ---
138
+
139
+ ## `root.py` – API Healthcheck
140
+
141
+ | Endpoint | Method | Description |
142
+ |----------|--------|-------------|
143
+ | `/` | `GET` | Return version + feature list |
144
+
145
+ Simple heartbeat endpoint used for readiness probes and sanity checks.
146
+
147
+ ---
148
+
149
+ ## Auth Flow Integration
150
+
151
+ Most routes use:
152
+
153
+ ```python
154
+ Depends(get_current_active_user)
155
+ ```
156
+
157
+ This ensures only logged-in users can:
158
+ - Upload and retrieve files
159
+ - Export summaries
160
+ - Save or delete chat sessions
161
+
162
+ JWT tokens are passed via the `Authorization: Bearer ...` header.
163
+
164
+ ---
165
+
166
+ ## High-Level Flow
167
+
168
+ ```text
169
+ Frontend → /chat-sequential → orchestrator → personas → RAG + LLM → response[]
170
+ ↘ /upload-document → extractor → RAG chunks → indexed
171
+ ↘ /context or /reset-session → session_manager
172
+ ↘ /export-chat or /chat-summary → utils + formatter
173
+ ```
174
+
175
+ ---
multi_llm_chatbot_backend/app/core/README.md ADDED
@@ -0,0 +1,200 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # `app/core` – Application Core Logic
2
+
3
+ This is the **central brain** of the multi-LLM chatbot system. It orchestrates user interaction, persona logic, context management, document-based retrieval (RAG), session tracking, authentication, and initialization.
4
+
5
+ ---
6
+
7
+ ## Overview of Modules
8
+
9
+ | Module | Responsibility |
10
+ |--------|----------------|
11
+ | `auth.py` | Authentication (JWT, password hashing, user resolution) |
12
+ | `bootstrap.py` | System startup logic: loads LLMs, personas, orchestrators |
13
+ | `context.py` | Global per-session context (simplified storage) |
14
+ | `context_manager.py` | Core context formatting & windowing for Gemini/Ollama |
15
+ | `database.py` | MongoDB connection & index management |
16
+ | `improved_orchestrator.py` | Main message routing engine: document-aware, multi-persona orchestrator |
17
+ | `rag_manager.py` | RAG with ChromaDB: chunking, storage, semantic search |
18
+ | `session_manager.py` | Full chat lifecycle tracker (in-memory) with RAG hooks |
19
+
20
+ ---
21
+
22
+ ## `auth.py` – Authentication System
23
+
24
+ Handles secure authentication via:
25
+ - Bcrypt hashing (`passlib`)
26
+ - JWT creation and validation (`python-jose`)
27
+ - Secure route access using FastAPI’s `Depends`
28
+
29
+ ### Functions
30
+
31
+ - `get_password_hash(password)` – Hash password using bcrypt
32
+ - `verify_password(plain, hashed)` – Verify password
33
+ - `create_access_token(data)` – Return JWT (30-day expiry default)
34
+ - `get_current_user()` – Decodes token and returns `User` model
35
+ - `authenticate_user(email, password)` – Checks login credentials
36
+ - `create_user_response(user)` – Returns `UserResponse` for frontend
37
+
38
+ ---
39
+
40
+ ## `bootstrap.py` – System Bootstrap
41
+
42
+ Runs once on app startup to:
43
+ - Determine the default LLM provider (Gemini or Ollama)
44
+ - Initialize `ImprovedChatOrchestrator`
45
+ - Inject personas using `get_default_personas(llm)`
46
+
47
+ ```python
48
+ llm = create_llm_client() # Gemini or Ollama
49
+ chat_orchestrator = ImprovedChatOrchestrator()
50
+ DEFAULT_PERSONAS = get_default_personas(llm)
51
+ ```
52
+
53
+ Each persona is **registered** into the orchestrator using `.register_persona()`.
54
+
55
+ ---
56
+
57
+ ## `context.py` – Global Per-Session Context
58
+
59
+ A basic context storage class (`GlobalSessionContext`) that keeps:
60
+ - `full_log`: List of all messages
61
+ - `uploaded_files`: Tracked files per session
62
+ - `total_upload_size`: Helps enforce limits
63
+
64
+ Used primarily in earlier versions or smaller contexts.
65
+
66
+ ---
67
+
68
+ ## `context_manager.py` – LLM Context Window Formatter
69
+
70
+ This class builds optimized context windows for both Gemini and Ollama:
71
+
72
+ ### `ContextManager.prepare_context_for_llm()`
73
+ Returns a `ContextWindow(messages, token_count, truncated)` with:
74
+ - LLM-specific formatting
75
+ - Automatic message pruning based on token limits
76
+ - Recency- and relevance-weighted scoring for old messages
77
+ - Automatic stop tokens, system prompts, and formatting
78
+
79
+ ### Key Features
80
+
81
+ | Feature | Gemini | Ollama |
82
+ |--------|--------|--------|
83
+ | Format | JSON roles + parts | Flat prompt string |
84
+ | Role Mapping | 'user', 'model' | 'User:', 'Assistant:' |
85
+ | Chunking Strategy | Full doc as `Context Document:` | Plain text injection |
86
+ | Stop Sequences | Customizable | Enforced via `stop[]` |
87
+
88
+ Used **by all LLM clients** (Ollama/Gemini) and the **orchestrator**.
89
+
90
+ ---
91
+
92
+ ## `database.py` – MongoDB Connector
93
+
94
+ - Uses `motor` for async MongoDB
95
+ - Exposes `get_database()` to other modules
96
+ - Automatically creates indexes on `users` and `chat_sessions`
97
+ - Controlled via `.env` (`MONGODB_CONNECTION_STRING`)
98
+
99
+ ```python
100
+ await connect_to_mongo()
101
+ await close_mongo_connection()
102
+ ```
103
+
104
+ ---
105
+
106
+ ## `improved_orchestrator.py` – Brain of the Chatbot
107
+
108
+ This is the main **message routing engine**.
109
+
110
+ ### Main Responsibilities
111
+
112
+ - Route user input through:
113
+ - Clarification detection
114
+ - Document-aware context building
115
+ - Persona-level response generation
116
+ - Aggregate responses from **multiple advisors**
117
+ - Embed document-based context (RAG)
118
+
119
+ ### Key Functions
120
+
121
+ - `process_message()` – Entry point for chat flow (calls all advisors)
122
+ - `chat_with_persona()` – Talk to one specific advisor
123
+ - `_generate_persona_responses()` – Routes through each registered persona
124
+ - `_build_enhanced_context_for_persona()` – Combines conversation + document info
125
+
126
+ ### Extras
127
+ - Document parsing hints (`"my thesis"`, `"section 2"`, etc.)
128
+ - Top-K persona ranking (`get_top_personas()`)
129
+ - Persona-specific fallback logic
130
+ - Session reset/deletion
131
+
132
+ Used by `/chat-sequential`, `/reply-to-advisor`, etc.
133
+
134
+ ---
135
+
136
+ ## `rag_manager.py` – RAG System for Docs
137
+
138
+ Supports **vector-based retrieval** using:
139
+
140
+ - Sentence Transformers (`all-MiniLM-L6-v2`)
141
+ - ChromaDB (`PersistentClient` with metadata)
142
+ - Metadata-aware enhanced chunking
143
+ - Overlapping token window strategy
144
+ - Section-wise classification
145
+
146
+ ### Core Components
147
+
148
+ | Class | Role |
149
+ |-------|------|
150
+ | `RAGManager` | Standard chunking, basic RAG |
151
+ | `EnhancedRAGManager` | Persona-aware + metadata-annotated vector chunks |
152
+
153
+ ### `EnhancedRAGManager` supports:
154
+ - Section tagging (`methodology`, `theory`, etc.)
155
+ - Multi-level filters (`session_id`, `filename`)
156
+ - Attribution fields (`chunk_position`, `has_methodology`)
157
+ - Relevance scoring and ranking
158
+
159
+ Used by orchestrator when generating document-aware responses.
160
+
161
+ ---
162
+
163
+ ## `session_manager.py` – Chat Lifecycle Controller
164
+
165
+ Handles:
166
+ - In-memory session creation + cleanup (with expiration)
167
+ - Tracks uploaded files and size
168
+ - Holds message logs for each session
169
+ - Links to RAG via `add_uploaded_file()` and `get_rag_stats()`
170
+
171
+ ### `ConversationContext`
172
+
173
+ | Attribute | Description |
174
+ |-----------|-------------|
175
+ | `messages` | List of role-message entries |
176
+ | `uploaded_files` | Filenames (content stored in RAG DB) |
177
+ | `document_chunks_count` | Count of indexed doc chunks |
178
+ | `last_retrieval_stats` | From last RAG search |
179
+ | `created_at`, `last_accessed` | Session activity tracking |
180
+
181
+ Includes:
182
+ - Reset functions (`clear_all_data()`)
183
+ - File-level message logging (`append_message()`)
184
+
185
+ ### `SessionManager`
186
+
187
+ - Thread-safe via locks
188
+ - Handles cleanup of expired sessions (`_cleanup_expired_sessions()`)
189
+ - Returns statistics via `get_session_stats()`
190
+
191
+ ---
192
+
193
+ ## Interactions Summary
194
+
195
+ ```text
196
+ User Input → Orchestrator
197
+ ↳ SessionManager → Context
198
+ ↳ RAGManager → Relevant Docs
199
+ ↳ LLMClient (Gemini/Ollama) ← ContextManager
200
+ ```
multi_llm_chatbot_backend/app/llm/README.md ADDED
@@ -0,0 +1,173 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # `app/llm` – LLM Integration Layer
2
+
3
+ This module abstracts and implements communication with **local** and **cloud-based** large language models (LLMs) via interchangeable client wrappers.
4
+
5
+ It defines:
6
+ - A common interface for all LLM clients (`LLMClient`)
7
+ - A wrapper for Google Gemini API (`ImprovedGeminiClient`)
8
+ - A wrapper for Ollama local models (`ImprovedOllamaClient`)
9
+ - A sentence transformer embedding model (`embedding_client.py`)
10
+
11
+ ---
12
+
13
+ ## Abstract Base – `llm_client.py`
14
+
15
+ This file defines the **contract** that all LLM clients must follow.
16
+
17
+ ### `class LLMClient (ABC)`
18
+
19
+ An abstract base class using Python’s `abc` module.
20
+
21
+ ```python
22
+ @abstractmethod
23
+ async def generate(system_prompt: str, context: List[dict], temperature: float, max_tokens: int) -> str
24
+ ```
25
+
26
+ Every model wrapper must implement this coroutine to generate a response given:
27
+ - A system prompt (persona instructions)
28
+ - A user/system message context (list of `{role, content}` dicts)
29
+ - A temperature (float 0.0–1.0, typically scaled from 0–10)
30
+ - A token limit (integer)
31
+
32
+ ---
33
+
34
+ ## Gemini Client – `improved_gemini_client.py`
35
+
36
+ ### Overview
37
+
38
+ - Communicates with **Google’s Gemini API** via `httpx`
39
+ - Dynamically injects the `system_prompt` into the context using `context_manager`
40
+ - Uses environment variables for API key and model name (`GEMINI_API_KEY`, `GEMINI_MODEL`)
41
+
42
+ ### Key Features
43
+
44
+ | Feature | Description |
45
+ |--------|-------------|
46
+ | Context Prep | Uses `context_manager.prepare_context_for_llm()` to optimize message length |
47
+ | Endpoint | `https://generativelanguage.googleapis.com/v1beta/models/{model_name}:generateContent` |
48
+ | Content Format | Gemini expects JSON-formatted `contents`, not string prompts |
49
+ | Safety Settings | Blocks harmful or explicit content categories |
50
+ | Fallback Logic | Returns user-friendly error messages on bad or empty responses |
51
+ | Token Limit | `maxOutputTokens` passed explicitly |
52
+
53
+ ### SafetyConfig JSON Example
54
+
55
+ ```json
56
+ "safetySettings": [
57
+ {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
58
+ {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"}
59
+ ]
60
+ ```
61
+
62
+ ### Differences from Ollama
63
+ - Requires an API key and runs over HTTPS
64
+ - Parses deeply nested JSON structures (candidates → content → parts)
65
+ - Strict token and safety controls
66
+ - More structured response format
67
+
68
+ ---
69
+
70
+ ## Ollama Client – `improved_ollama_client.py`
71
+
72
+ ### Overview
73
+
74
+ - Interfaces with a **local Ollama model server** (`http://localhost:11434`)
75
+ - Sends prompts as raw formatted strings (not JSON "messages")
76
+ - Uses `context_manager` to prepare prompt text
77
+
78
+ ### Key Features
79
+
80
+ | Feature | Description |
81
+ |--------|-------------|
82
+ | Endpoint | `/api/generate` |
83
+ | Payload | Flat prompt string + generation config |
84
+ | Cleansing | Strips verbose, inconsistent prefixes or filler |
85
+ | Quality Filter | Removes overly verbose or vague responses |
86
+ | Robust | Recovers from connection and timeout failures |
87
+
88
+ ### Prompt Payload Example
89
+
90
+ ```json
91
+ {
92
+ "model": "llama3.2:1b",
93
+ "prompt": "System: You are a helpful advisor...\nUser: What is...",
94
+ "stream": false,
95
+ "options": {
96
+ "temperature": 0.4,
97
+ "top_p": 0.9,
98
+ "top_k": 40,
99
+ "num_predict": 300,
100
+ "repeat_penalty": 1.1,
101
+ "stop": ["Student:", "User:", "Question:"]
102
+ }
103
+ }
104
+ ```
105
+
106
+ ### Differences from Gemini
107
+
108
+ | Area | Gemini | Ollama |
109
+ |------|--------|--------|
110
+ | Hosting | Cloud API | Local server |
111
+ | Format | JSON "messages" | Raw string prompt |
112
+ | Safety Filters | Yes | No |
113
+ | Token Control | `maxOutputTokens` | `num_predict` |
114
+ | Output | Structured parts | Single `response` string |
115
+ | Response Cleaning | Minimal | Aggressively stripped of fluff |
116
+ | Performance | High-quality, slower | Fast & offline |
117
+
118
+ ---
119
+
120
+ ## Embedding Model – `embedding_client.py`
121
+
122
+ ### Purpose
123
+
124
+ Provides embedding vectors (used for semantic similarity and document retrieval) using `sentence-transformers`.
125
+
126
+ ### Uses:
127
+ - Model: `all-MiniLM-L6-v2` (lightweight + performant)
128
+ - Library: `sentence-transformers`
129
+ - Function: `get_embedding(text: str) -> List[float]`
130
+
131
+ ```python
132
+ embedding = get_embedding("example sentence")
133
+ ```
134
+
135
+ ### Notes
136
+ - This module does **not** use Gemini embeddings (for cost and simplicity)
137
+ - Can be upgraded later to use Gemini’s `embedding` endpoint or Ollama-based models with vector support
138
+
139
+ ---
140
+
141
+ ## Environment Variables
142
+
143
+ | Variable | Description | Example |
144
+ |----------|-------------|---------|
145
+ | `GEMINI_API_KEY` | API key for Google Gemini | `AIzz123...` |
146
+ | `GEMINI_MODEL` | Default Gemini model name | `gemini-2.0-flash` |
147
+ | `OLLAMA_BASE_URL` | Local server base URL | `http://localhost:11434` |
148
+
149
+ ---
150
+
151
+ ## Context Management Integration
152
+
153
+ Both clients use:
154
+
155
+ ```python
156
+ context_window = context_manager.prepare_context_for_llm(...)
157
+ ```
158
+
159
+ This ensures that:
160
+ - Prompt fits within model limits
161
+ - Truncation metadata is logged/debugged
162
+ - Messages are pre-formatted or optimized per provider
163
+
164
+ ---
165
+
166
+ ## Error Handling
167
+
168
+ All clients log internal issues and fallback to graceful responses. Each client handles:
169
+ - Timeouts (`httpx.TimeoutException`)
170
+ - API errors (`httpx.HTTPStatusError`, bad payloads)
171
+ - Unexpected failures (fallback strings are returned)
172
+
173
+ ---
multi_llm_chatbot_backend/app/models/README.md ADDED
@@ -0,0 +1,140 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # `app/models` – Data Models & Persona Configuration
2
+
3
+ This module defines the **core data structures** for users, chat sessions, and AI advisor personas in the Multi-LLM Chatbot Backend.
4
+
5
+ It plays a foundational role in ensuring that:
6
+ - User data and session state are **structured, validated, and serializable**
7
+ - Persona behavior is **configurable, injectable, and extensible**
8
+
9
+ ---
10
+
11
+ ## Persona Model (`persona.py`)
12
+
13
+ ### `class Persona`
14
+
15
+ Represents a single AI advisor with its own personality, tone, and domain of expertise.
16
+
17
+ | Attribute | Description |
18
+ |----------------|-------------|
19
+ | `id` | Unique identifier for the persona |
20
+ | `name` | Human-readable display name |
21
+ | `system_prompt`| The persona’s default LLM instruction |
22
+ | `llm` | Instance of the LLM client (Gemini/Ollama) |
23
+ | `temperature` | Controls creativity level (0–10 scale, converted to 0.0–1.0 internally) |
24
+
25
+ ### `respond()` method
26
+
27
+ This asynchronous method generates a persona-specific reply using the provided context and desired `response_length` (short, medium, long). It uses a **system prompt + user messages** + length-based instructions.
28
+
29
+ ```python
30
+ await persona.respond(context=messages, response_length="medium")
31
+ ```
32
+
33
+ ---
34
+
35
+ ## Persona Registry (`default_personas.py`)
36
+
37
+ Defines and registers **all built-in personas** using detailed `system_prompt` templates and metadata.
38
+
39
+ > These prompts define the tone, response style, formatting rules, document behavior, and epistemological approach of each advisor.
40
+
41
+ ### Available Personas
42
+
43
+ - `methodologist`: Research methods and design expert
44
+ - `theorist`: Theoretical frameworks and philosophy of science
45
+ - `pragmatist`: Action-oriented coach with a focus on task execution
46
+ - `socratic`: Socratic questioning mentor
47
+ - `motivator`: Psychology-focused coach to build momentum
48
+ - `critic`: Constructive reviewer with sharp academic critique
49
+ - `storyteller`: Communication and storytelling specialist
50
+ - `minimalist`: Minimal guidance, maximum clarity
51
+ - `visionary`: Long-term strategy and innovation
52
+ - `empathetic`: Emotionally aware advisor for mental health & motivation
53
+
54
+ ### Registry Functions
55
+
56
+ | Function | Description |
57
+ |---------|-------------|
58
+ | `get_default_personas(llm)` | Returns a list of `Persona` instances with LLM injected |
59
+ | `get_default_persona_prompt(pid)` | Returns only the `system_prompt` of a persona |
60
+ | `is_valid_persona_id(pid)` | Checks if ID exists in registry |
61
+ | `list_available_personas()` | Lists all persona IDs |
62
+
63
+ ---
64
+
65
+ ## User & Session Models (`user.py`)
66
+
67
+ ### `UserCreate` / `UserLogin`
68
+
69
+ Pydantic models for request payloads during signup/login.
70
+
71
+ ### `User`
72
+
73
+ Persistent user object, mapped to MongoDB using `_id` aliasing.
74
+
75
+ | Field | Description |
76
+ |-------|-------------|
77
+ | `id` (`_id`) | MongoDB ObjectId |
78
+ | `email`, `hashed_password` | Auth fields |
79
+ | `academicStage`, `researchArea` | Optional metadata |
80
+ | `created_at`, `last_login` | Timestamps |
81
+ | `is_active` | Soft-deletion or block flag |
82
+
83
+ ### `UserResponse`
84
+
85
+ Serialized user profile returned to frontend after login/token validation.
86
+
87
+ ---
88
+
89
+ ### `ChatSession`
90
+
91
+ Stores a **single multi-turn conversation**. Used for RAG context, memory, and export.
92
+
93
+ | Field | Description |
94
+ |-------|-------------|
95
+ | `id` | MongoDB `_id` |
96
+ | `user_id` | Owner user’s ID |
97
+ | `title` | Human-readable title |
98
+ | `messages` | List of exchanged messages |
99
+ | `created_at`, `updated_at` | Session lifecycle tracking |
100
+ | `is_active` | Whether it is a deleted/inactive session |
101
+
102
+ ### `ChatSessionResponse`
103
+
104
+ Returned when listing past sessions (lightweight response).
105
+
106
+ ---
107
+
108
+ ### `Token`
109
+
110
+ Used as the unified login response structure:
111
+
112
+ ```json
113
+ {
114
+ "access_token": "...",
115
+ "token_type": "bearer",
116
+ "user": { ... }
117
+ }
118
+ ```
119
+
120
+ ---
121
+
122
+ ## Design Principles
123
+
124
+ - All models are **fully compatible with FastAPI + Pydantic**
125
+ - MongoDB integration uses `bson.ObjectId` support and aliases
126
+ - Persona logic is **decoupled** from orchestration — easy to extend
127
+ - System prompts are rich, structured, and **frontend-format aware** (markdown rules enforced)
128
+
129
+ ---
130
+
131
+ ## Next Steps
132
+
133
+ This module is used by:
134
+
135
+ - `core/improved_orchestrator.py` – Persona routing
136
+ - `routes/chat.py` – Sequential chat + replies
137
+ - `auth.py` – Token generation and validation
138
+ - `documents.py` – Document-enhanced message generation
139
+
140
+ > Add a new persona? Just extend `DEFAULT_PERSONAS` and restart the backend.
multi_llm_chatbot_backend/app/utils/README.md ADDED
@@ -0,0 +1,142 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # `app/utils` – Utility Modules for Summarization, Export, and Embeddings
2
+
3
+ This directory includes reusable tools that support the backend application with:
4
+
5
+ - Chat summarization for display/export
6
+ - Document extraction and cleanup
7
+ - File export to TXT, DOCX, and PDF formats
8
+ - File upload validation
9
+ - Persona-specific vector DB with ChromaDB
10
+
11
+ These modules are loosely coupled and used across core routes, RAG logic, and export endpoints.
12
+
13
+ ---
14
+
15
+ ## `chat_summary.py` – Conversation Summarization
16
+
17
+ This module provides summarization of past conversations using the LLM client.
18
+
19
+ ### Key Functions
20
+
21
+ - `generate_summary_from_messages(messages, llm, max_tokens)` – Generates a formatted, bullet-style summary
22
+ - `format_summary_for_text_export(summary_text)` – Cleans summary for export to PDF/DOCX/TXT
23
+ - `parse_summary_to_blocks(summary_text)` – Converts summary to structured blocks (headings, lists, paragraphs)
24
+
25
+ ### Format Guidelines
26
+
27
+ Summaries follow a markdown-style format with:
28
+ - `**Section Name:**` for headings
29
+ - `* Bullet Points` for insights and recommendations
30
+ - Auto-trimming and line breaks for export formatting
31
+
32
+ ---
33
+
34
+ ## `chroma_client.py` – Persona-Specific Knowledge Store
35
+
36
+ A minimal ChromaDB wrapper used to store and query persona-specific documents or embeddings.
37
+
38
+ ### Functions
39
+
40
+ - `add_persona_doc(text, persona, doc_id)` – Add a new chunk/document for a persona
41
+ - `query_persona_knowledge(query, persona)` – Query ChromaDB for a persona-specific response
42
+
43
+ ### Notes
44
+
45
+ - Uses `./chroma_storage` as the default persistent path
46
+ - Uses the local embedding model via `get_embedding()` from `embedding_client.py`
47
+
48
+ ---
49
+
50
+ ## `document_extractor.py` – File Text Extraction
51
+
52
+ Supports extracting raw text from uploaded documents.
53
+
54
+ ### Supported Formats
55
+
56
+ | Format | Content Type |
57
+ |--------|---------------|
58
+ | PDF | `application/pdf` |
59
+ | DOCX | `application/vnd.openxmlformats-officedocument.wordprocessingml.document` |
60
+ | TXT | `text/plain` |
61
+
62
+ ### Key Function
63
+
64
+ ```python
65
+ extract_text_from_file(file_bytes: bytes, content_type: str) -> str
66
+ ```
67
+
68
+ Uses:
69
+ - `PyPDF2` for PDFs
70
+ - `docx2txt` for Word documents (via temp file)
71
+ - UTF-8 decoding for plain text
72
+
73
+ ---
74
+
75
+ ## `file_export.py` – Export Chat & Summaries
76
+
77
+ Exports content (chat logs or summaries) to the following formats:
78
+ - `.txt`
79
+ - `.docx` (Word)
80
+ - `.pdf` (ReportLab)
81
+
82
+ ### Key Functions
83
+
84
+ - `export_chat_as_file(content, format)` – Unified export method (calls generate_*)
85
+ - `prepare_export_response()` – Returns a `StreamingResponse` with correct content-disposition
86
+
87
+ ### Formatting Functions
88
+
89
+ - `generate_txt_file()` – Simple UTF-8 stream
90
+ - `generate_docx_file()` – Paragraph-based Word file using `python-docx`
91
+ - `generate_pdf_file()` – Uses ReportLab’s Platypus for chat-style layout
92
+ - `generate_pdf_file_from_blocks()` – Used for structured summaries (heading, lists, etc.)
93
+
94
+ All formats apply automatic cleanup and styling via:
95
+ - `_clean_text_for_pdf()` and `_render_rich_text()`
96
+
97
+ ---
98
+
99
+ ## `file_limits.py` – Upload Size Checks
100
+
101
+ Used to prevent users from uploading excessively large files in a session.
102
+
103
+ ### Configurable Limit
104
+
105
+ ```python
106
+ MAX_TOTAL_UPLOAD_MB = 10
107
+ ```
108
+
109
+ ### Function
110
+
111
+ - `is_within_upload_limit(session_id, new_file_bytes, session_context)` – Returns `True` if upload is within session cap
112
+
113
+ Used by routes handling document uploads.
114
+
115
+ ---
116
+
117
+ ## Dependencies
118
+
119
+ These modules are used in:
120
+
121
+ | Module | Depends On |
122
+ |--------|------------|
123
+ | `rag_manager.py` | `document_extractor`, `file_limits` |
124
+ | `chat_summary.py` | `llm_client` |
125
+ | `routes/documents.py` | `document_extractor`, `file_limits` |
126
+ | `routes/export.py` | `file_export`, `chat_summary` |
127
+
128
+ ---
129
+
130
+ ## Example Workflow
131
+
132
+ ```text
133
+ Upload File → document_extractor.py → raw text
134
+
135
+ file_limits.py → check quota
136
+
137
+ Chat History → chat_summary.py → formatted summary
138
+
139
+ file_export.py → TXT, DOCX, PDF
140
+
141
+ Persona Notes → chroma_client.py → embedded in ChromaDB
142
+ ```