jdesiree committed on
Commit
89b3776
·
verified ·
1 Parent(s): d3efb55

Upload README.md

Files changed (1)
  1. README.md +399 -0
README.md ADDED
@@ -0,0 +1,399 @@
1
+ ---
2
+ title: Mimir
3
+ emoji: 📚
4
+ colorFrom: indigo
5
+ colorTo: blue
6
+ sdk: gradio
7
+ sdk_version: 5.49.1
8
+ app_file: app.py
9
+ pinned: true
10
+ python_version: '3.10'
11
+ short_description: Advanced prompt engineering for educational AI systems.
12
+ thumbnail: >-
13
+ https://cdn-uploads.huggingface.co/production/uploads/68700e7552b74a1dcbb2a87e/Z7P8DJ57rc5P1ozA5gwp3.png
14
+ hardware: zero-gpu-dynamic
15
+ startup_duration_timeout: 3h
16
+ hf_oauth: true
17
+ hf_oauth_expiration_minutes: 180
18
+ preload_from_hub:
19
+ - meta-llama/Llama-3.2-3B-Instruct
20
+ ---
21
+
22
+ # Mimir: Educational AI Assistant
23
+ ## Advanced Multi-Agent Architecture & Prompt Engineering Portfolio Project
24
+
25
+ ### Project Overview
26
+ Mimir demonstrates enterprise-grade AI system design through a sophisticated multi-agent architecture applied to educational technology. The system showcases advanced prompt engineering, intelligent decision-making pipelines, and state-persistent conversation management. Unlike simple single-model implementations, Mimir employs **four specialized agent types** working in concert: a tool decision engine, four parallel routing agents for prompt selection, three preprocessing thinking agents for complex reasoning, and a fine-tuned response generator. This architecture prioritizes pedagogical effectiveness through dynamic context assembly, ensuring responses are tailored to each unique educational interaction.
27
+
28
+ ***
29
+
30
+ ### Technical Architecture
31
+
32
+ **Multi-Agent System:**
33
+ ```
34
+ User Input → Tool Decision Agent → Routing Agents (4x) → Thinking Agents (3x) → Response Agent → Output
35
+
36
+ ```
37
+
38
+ **Core Technologies:**
39
+
40
+ * **Multi-Model Architecture**: Mistral-Small-24B (24B parameters) for decision-making and reasoning, Phi-3-mini (fine-tuned) for educational response generation, GGUF-quantized Mistral for mathematical tree-of-thought reasoning
41
+ * **Custom Orchestration**: Hand-built agent coordination replacing traditional frameworks for precise control and optimization
42
+ * **State Management**: Thread-safe global state with dual persistence (SQLite + HuggingFace Datasets)
43
+ * **ZeroGPU Integration**: Dynamic GPU allocation with `@spaces.GPU` decorators for efficient resource usage
44
+ * **Gradio**: Multi-page interface (Chatbot + Analytics Dashboard)
45
+ * **Python**: Advanced backend with lazy loading, quantization, and streaming
46
+
47
+ **Key Frameworks & Libraries:**
48
+
49
+ * `transformers` & `accelerate` for model loading and inference optimization
50
+ * `bitsandbytes` for 4-bit quantization (75% memory reduction)
51
+ * `peft` for Parameter-Efficient Fine-Tuning support
52
+ * `llama-cpp-python` for GGUF model inference
53
+ * `spaces` for HuggingFace ZeroGPU integration
54
+ * `matplotlib` for dynamic visualization generation
55
+ * Custom state management system with SQLite and dataset backup
56
+
57
+ ***
58
+
59
+ ### Advanced Agent Architecture
60
+
61
+ #### Agent Pipeline Overview
62
+ The system processes each user interaction through a sophisticated four-stage pipeline, with each stage making intelligent decisions that shape the final response.
63
+
64
+ #### Stage 1: Tool Decision Agent
65
+ **Purpose**: Determines if visualization tools enhance learning
66
+
67
+ **Model**: Mistral-Small-24B (4-bit quantized)
68
+
69
+ **Prompt Engineering**:
70
+ * Highly constrained binary decision prompt (YES/NO only)
71
+ * Explicit INCLUDE/EXCLUDE criteria for educational contexts
72
+ * Zero-shot classification with educational domain knowledge
73
+
74
+ **Decision Criteria**:
75
+ ```
76
+ INCLUDE: Mathematical functions, data analysis, chart interpretation,
77
+ trend visualization, proportional relationships
78
+
79
+ EXCLUDE: Greetings, definitions, explanations without data
80
+ ```
81
+
82
+ **Output**: Boolean flag activating `TOOL_USE_ENHANCEMENT` prompt segment
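
A constrained YES/NO reply needs very little parsing to become the boolean flag. The sketch below is illustrative only: `parse_tool_decision` is a hypothetical helper, not code from the Mimir repository, and it assumes the agent's raw reply is available as a plain string.

```python
def parse_tool_decision(raw_reply: str) -> bool:
    """Map a constrained YES/NO reply to a boolean flag.

    Anything other than a clear leading YES is treated as NO, so a
    malformed reply never activates the tool prompt by accident.
    """
    tokens = raw_reply.strip().upper().split()
    if not tokens:
        return False
    # Tolerate trailing punctuation such as "YES." or "YES,"
    first = tokens[0].strip(".,:;!\"'")
    return first == "YES"


# Hypothetical usage: the flag decides whether TOOL_USE_ENHANCEMENT is active.
active_prompts = {"TOOL_USE_ENHANCEMENT": parse_tool_decision("Yes.")}
```

Defaulting to `False` on unexpected output is a design choice in this sketch; it keeps a noisy model reply from triggering graph generation.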
83
+
84
+ ---
85
+
86
+ #### Stage 2: Prompt Routing Agents (4 Specialized Agents)
87
+ **Purpose**: Intelligent prompt segment selection through parallel analysis
88
+
89
+ **Model**: Shared Mistral-Small-24B instance (memory efficient)
90
+
91
+ **Agent Specializations**:
92
+
93
+ 1. **Agent 1 - Practice Question Detector**
94
+ - Analyzes conversation context for practice question opportunities
95
+ - Considers user's expressed understanding and learning progression
96
+ - Activates: `STRUCTURE_PRACTICE_QUESTIONS`
97
+
98
+ 2. **Agent 2 - Discovery Mode Classifier**
99
+ - Dual-classification: vague input detection + understanding assessment
100
+ - Returns: `VAUGE_INPUT`, `USER_UNDERSTANDING`, or neither
101
+ - Enables guided discovery and clarification strategies
102
+
103
+ 3. **Agent 3 - Follow-up Assessment Agent**
104
+ - Detects if user is responding to previous practice questions
105
+ - Analyzes conversation history for grading opportunities
106
+ - Activates: `PRACTICE_QUESTION_FOLLOWUP` (triggers grading mode)
107
+
108
+ 4. **Agent 4 - Teaching Mode Assessor**
109
+ - Evaluates need for direct instruction vs. structured practice
110
+ - Multi-output agent (can activate multiple prompts)
111
+ - Activates: `GUIDING_TEACHING`, `STRUCTURE_PRACTICE_QUESTIONS`
112
+
113
+ **Prompt Engineering Innovation**:
114
+ * Each agent uses a specialized system prompt with clear decision criteria
115
+ * Structured output formats for reliable parsing
116
+ * Context-aware analysis incorporating full conversation history
117
+ * Sequential execution prevents decision conflicts
118
+
119
+ ---
120
+
121
+ #### Stage 3: Thinking Agents (Preprocessing Layer)
122
+ **Purpose**: Generate reasoning context before final response (CoT/ToT)
123
+
124
+ **Models**:
125
+ - Standard Mistral-Small-24B (QA Design, General Reasoning)
126
+ - GGUF Mistral (Mathematical Tree-of-Thought)
127
+
128
+ **Agent Specializations**:
129
+
130
+ 1. **Math Thinking Agent (GGUF)**
131
+ - **Method**: Tree-of-Thought reasoning for mathematical problems
132
+ - **Activation**: When `LATEX_FORMATTING` is active
133
+ - **Output Structure**:
134
+ ```
135
+ Key Terms → Principles → Formulas → Step-by-Step Solution → Summary
136
+ ```
137
+ - **Complexity Routing**: Decision tree determines detail level (1A: basic, 1B: complex)
138
+
139
+ 2. **Question/Answer Design Agent**
140
+ - **Method**: Chain-of-Thought for practice question formulation
141
+ - **Activation**: When `STRUCTURE_PRACTICE_QUESTIONS` is active
142
+ - **Formatted Inputs**: Tool context, LaTeX guidelines, practice question templates
143
+ - **Output**: Question design, data formatting, answer bank generation
144
+
145
+ 3. **Reasoning Thinking Agent**
146
+ - **Method**: General Chain-of-Thought preprocessing
147
+ - **Activation**: When tools, follow-ups, or teaching mode are active
148
+ - **Output Structure**:
149
+ ```
150
+ User Knowledge Summary → Understanding Analysis →
151
+ Previous Actions → Reference Fact Sheet
152
+ ```
153
+
154
+ **Prompt Engineering Innovation**:
155
+ * Thinking agents produce **context for ResponseAgent**, not final output
156
+ * Outputs are invisible to user but inform response quality
157
+ * Tree-of-Thought (ToT) for math: explores multiple solution paths
158
+ * Chain-of-Thought (CoT) for others: step-by-step reasoning traces
159
+
160
+ ---
161
+
162
+ #### Stage 4: Response Agent (Educational Response Generation)
163
+ **Purpose**: Generate pedagogically sound final response
164
+
165
+ **Model**: Phi-3-mini-4k-instruct (fine-tuned on educational data)
166
+ - **Primary**: `jdesiree/Mimir-Phi-3.5` (fine-tuned)
167
+ - **Fallback**: Microsoft base model (automatic failover)
168
+
169
+ **Configuration**:
170
+ * 4-bit quantization (BitsAndBytes NF4)
171
+ * Mixed precision FP16 inference
172
+ * Accelerate integration for distributed computation
173
+ * PEFT-enabled for adapter support
174
+
175
+ **Prompt Assembly Process**:
176
+ 1. **Core Identity**: Always included (defines Mimir persona)
177
+ 2. **Logical Expressions**: Regex-triggered prompts (e.g., math keywords → `LATEX_FORMATTING`)
178
+ 3. **Agent-Selected Prompts**: Dynamic assembly based on routing agent decisions
179
+ 4. **Context Integration**: Tool outputs, thinking agent outputs, conversation history
180
+ 5. **Complete Prompt**: All segments joined with proper formatting
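
The five assembly steps can be sketched as one function. This is a minimal illustration under stated assumptions: `PROMPT_LIBRARY`, `assemble_prompt`, the bracketed section labels, and the placeholder segment texts are all hypothetical, since the README does not show the real prompt wording or joining format.

```python
from typing import Dict, List

# Placeholder segment texts; the real prompt wording is not shown in this README.
PROMPT_LIBRARY: Dict[str, str] = {
    "CORE_IDENTITY": "<core identity text>",
    "GENERAL_FORMATTING": "<general formatting text>",
    "LATEX_FORMATTING": "<latex formatting text>",
    "GUIDING_TEACHING": "<guiding teaching text>",
}


def assemble_prompt(active: List[str], context: Dict[str, str], user_input: str) -> str:
    """Join active segments, then any non-empty context, then the user turn."""
    parts = [PROMPT_LIBRARY[name] for name in active if name in PROMPT_LIBRARY]
    for label, text in context.items():
        if text:  # skip empty tool or thinking-agent output
            parts.append(f"[{label}]\n{text}")
    parts.append(f"[User]\n{user_input}")
    return "\n\n".join(parts)


prompt = assemble_prompt(
    ["CORE_IDENTITY", "GENERAL_FORMATTING", "LATEX_FORMATTING"],
    {"Thinking": "Key terms: derivative, slope", "Tool": ""},
    "How do I differentiate x^2?",
)
```

Empty context entries are dropped so that a turn without tool output does not leave an empty section in the final prompt.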
181
+
182
+ **Dynamic Prompt Library** (11 segments):
183
+ ```
184
+ Core: CORE_IDENTITY (always)
185
+ Formatting: GENERAL_FORMATTING (always), LATEX_FORMATTING (math)
186
+ Discovery: VAUGE_INPUT, USER_UNDERSTANDING
187
+ Teaching: GUIDING_TEACHING
188
+ Practice: STRUCTURE_PRACTICE_QUESTIONS, PRACTICE_QUESTION_FOLLOWUP
189
+ Tool: TOOL_USE_ENHANCEMENT
190
+ ```
191
+
192
+ **Response Post-Processing**:
193
+ * Artifact cleanup (remove `<|end|>`, `###`, etc.)
194
+ * Intelligent truncation at logical breakpoints
195
+ * Sentence integrity preservation
196
+ * Quality validation gates
197
+ * Word-by-word streaming for UX
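
The cleanup, truncation, and streaming steps above could look roughly like the following. It is a sketch, not the project's implementation: `clean_response`, `stream_words`, and the `max_chars` limit are hypothetical, and only the two artifacts named in the list are handled.

```python
import re
from typing import Iterator

ARTIFACTS = ("<|end|>", "###")  # the two artifacts named above; others are not listed


def clean_response(text: str, max_chars: int = 400) -> str:
    """Remove known artifacts, then truncate at the last complete sentence."""
    for marker in ARTIFACTS:
        text = text.replace(marker, "")
    text = re.sub(r"[ \t]+", " ", text).strip()
    if len(text) <= max_chars:
        return text
    window = text[:max_chars]
    # Positions just after sentence-ending punctuation inside the window.
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", window)]
    return window[: ends[-1]] if ends else window.rstrip()


def stream_words(text: str) -> Iterator[str]:
    """Yield the growing response one word at a time."""
    shown = []
    for word in text.split():
        shown.append(word)
        yield " ".join(shown)
```

Yielding the cumulative text rather than single words suits UI generators that redraw the whole message on each update; whether Mimir streams this way is an assumption.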
198
+
199
+ ---
200
+
201
+ ### Prompt Engineering Techniques Demonstrated
202
+
203
+ #### 1. Hierarchical Prompt Architecture
204
+ **Three-Layer System**:
205
+ - **Agent System Prompts**: Specialized instructions for each agent type
206
+ - **Response Prompt Segments**: Modular components dynamically assembled
207
+ - **Thinking Prompts**: Preprocessing templates for reasoning generation
208
+
209
+ **Innovation**: Separates decision-making logic from response generation, enabling precise control over AI behavior at each pipeline stage.
210
+
211
+ #### 2. Per-Turn Prompt State Management
212
+ **PromptStateManager**:
213
+ ```python
214
+ # Reset at turn start - clean slate
215
+ prompt_state.reset() # All 11 prompts → False
216
+
217
+ # Agents activate relevant prompts
218
+ prompt_state.update("LATEX_FORMATTING", True)
219
+ prompt_state.update("GUIDING_TEACHING", True)
220
+
221
+ # Assemble only active prompts
222
+ active_prompts = prompt_state.get_active_response_prompts()
223
+ # Returns: ["CORE_IDENTITY", "GENERAL_FORMATTING",
224
+ # "LATEX_FORMATTING", "GUIDING_TEACHING"]
225
+ ```
226
+
227
+ **Benefits**:
228
+ - No prompt pollution between turns
229
+ - Context-appropriate responses every time
230
+ - Traceable decision-making for debugging
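
To make the per-turn reset concrete, here is a small stand-in for the state manager. `PromptStateSketch` is hypothetical and simplified: it reuses the method names from the snippet above, but it tracks only the segments named in this README and treats `CORE_IDENTITY` and `GENERAL_FORMATTING` as always on rather than as resettable flags.

```python
from typing import Dict, List

ALWAYS_ON = ("CORE_IDENTITY", "GENERAL_FORMATTING")
OPTIONAL = (
    "LATEX_FORMATTING", "VAUGE_INPUT", "USER_UNDERSTANDING", "GUIDING_TEACHING",
    "STRUCTURE_PRACTICE_QUESTIONS", "PRACTICE_QUESTION_FOLLOWUP", "TOOL_USE_ENHANCEMENT",
)


class PromptStateSketch:
    """Illustrative stand-in for PromptStateManager; not the project's class."""

    def __init__(self) -> None:
        self._state: Dict[str, bool] = {}
        self.reset()

    def reset(self) -> None:
        # Every optional segment starts the turn switched off.
        self._state = {name: False for name in OPTIONAL}

    def update(self, name: str, active: bool) -> None:
        if name not in self._state:
            raise KeyError(f"unknown prompt segment: {name}")
        self._state[name] = active

    def get_active_response_prompts(self) -> List[str]:
        chosen = [name for name in OPTIONAL if self._state[name]]
        return list(ALWAYS_ON) + chosen


prompt_state = PromptStateSketch()
prompt_state.update("LATEX_FORMATTING", True)
prompt_state.update("GUIDING_TEACHING", True)
first_turn = prompt_state.get_active_response_prompts()
prompt_state.reset()  # next turn starts clean
second_turn = prompt_state.get_active_response_prompts()
```

Rejecting unknown segment names in `update` is an added safeguard in this sketch, so a typo in an agent's output fails loudly instead of being silently ignored.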
231
+
232
+ #### 3. Logical Expression System
233
+ **Regex-Based Automatic Activation**:
234
+ ```python
235
+ # Math keyword detection
236
+ math_regex = r'\b(calculus|algebra|equation|solve|derivative)\b'
237
+ if re.search(math_regex, user_input, re.IGNORECASE):
238
+     prompt_state.update("LATEX_FORMATTING", True)
239
+ ```
240
+
241
+ **Hybrid Approach**: Combines rule-based triggers with LLM decision-making for optimal reliability.
242
+
243
+ #### 4. Constraint-Based Agent Prompting
244
+ **Tool Decision Example**:
245
+ ```
246
+ System Prompt: Analyze query and determine if visualization needed.
247
+
248
+ Output Format: YES or NO (nothing else)
249
+
250
+ INCLUDE if: mathematical functions, data analysis, trends
251
+ EXCLUDE if: greetings, simple definitions, no data
252
+ ```
253
+
254
+ **Result**: Reliable, parseable outputs from agents without complex post-processing.
255
+
256
+ #### 5. Chain-of-Thought & Tree-of-Thought Preprocessing
257
+ **CoT for Sequential Reasoning**:
258
+ ```
259
+ Step 1: Assess topic →
260
+ Step 2: Identify user understanding →
261
+ Step 3: Previous actions →
262
+ Step 4: Reference facts
263
+ ```
264
+
265
+ **ToT for Mathematical Reasoning**:
266
+ ```
267
+ Question Type Assessment →
268
+ Branch 1A (Simple): Minimal steps
269
+ Branch 1B (Complex): Full derivation with principles
270
+ ```
271
+
272
+ **Innovation**: Thinking agents generate rich context that guides ResponseAgent to higher-quality outputs.
273
+
274
+ #### 6. Academic Integrity by Design
275
+ **Embedded in Core Prompts**:
276
+ * "Do not provide full solutions - guide through processes instead"
277
+ * "Break problems into conceptual components"
278
+ * "Ask clarifying questions about their understanding"
279
+ * Subject-specific guidelines (Math: explain concepts, not compute)
280
+
281
+ **Follow-up Grading**:
282
+ * Agent 3 detects practice question responses
283
+ * `PRACTICE_QUESTION_FOLLOWUP` prompt activates
284
+ * Automated assessment with constructive feedback
285
+
286
+ #### 7. Multi-Modal Response Generation
287
+ **Tool Integration**:
288
+ ```python
289
+ # Tool decision → JSON generation → matplotlib rendering → base64 encoding
290
+ Create_Graph_Tool(
291
+ data={"Week 1": 120, "Week 2": 155, ...},
292
+ plot_type="line",
293
+ title="Crop Yield Analysis",
294
+ educational_context="Visualizes growth trend over time"
295
+ )
296
+ ```
297
+
298
+ **Result**: In-memory graph generation with educational context, embedded directly in response.
299
+
300
+ ---
301
+
302
+ ### State Management & Persistence
303
+
304
+ #### GlobalStateManager Architecture
305
+ **Dual-Layer Persistence**:
306
+ 1. **SQLite Database**: Fast local access, immediate writes
307
+ 2. **HuggingFace Dataset**: Cloud backup, hourly sync
308
+
309
+ **State Categories**:
310
+ ```
311
+ - Conversation State: Full chat history + agent context
312
+ - Prompt State: Per-turn activation (resets each interaction)
313
+ - Analytics State: Metrics, dashboard data, export history
314
+ - Evaluation State: Quality scores, classifier accuracy, user feedback
315
+ - ML Model Cache: Loaded models for reuse across sessions
316
+ ```
317
+
318
+ **Thread Safety**: All state operations protected by `threading.Lock()`
319
+
320
+ **Cleanup Strategy**:
321
+ - Automatic cleanup every 60 minutes
322
+ - Remove sessions older than 24 hours
323
+ - Prevents memory leaks in long-running deployments
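
The locking and cleanup rules above can be illustrated with a small in-memory store. This is a sketch under assumptions: `SessionStoreSketch`, `touch`, and the `last_seen` field are hypothetical names, and the SQLite and dataset persistence layers are left out entirely.

```python
import threading
import time
from typing import Any, Dict, Optional

SESSION_TTL_SECONDS = 24 * 60 * 60  # "older than 24 hours"


class SessionStoreSketch:
    """Lock-protected session map with age-based cleanup (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._sessions: Dict[str, Dict[str, Any]] = {}

    def touch(self, session_id: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        with self._lock:
            session = self._sessions.setdefault(session_id, {"history": []})
            session["last_seen"] = now

    def cleanup(self, now: Optional[float] = None) -> int:
        """Drop sessions idle for longer than the TTL; return how many were removed."""
        now = time.time() if now is None else now
        with self._lock:
            stale = [sid for sid, s in self._sessions.items()
                     if now - s["last_seen"] > SESSION_TTL_SECONDS]
            for sid in stale:
                del self._sessions[sid]
        return len(stale)

    def count(self) -> int:
        with self._lock:
            return len(self._sessions)


store = SessionStoreSketch()
store.touch("old", now=0.0)
store.touch("fresh", now=float(SESSION_TTL_SECONDS))
removed = store.cleanup(now=SESSION_TTL_SECONDS + 1.0)
```

Passing `now` explicitly keeps the example deterministic. In a running service, `cleanup` could be scheduled every 60 minutes, for example with a `threading.Timer` that re-arms itself.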
324
+
325
+ ---
326
+
327
+ ### Model Loading & Optimization Strategy
328
+
329
+ #### Three-Stage Loading Pipeline
330
+
331
+ **Stage 1: Build Time (Docker)**
332
+ ```yaml
333
+ # preload_from_hub in README.md
334
+ - Downloads all models during Docker build
335
+ - Cached in ~/.cache/huggingface/hub/
336
+ - No download time at runtime
337
+ ```
338
+
339
+ **Stage 2: Startup (compile_model.py)**
340
+ ```
341
+ # Runs before Gradio launch
342
+ - Load models from HF cache
343
+ - Apply 4-bit quantization
344
+ - Run warmup inference (CUDA kernel compilation)
345
+ - Create markers for fast path detection
346
+ ```
347
+
348
+ **Stage 3: Runtime (Lazy Loading)**
349
+ ```python
350
+ # First agent call triggers load
351
+ def _load_model(self):
352
+     if self.model_loaded:
353
+         return  # Already loaded
354
+     # Load from cache, configure, mark as loaded
355
+ ```
356
+
357
+ **Memory Optimization**:
358
+ - **4-bit Quantization**: roughly 75% less weight memory than FP16
359
+ - Mistral-Small-24B: ~48GB (FP16) → ~12GB VRAM (weights only)
360
+ - Phi-3-mini (3.8B parameters): ~7.6GB (FP16) → ~1.9GB VRAM (weights only)
361
+ - **Shared Model Strategy**: RoutingAgents share one Mistral instance (5x memory savings)
362
+ - **Device Mapping**: Automatic distribution across available devices
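
The 75% figure follows from storing each weight in 4 bits instead of 16. The back-of-the-envelope helpers below are hypothetical and count weights only; they ignore activations, the KV cache, and quantization metadata, so real VRAM use will be somewhat higher.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), ignoring overhead."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


def reduction(from_bits: int, to_bits: int) -> float:
    """Fractional saving when moving from one weight width to another."""
    return 1 - to_bits / from_bits


# A 24B-parameter model at 16-bit and at 4-bit precision.
fp16_gb = weight_memory_gb(24, 16)
int4_gb = weight_memory_gb(24, 4)
saving = reduction(16, 4)
```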
363
+
364
+ **ZeroGPU Integration**:
365
+ ```python
366
+ @spaces.GPU(duration=60) # Dynamic allocation
367
+ def agent_method(self):
368
+     # GPU is available for up to 60 seconds during this call
369
+     ...  # Released automatically when the call returns
370
+ ```
371
+
372
+ ---
373
+
374
+ ### Analytics & Evaluation System
375
+
376
+ #### Built-In Dashboard
377
+ **Real-Time Metrics**:
378
+ * Total conversations
379
+ * Average response time (25-40s typical)
380
+ * Success rate (quality score >3.5)
381
+ * Educational quality scores (ML-evaluated)
382
+ * Classifier accuracy rates
383
+ * Active sessions count
384
+
385
+ **LightEval Integration**:
386
+ * BERTScore for semantic quality
387
+ * ROUGE for response completeness
388
+ * Custom educational quality indicators:
389
+ - Has examples
390
+ - Structured explanation
391
+ - Appropriate length
392
+ - Encourages learning
393
+ - Uses LaTeX (for math)
394
+ - Clear sections
395
+
396
+ **Exportable Data**:
397
+ * JSON export with full metrics
398
+ * CSV export of interaction history
399
+ * Programmatic access via API
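
As a rough illustration of the JSON and CSV exports, the sketch below serializes metrics and interaction rows to strings with the standard library. The function names and the field names (`turn`, `response_seconds`, `quality_score`) are assumptions made for the example; the README does not specify the export schema.

```python
import csv
import io
import json
from typing import Dict, List


def export_json(metrics: Dict[str, object]) -> str:
    """Serialize a metrics dictionary to pretty-printed JSON."""
    return json.dumps(metrics, indent=2, sort_keys=True)


def export_csv(rows: List[Dict[str, object]]) -> str:
    """Serialize interaction rows to CSV, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data, for illustration only.
rows = [
    {"turn": 1, "response_seconds": 31.2, "quality_score": 4.1},
    {"turn": 2, "response_seconds": 27.8, "quality_score": 3.9},
]
json_text = export_json({"total_conversations": 2})
csv_text = export_csv(rows)
```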