jdesiree/Mimir


jdesiree committed on Oct 23, 2025

Commit

89b3776

verified ·

1 Parent(s): d3efb55

Upload README.md

Files changed (1)

README.md +399 -0

README.md ADDED

	@@ -0,0 +1,399 @@

+---
+title: Mimir
+emoji: 📚
+colorFrom: indigo
+colorTo: blue
+sdk: gradio
+sdk_version: 5.49.1
+app_file: app.py
+pinned: true
+python_version: '3.10'
+short_description: Advanced prompt engineering for educational AI systems.
+thumbnail: >-
+  https://cdn-uploads.huggingface.co/production/uploads/68700e7552b74a1dcbb2a87e/Z7P8DJ57rc5P1ozA5gwp3.png
+hardware: zero-gpu-dynamic
+startup_duration_timeout: 3h
+hf_oauth: true
+hf_oauth_expiration_minutes: 180
+preload_from_hub:
+  - meta-llama/Llama-3.2-3B-Instruct
+---
+# Mimir: Educational AI Assistant
+## Advanced Multi-Agent Architecture & Prompt Engineering Portfolio Project
+### Project Overview
+Mimir is a portfolio project that applies a multi-agent architecture to educational technology. It demonstrates prompt engineering, staged decision-making pipelines, and state-persistent conversation management. Rather than relying on a single model call, Mimir uses **four specialized agent types** working in concert: a tool decision agent, four routing agents for prompt selection, three preprocessing thinking agents for complex reasoning, and a fine-tuned response generator. Context is assembled dynamically on each turn so that responses fit the specific educational interaction.
+***
+### Technical Architecture
+**Multi-Agent System:**
+```
+User Input → Tool Decision Agent → Routing Agents (4x) → Thinking Agents (3x) → Response Agent → Output
+```
+**Core Technologies:**
+* **Multi-Model Architecture**: Mistral-Small-24B (24B parameters) for decision-making and reasoning, Phi-3-mini (fine-tuned) for educational response generation, GGUF-quantized Mistral for mathematical tree-of-thought reasoning
+* **Custom Orchestration**: Hand-built agent coordination replacing traditional frameworks for precise control and optimization
+* **State Management**: Thread-safe global state with dual persistence (SQLite + HuggingFace Datasets)
+* **ZeroGPU Integration**: Dynamic GPU allocation with `@spaces.GPU` decorators for efficient resource usage
+* **Gradio**: Multi-page interface (Chatbot + Analytics Dashboard)
+* **Python**: Advanced backend with lazy loading, quantization, and streaming
+**Key Frameworks & Libraries:**
+* `transformers` & `accelerate` for model loading and inference optimization
+* `bitsandbytes` for 4-bit quantization (75% memory reduction)
+* `peft` for Parameter-Efficient Fine-Tuning support
+* `llama-cpp-python` for GGUF model inference
+* `spaces` for HuggingFace ZeroGPU integration
+* `matplotlib` for dynamic visualization generation
+* Custom state management system with SQLite and dataset backup
+***
+### Advanced Agent Architecture
+#### Agent Pipeline Overview
+Each user interaction passes through a four-stage pipeline. Every stage either makes a decision or produces context that shapes the final response.
+#### Stage 1: Tool Decision Agent
+**Purpose**: Determines whether a visualization tool would enhance learning for the current query
+**Model**: Mistral-Small-24B (4-bit quantized)
+**Prompt Engineering**:
+* Highly constrained binary decision prompt (YES/NO only)
+* Explicit INCLUDE/EXCLUDE criteria for educational contexts
+* Zero-shot classification with educational domain knowledge
+**Decision Criteria**:
+```
+INCLUDE: Mathematical functions, data analysis, chart interpretation,
+         trend visualization, proportional relationships
+EXCLUDE: Greetings, definitions, explanations without data
+```
+**Output**: Boolean flag activating `TOOL_USE_ENHANCEMENT` prompt segment
+***
+#### Stage 2: Prompt Routing Agents (4 Specialized Agents)
+**Purpose**: Intelligent prompt segment selection through parallel analysis
+**Model**: Shared Mistral-Small-24B instance (memory efficient)
+**Agent Specializations**:
+1. **Agent 1 - Practice Question Detector**
+   - Analyzes conversation context for practice question opportunities
+   - Considers user's expressed understanding and learning progression
+   - Activates: `STRUCTURE_PRACTICE_QUESTIONS`
+2. **Agent 2 - Discovery Mode Classifier**
+   - Dual-classification: vague input detection + understanding assessment
+   - Returns: `VAUGE_INPUT`, `USER_UNDERSTANDING`, or neither
+   - Enables guided discovery and clarification strategies
+3. **Agent 3 - Follow-up Assessment Agent**
+   - Detects if user is responding to previous practice questions
+   - Analyzes conversation history for grading opportunities
+   - Activates: `PRACTICE_QUESTION_FOLLOWUP` (triggers grading mode)
+4. **Agent 4 - Teaching Mode Assessor**
+   - Evaluates need for direct instruction vs. structured practice
+   - Multi-output agent (can activate multiple prompts)
+   - Activates: `GUIDING_TEACHING`, `STRUCTURE_PRACTICE_QUESTIONS`
+**Prompt Engineering Innovation**:
+* Each agent uses a specialized system prompt with clear decision criteria
+* Structured output formats for reliable parsing
+* Context-aware analysis incorporating full conversation history
+* Sequential execution prevents decision conflicts
+---
+#### Stage 3: Thinking Agents (Preprocessing Layer)
+**Purpose**: Generate reasoning context before final response (CoT/ToT)
+**Models**:
+- Standard Mistral-Small-24B (QA Design, General Reasoning)
+- GGUF Mistral (Mathematical Tree-of-Thought)
+**Agent Specializations**:
+1. **Math Thinking Agent (GGUF)**
+   - **Method**: Tree-of-Thought reasoning for mathematical problems
+   - **Activation**: When `LATEX_FORMATTING` is active
+   - **Output Structure**:
+     ```
+     Key Terms → Principles → Formulas → Step-by-Step Solution → Summary
+     ```
+   - **Complexity Routing**: Decision tree determines detail level (1A: basic, 1B: complex)
+2. **Question/Answer Design Agent**
+   - **Method**: Chain-of-Thought for practice question formulation
+   - **Activation**: When `STRUCTURE_PRACTICE_QUESTIONS` is active
+   - **Formatted Inputs**: Tool context, LaTeX guidelines, practice question templates
+   - **Output**: Question design, data formatting, answer bank generation
+3. **Reasoning Thinking Agent**
+   - **Method**: General Chain-of-Thought preprocessing
+   - **Activation**: When tools, follow-ups, or teaching mode are active
+   - **Output Structure**:
+     ```
+     User Knowledge Summary → Understanding Analysis →
+     Previous Actions → Reference Fact Sheet
+     ```
+**Prompt Engineering Innovation**:
+* Thinking agents produce **context for ResponseAgent**, not final output
+* Outputs are hidden from the user but inform the final response
+* Tree-of-Thought (ToT) for math: explores multiple solution paths
+* Chain-of-Thought (CoT) for others: step-by-step reasoning traces
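The activation rules listed for the three thinking agents can be summarized as a small dispatch function. This is a minimal sketch, not the project's code: `select_thinking_agents` and the agent labels (`math`, `qa_design`, `reasoning`) are hypothetical names, and the set of active prompt names is assumed to come from the routing stage.

```python
def select_thinking_agents(active_prompts: set[str]) -> list[str]:
    """Return the thinking agents to run for this turn (hypothetical helper)."""
    agents = []
    # Math agent (Tree-of-Thought) runs when LaTeX formatting is active
    if "LATEX_FORMATTING" in active_prompts:
        agents.append("math")
    # QA design agent runs when practice questions are being structured
    if "STRUCTURE_PRACTICE_QUESTIONS" in active_prompts:
        agents.append("qa_design")
    # General reasoning agent runs for tools, follow-ups, or teaching mode
    reasoning_triggers = {
        "TOOL_USE_ENHANCEMENT",
        "PRACTICE_QUESTION_FOLLOWUP",
        "GUIDING_TEACHING",
    }
    if active_prompts & reasoning_triggers:
        agents.append("reasoning")
    return agents
```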
+---
+#### Stage 4: Response Agent (Educational Response Generation)
+**Purpose**: Generate pedagogically sound final response
+**Model**: Phi-3-mini-4k-instruct (fine-tuned on educational data)
+- **Primary**: `jdesiree/Mimir-Phi-3.5` (fine-tuned)
+- **Fallback**: Microsoft base model (automatic failover)
+**Configuration**:
+* 4-bit quantization (BitsAndBytes NF4)
+* Mixed precision FP16 inference
+* Accelerate integration for distributed computation
+* PEFT-enabled for adapter support
+**Prompt Assembly Process**:
+1. **Core Identity**: Always included (defines Mimir persona)
+2. **Logical Expressions**: Regex-triggered prompts (e.g., math keywords → `LATEX_FORMATTING`)
+3. **Agent-Selected Prompts**: Dynamic assembly based on routing agent decisions
+4. **Context Integration**: Tool outputs, thinking agent outputs, conversation history
+5. **Complete Prompt**: All segments joined with proper formatting
+**Dynamic Prompt Library** (11 segments, nine of which are listed here):
+```
+Core:          CORE_IDENTITY (always)
+Formatting:    GENERAL_FORMATTING (always), LATEX_FORMATTING (math)
+Discovery:     VAUGE_INPUT, USER_UNDERSTANDING
+Teaching:      GUIDING_TEACHING
+Practice:      STRUCTURE_PRACTICE_QUESTIONS, PRACTICE_QUESTION_FOLLOWUP
+Tool:          TOOL_USE_ENHANCEMENT
+```
+**Response Post-Processing**:
+* Artifact cleanup (remove `<|end|>`, `###`, etc.)
+* Intelligent truncation at logical breakpoints
+* Sentence integrity preservation
+* Quality validation gates
+* Word-by-word streaming for UX
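A rough sketch of what the cleanup, truncation, and streaming steps might look like. The function names, the `max_chars` limit, and the whitespace handling are assumptions for illustration; only the two artifact tokens come from the list above.

```python
import re

# Artifact tokens named in the list above; the real list may be longer (assumption)
ARTIFACTS = ["<|end|>", "###"]

def clean_response(text: str, max_chars: int = 2000) -> str:
    """Strip model artifacts, then truncate at a sentence boundary (illustrative)."""
    for token in ARTIFACTS:
        text = text.replace(token, "")
    # Collapse runs of spaces/tabs left behind by the removals
    text = re.sub(r"[ \t]+", " ", text).strip()
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    # Cut back to the last sentence-ending punctuation so no sentence is left half-finished
    last_end = max(clipped.rfind("."), clipped.rfind("!"), clipped.rfind("?"))
    return clipped[: last_end + 1] if last_end != -1 else clipped

def stream_words(text: str):
    """Yield the response one word at a time for incremental display."""
    for word in text.split():
        yield word + " "
```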
+---
+### Prompt Engineering Techniques Demonstrated
+#### 1. Hierarchical Prompt Architecture
+**Three-Layer System**:
+- **Agent System Prompts**: Specialized instructions for each agent type
+- **Response Prompt Segments**: Modular components dynamically assembled
+- **Thinking Prompts**: Preprocessing templates for reasoning generation
+**Innovation**: Separates decision-making logic from response generation, enabling precise control over AI behavior at each pipeline stage.
+#### 2. Per-Turn Prompt State Management
+**PromptStateManager**:
+```python
+# Reset at turn start - clean slate
+prompt_state.reset()  # All 11 prompts → False
+# Agents activate relevant prompts
+prompt_state.update("LATEX_FORMATTING", True)
+prompt_state.update("GUIDING_TEACHING", True)
+# Assemble only active prompts
+active_prompts = prompt_state.get_active_response_prompts()
+# Returns: ["CORE_IDENTITY", "GENERAL_FORMATTING",
+#           "LATEX_FORMATTING", "GUIDING_TEACHING"]
+```
+**Benefits**:
+- No prompt pollution between turns
+- Context-appropriate responses every time
+- Traceable decision-making for debugging
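To make the per-turn lifecycle concrete, here is a minimal sketch of a class with the same three methods used in the snippet above. It is an illustration under assumptions, not the project's implementation: the internal layout is hypothetical, and the always-on segments are modeled separately from the resettable flags.

```python
class PromptStateManager:
    """Per-turn prompt flags (illustrative; the project's class may differ)."""

    # Segments the README marks as "always" included
    ALWAYS_ON = ("CORE_IDENTITY", "GENERAL_FORMATTING")
    # Segments that agents or regex triggers switch on per turn
    OPTIONAL = (
        "LATEX_FORMATTING", "VAUGE_INPUT", "USER_UNDERSTANDING",
        "GUIDING_TEACHING", "STRUCTURE_PRACTICE_QUESTIONS",
        "PRACTICE_QUESTION_FOLLOWUP", "TOOL_USE_ENHANCEMENT",
    )

    def __init__(self):
        self._state = {}
        self.reset()

    def reset(self):
        # Clear every optional flag at the start of a turn
        self._state = {name: False for name in self.OPTIONAL}

    def update(self, name, active):
        if name not in self._state:
            raise KeyError(f"Unknown prompt segment: {name}")
        self._state[name] = bool(active)

    def get_active_response_prompts(self):
        # Always-on segments first, then activated ones in library order
        return list(self.ALWAYS_ON) + [n for n in self.OPTIONAL if self._state[n]]
```

Rejecting unknown names in `update` is a design choice in this sketch: a misspelled segment name fails loudly instead of silently never activating.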
+#### 3. Logical Expression System
+**Regex-Based Automatic Activation**:
+```python
+import re
+
+# Math keyword detection: activate LaTeX formatting when the input mentions a math topic
+math_regex = r'\b(calculus|algebra|equation|solve|derivative)\b'
+if re.search(math_regex, user_input, re.IGNORECASE):
+    prompt_state.update("LATEX_FORMATTING", True)
+```
+**Hybrid Approach**: Combines rule-based triggers with LLM decision-making for optimal reliability.
+#### 4. Constraint-Based Agent Prompting
+**Tool Decision Example**:
+```
+System Prompt: Analyze query and determine if visualization needed.
+Output Format: YES or NO (nothing else)
+INCLUDE if: mathematical functions, data analysis, trends
+EXCLUDE if: greetings, simple definitions, no data
+```
+**Result**: Reliable, parseable outputs from agents without complex post-processing.
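Because the agent is constrained to a single token, parsing can stay very small. The helper below is a hypothetical sketch (the name `parse_yes_no` and the default-to-`False` policy are assumptions): it normalizes the reply and treats anything other than a leading `YES` as a negative decision.

```python
def parse_yes_no(raw_output: str) -> bool:
    """Map a constrained YES/NO reply to a boolean; default to False when unclear."""
    tokens = raw_output.strip().upper().split()
    if not tokens:
        return False
    # Tolerate trailing punctuation or quotes around the answer
    first = tokens[0].strip(".,!:;\"'")
    return first == "YES"
```

Defaulting to `False` means a malformed reply skips the visualization tool rather than triggering it by accident.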
+#### 5. Chain-of-Thought & Tree-of-Thought Preprocessing
+**CoT for Sequential Reasoning**:
+```
+Step 1: Assess topic →
+Step 2: Identify user understanding →
+Step 3: Previous actions →
+Step 4: Reference facts
+```
+**ToT for Mathematical Reasoning**:
+```
+Question Type Assessment →
+  Branch 1A (Simple): Minimal steps
+  Branch 1B (Complex): Full derivation with principles
+```
+**Innovation**: Thinking agents generate rich context that guides ResponseAgent to higher-quality outputs.
+#### 6. Academic Integrity by Design
+**Embedded in Core Prompts**:
+* "Do not provide full solutions - guide through processes instead"
+* "Break problems into conceptual components"
+* "Ask clarifying questions about their understanding"
+* Subject-specific guidelines (e.g., Math: explain concepts rather than compute answers)
+**Follow-up Grading**:
+* Agent 3 detects practice question responses
+* `PRACTICE_QUESTION_FOLLOWUP` prompt activates
+* Automated assessment with constructive feedback
+#### 7. Multi-Modal Response Generation
+**Tool Integration**:
+```python
+# Tool decision → JSON generation → matplotlib rendering → base64 encoding
+Create_Graph_Tool(
+    data={"Week 1": 120, "Week 2": 155, ...},
+    plot_type="line",
+    title="Crop Yield Analysis",
+    educational_context="Visualizes growth trend over time"
+)
+```
+**Result**: In-memory graph generation with educational context, embedded directly in response.
+***
+### State Management & Persistence
+#### GlobalStateManager Architecture
+**Dual-Layer Persistence**:
+1. **SQLite Database**: Fast local access, immediate writes
+2. **HuggingFace Dataset**: Cloud backup, hourly sync
+**State Categories**:
+```
+- Conversation State: Full chat history + agent context
+- Prompt State: Per-turn activation (resets each interaction)
+- Analytics State: Metrics, dashboard data, export history
+- Evaluation State: Quality scores, classifier accuracy, user feedback
+- ML Model Cache: Loaded models for reuse across sessions
+```
+**Thread Safety**: All state operations protected by `threading.Lock()`
+**Cleanup Strategy**:
+- Automatic cleanup every 60 minutes
+- Remove sessions older than 24 hours
+- Prevents memory leaks in long-running deployments
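The combination of lock-protected access and age-based cleanup can be sketched with the standard library alone. Everything here is hypothetical (class name, table schema, method names); it only illustrates the pattern of guarding a shared SQLite connection with `threading.Lock()` and deleting sessions older than 24 hours.

```python
import sqlite3
import threading
import time

class SessionStore:
    """Lock-protected SQLite store with age-based cleanup (illustrative sketch)."""

    def __init__(self, path=":memory:", max_age_seconds=24 * 3600):
        self._lock = threading.Lock()
        # check_same_thread=False lets several threads share the connection;
        # the lock serializes all access to it
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._max_age = max_age_seconds
        with self._lock:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS sessions "
                "(id TEXT PRIMARY KEY, data TEXT, updated REAL)"
            )
            self._conn.commit()

    def save(self, session_id, data, now=None):
        now = time.time() if now is None else now
        with self._lock:
            self._conn.execute(
                "INSERT OR REPLACE INTO sessions VALUES (?, ?, ?)",
                (session_id, data, now),
            )
            self._conn.commit()

    def cleanup(self, now=None):
        """Delete sessions older than max_age; return the number removed."""
        now = time.time() if now is None else now
        with self._lock:
            cur = self._conn.execute(
                "DELETE FROM sessions WHERE updated < ?", (now - self._max_age,)
            )
            self._conn.commit()
            return cur.rowcount

    def count(self):
        with self._lock:
            return self._conn.execute("SELECT COUNT(*) FROM sessions").fetchone()[0]
```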
+---
+### Model Loading & Optimization Strategy
+#### Three-Stage Loading Pipeline
+**Stage 1: Build Time (Docker)**
+```yaml
+# preload_from_hub in README.md
+- Downloads all models during Docker build
+- Cached in ~/.cache/huggingface/hub/
+- No download time at runtime
+```
+**Stage 2: Startup (compile_model.py)**
+```python
+# Runs before Gradio launch:
+# 1. Load models from HF cache
+# 2. Apply 4-bit quantization
+# 3. Run warmup inference (CUDA kernel compilation)
+# 4. Create markers for fast path detection
+```
+**Stage 3: Runtime (Lazy Loading)**
+```python
+# First agent call triggers load
+def _load_model(self):
+    if self.model_loaded:
+        return  # Already loaded
+    # Load from cache, configure, mark as loaded
+```
+**Memory Optimization**:
+- **4-bit Quantization**: ~75% memory reduction relative to FP16 weights
+  - Mistral-24B: ~48GB → ~12GB VRAM
+  - Phi-3-mini (3.8B): ~7.6GB → ~1.9GB VRAM
+- **Shared Model Strategy**: RoutingAgents share one Mistral instance (5x memory savings)
+- **Device Mapping**: Automatic distribution across available devices
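The 75% figure follows directly from the bit widths: storing each weight in 4 bits instead of 16 cuts weight storage to a quarter. The snippet below is a back-of-the-envelope, weights-only estimate assuming an FP16 baseline; it ignores quantization metadata, activations, and the KV cache, and the function name is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    return num_params * bits_per_param / 8 / 1e9

# 24B-parameter model at 16-bit and 4-bit precision
fp16_gb = weight_memory_gb(24e9, 16)
nf4_gb = weight_memory_gb(24e9, 4)
reduction = 1 - nf4_gb / fp16_gb
```

For a 24B-parameter model this works out to about 48 GB at 16 bits and about 12 GB at 4 bits; real usage is somewhat higher once quantization constants and runtime buffers are included.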
+**ZeroGPU Integration**:
+```python
+import spaces
+
+@spaces.GPU(duration=60)  # Request a GPU for up to 60 seconds per call
+def agent_method(self):
+    # The GPU is attached while this function runs
+    # and released automatically when it returns
+    ...
+```
+---
+### Analytics & Evaluation System
+#### Built-In Dashboard
+**Real-Time Metrics**:
+* Total conversations
+* Average response time (25-40s typical)
+* Success rate (quality score >3.5)
+* Educational quality scores (ML-evaluated)
+* Classifier accuracy rates
+* Active sessions count
+**LightEval Integration**:
+* BERTScore for semantic quality
+* ROUGE for response completeness
+* Custom educational quality indicators:
+  - Has examples
+  - Structured explanation
+  - Appropriate length
+  - Encourages learning
+  - Uses LaTeX (for math)
+  - Clear sections
+**Exportable Data**:
+* JSON export with full metrics
+* CSV export of interaction history
+* Programmatic access via API