agentic-comic-generator / memory_handling.md
ramsi-k's picture
docs: update and add memory handling and tech specs
7a28b51
# Memory Handling for Bayko & Brown
## Hackathon Implementation Guide
> 🎯 **Simple, real, shippable memory and evaluation for multi-agent comic generation**
---
## 🧠 LlamaIndex Memory Integration
### Real Memory Class (Based on LlamaIndex Docs)
```python
# services/agent_memory.py
from llama_index.core.memory import Memory
from llama_index.core.llms import ChatMessage
class AgentMemory:
"""Simple wrapper around LlamaIndex Memory for agent conversations"""
def __init__(self, session_id: str, agent_name: str):
self.session_id = session_id
self.agent_name = agent_name
# Use LlamaIndex Memory with session-specific ID
self.memory = Memory.from_defaults(
session_id=f"{session_id}_{agent_name}",
token_limit=4000
)
def add_message(self, role: str, content: str):
"""Add a message to memory"""
message = ChatMessage(role=role, content=content)
self.memory.put_messages([message])
def get_history(self):
"""Get conversation history"""
return self.memory.get()
def clear(self):
"""Clear memory for new session"""
self.memory.reset()
```
### Integration with Existing Agents
**Update Brown's memory (api/agents/brown.py):**
```python
# Replace the LlamaIndexMemoryStub with real memory
from services.agent_memory import AgentMemory
class AgentBrown:
def __init__(self, max_iterations: int = 3):
self.max_iterations = max_iterations
self.session_id = None
self.iteration_count = 0
# Real LlamaIndex memory
self.memory = None # Initialize when session starts
# ... rest of existing code
def process_request(self, request: StoryboardRequest):
# Initialize memory for new session
self.session_id = f"session_{uuid.uuid4().hex[:8]}"
self.memory = AgentMemory(self.session_id, "brown")
# Log user request
self.memory.add_message("user", request.prompt)
# ... existing validation and processing logic
# Log Brown's decision
self.memory.add_message("assistant", f"Created generation request for Bayko")
return message
```
**Update Bayko's memory (api/agents/bayko.py):**
```python
# Add memory to Bayko
from services.agent_memory import AgentMemory
class AgentBayko:
def __init__(self):
# ... existing initialization
self.memory = None # Initialize when processing starts
async def process_generation_request(self, message: Dict[str, Any]):
session_id = message.get("context", {}).get("session_id")
self.memory = AgentMemory(session_id, "bayko")
# Log received request
self.memory.add_message("user", f"Received generation request: {message['payload']['prompt']}")
# ... existing generation logic
# Log completion
self.memory.add_message("assistant", f"Generated {len(panels)} panels successfully")
return result
```
### Optional: Sync with SQLite
```python
# services/memory_sync.py
from services.turn_memory import AgentMemory as SQLiteMemory
from services.agent_memory import AgentMemory as LlamaMemory
def sync_to_sqlite(llama_memory: LlamaMemory, sqlite_memory: SQLiteMemory):
"""Sync LlamaIndex memory to SQLite for persistence"""
history = llama_memory.get_history()
for message in history:
sqlite_memory.add_message(
session_id=llama_memory.session_id,
agent_name=llama_memory.agent_name,
content=message.content,
step_type="message"
)
```
---
## βœ… Simple Evaluation Logic
### Basic Evaluator Class
```python
# services/simple_evaluator.py
class SimpleEvaluator:
"""Basic evaluation logic for Brown's decision making"""
MAX_ATTEMPTS = 3 # Original + 2 revisions
def __init__(self):
self.attempt_count = 0
def evaluate(self, bayko_output: dict, original_prompt: str) -> dict:
"""Evaluate Bayko's output and decide: approve, reject, or refine"""
self.attempt_count += 1
print(f"πŸ” Brown evaluating attempt {self.attempt_count}/{self.MAX_ATTEMPTS}")
# Rule 1: Auto-reject if dialogue in images
if self._has_dialogue_in_images(bayko_output):
return {
"decision": "reject",
"reason": "Images contain dialogue text - use subtitles instead",
"final": True
}
# Rule 2: Auto-reject if story is incoherent
if not self._is_story_coherent(bayko_output):
return {
"decision": "reject",
"reason": "Story panels don't follow logical sequence",
"final": True
}
# Rule 3: Force approve if max attempts reached
if self.attempt_count >= self.MAX_ATTEMPTS:
return {
"decision": "approve",
"reason": f"Max attempts ({self.MAX_ATTEMPTS}) reached - accepting current quality",
"final": True
}
# Rule 4: Check if output matches prompt intent
if self._matches_prompt_intent(bayko_output, original_prompt):
return {
"decision": "approve",
"reason": "Output matches prompt and quality is acceptable",
"final": True
}
else:
return {
"decision": "refine",
"reason": "Output needs improvement to better match prompt",
"final": False
}
def _has_dialogue_in_images(self, output: dict) -> bool:
"""Check if panels mention dialogue in the image"""
panels = output.get("panels", [])
dialogue_keywords = [
"speech bubble", "dialogue", "talking", "saying",
"text in image", "speech", "conversation"
]
for panel in panels:
description = panel.get("description", "").lower()
if any(keyword in description for keyword in dialogue_keywords):
print(f"❌ Found dialogue in image: {description}")
return True
return False
def _is_story_coherent(self, output: dict) -> bool:
"""Basic check for story coherence"""
panels = output.get("panels", [])
if len(panels) < 2:
return True # Single panel is always coherent
# Check 1: All panels should have descriptions
descriptions = [p.get("description", "") for p in panels]
if any(not desc.strip() for desc in descriptions):
print("❌ Some panels missing descriptions")
return False
# Check 2: Panels shouldn't be identical (no progression)
if len(set(descriptions)) == 1:
print("❌ All panels are identical - no story progression")
return False
# Check 3: Look for obvious incoherence keywords
incoherent_keywords = [
"unrelated", "random", "doesn't make sense",
"no connection", "contradictory"
]
full_text = " ".join(descriptions).lower()
if any(keyword in full_text for keyword in incoherent_keywords):
print("❌ Story contains incoherent elements")
return False
return True
def _matches_prompt_intent(self, output: dict, prompt: str) -> bool:
"""Check if output generally matches the original prompt"""
panels = output.get("panels", [])
if not panels:
return False
# Simple keyword matching
prompt_words = set(prompt.lower().split())
panel_text = " ".join([p.get("description", "") for p in panels]).lower()
panel_words = set(panel_text.split())
# At least 20% of prompt words should appear in panel descriptions
overlap = len(prompt_words.intersection(panel_words))
match_ratio = overlap / len(prompt_words) if prompt_words else 0
print(f"πŸ“Š Prompt match ratio: {match_ratio:.2f}")
return match_ratio >= 0.2
def reset(self):
"""Reset for new session"""
self.attempt_count = 0
```
### Integration with Brown
```python
# Update Brown's review_output method
from services.simple_evaluator import SimpleEvaluator
class AgentBrown:
def __init__(self, max_iterations: int = 3):
# ... existing code
self.evaluator = SimpleEvaluator()
def review_output(self, bayko_response: Dict[str, Any], original_request: StoryboardRequest):
"""Review Bayko's output using simple evaluation logic"""
print(f"πŸ€– Brown reviewing Bayko's output...")
# Use simple evaluator
evaluation = self.evaluator.evaluate(
bayko_response,
original_request.prompt
)
# Log to memory
self.memory.add_message(
"assistant",
f"Evaluation: {evaluation['decision']} - {evaluation['reason']}"
)
if evaluation["decision"] == "approve":
print(f"βœ… Brown approved: {evaluation['reason']}")
return self._create_approval_message(bayko_response, evaluation)
elif evaluation["decision"] == "reject":
print(f"❌ Brown rejected: {evaluation['reason']}")
return self._create_rejection_message(bayko_response, evaluation)
else: # refine
print(f"πŸ”„ Brown requesting refinement: {evaluation['reason']}")
return self._create_refinement_message(bayko_response, evaluation)
```
---
## πŸš€ Implementation Steps
### Day 1: Memory Integration
1. **Install LlamaIndex**: `pip install llama-index`
2. **Create `services/agent_memory.py`** with the Memory wrapper above
3. **Update Brown and Bayko** to use real memory instead of stubs
4. **Test**: Verify agents can store and retrieve conversation history
### Day 2: Evaluation Logic
1. **Create `services/simple_evaluator.py`** with the evaluation class above
2. **Update Brown's `review_output` method** to use SimpleEvaluator
3. **Test**: Verify 3-attempt limit and rejection rules work
4. **Optional**: Add memory sync to SQLite for persistence
### Day 3: Testing & Polish
1. **End-to-end testing** with various prompts
2. **Console logging** to show evaluation decisions
3. **Bug fixes** and edge case handling
4. **Demo preparation**
---
## πŸ“‹ Success Criteria
- [ ] **Memory Works**: Agents store multi-turn conversations using LlamaIndex
- [ ] **Evaluation Works**: Brown makes approve/reject/refine decisions
- [ ] **3-Attempt Limit**: System stops after original + 2 revisions
- [ ] **Auto-Rejection**: Dialogue-in-images and incoherent stories are rejected
- [ ] **End-to-End**: Complete user prompt β†’ comic generation β†’ evaluation cycle
---
_Simple, real, shippable. Perfect for a hackathon demo._