# Memory Handling for Bayko & Brown

## Hackathon Implementation Guide

> 🎯 **Simple, real, shippable memory and evaluation for multi-agent comic generation**

---

## 🧠 LlamaIndex Memory Integration

### Real Memory Class (Based on LlamaIndex Docs)

```python
# services/agent_memory.py
from llama_index.core.memory import Memory
from llama_index.core.llms import ChatMessage

class AgentMemory:
    """Simple wrapper around LlamaIndex Memory for agent conversations"""

    def __init__(self, session_id: str, agent_name: str):
        self.session_id = session_id
        self.agent_name = agent_name

        # Use LlamaIndex Memory with session-specific ID
        self.memory = Memory.from_defaults(
            session_id=f"{session_id}_{agent_name}",
            token_limit=4000
        )

    def add_message(self, role: str, content: str):
        """Add a message to memory"""
        message = ChatMessage(role=role, content=content)
        self.memory.put_messages([message])

    def get_history(self):
        """Get conversation history"""
        return self.memory.get()

    def clear(self):
        """Clear memory for new session"""
        self.memory.reset()
```

### Integration with Existing Agents

**Update Brown's memory (api/agents/brown.py):**

```python
# Replace the LlamaIndexMemoryStub with real memory
import uuid  # needed for the session ID generated in process_request

from services.agent_memory import AgentMemory

class AgentBrown:
    def __init__(self, max_iterations: int = 3):
        self.max_iterations = max_iterations
        self.session_id = None
        self.iteration_count = 0

        # Real LlamaIndex memory
        self.memory = None  # Initialize when session starts

        # ... rest of existing code

    def process_request(self, request: StoryboardRequest):
        # Initialize memory for new session
        self.session_id = f"session_{uuid.uuid4().hex[:8]}"
        self.memory = AgentMemory(self.session_id, "brown")

        # Log user request
        self.memory.add_message("user", request.prompt)

        # ... existing validation and processing logic

        # Log Brown's decision
        self.memory.add_message("assistant", "Created generation request for Bayko")

        return message
```

**Update Bayko's memory (api/agents/bayko.py):**

```python
# Add memory to Bayko
from typing import Any, Dict

from services.agent_memory import AgentMemory

class AgentBayko:
    def __init__(self):
        # ... existing initialization
        self.memory = None  # Initialize when processing starts

    async def process_generation_request(self, message: Dict[str, Any]):
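        session_id = message.get("context", {}).get("session_id")
        if not session_id:
            # Without a session ID the memory key would become "None_bayko"
            raise ValueError("Generation request is missing context.session_id")
        self.memory = AgentMemory(session_id, "bayko")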

        # Log received request
        self.memory.add_message("user", f"Received generation request: {message['payload']['prompt']}")

        # ... existing generation logic

        # Log completion
        self.memory.add_message("assistant", f"Generated {len(panels)} panels successfully")

        return result
```

### Optional: Sync with SQLite

```python
# services/memory_sync.py
from services.turn_memory import AgentMemory as SQLiteMemory
from services.agent_memory import AgentMemory as LlamaMemory

def sync_to_sqlite(llama_memory: LlamaMemory, sqlite_memory: SQLiteMemory):
    """Sync LlamaIndex memory to SQLite for persistence"""
    history = llama_memory.get_history()

    for message in history:
        sqlite_memory.add_message(
            session_id=llama_memory.session_id,
            agent_name=llama_memory.agent_name,
            content=message.content,
            step_type="message"
        )
```
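
The `services.turn_memory` module is not shown in this guide, so the call above depends on its real `add_message` signature. As a rough illustration of what the persistence side could look like, here is a minimal sketch using only the standard-library `sqlite3` module. The `agent_messages` table, its columns, and the `save_messages` / `load_messages` helpers are hypothetical names chosen for this example, not the project's actual schema.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema: one row per message, keyed by session and agent.
SCHEMA = """
CREATE TABLE IF NOT EXISTS agent_messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    agent_name TEXT NOT NULL,
    role TEXT NOT NULL,
    content TEXT NOT NULL
)
"""


def save_messages(
    conn: sqlite3.Connection,
    session_id: str,
    agent_name: str,
    messages: Iterable[Tuple[str, str]],
) -> int:
    """Insert (role, content) pairs and return how many rows were written."""
    rows = [(session_id, agent_name, role, content) for role, content in messages]
    with conn:  # commits on success, rolls back if an insert fails
        conn.execute(SCHEMA)
        conn.executemany(
            "INSERT INTO agent_messages (session_id, agent_name, role, content) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


def load_messages(
    conn: sqlite3.Connection, session_id: str, agent_name: str
) -> List[Tuple[str, str]]:
    """Return (role, content) pairs for one agent's session, oldest first."""
    cur = conn.execute(
        "SELECT role, content FROM agent_messages "
        "WHERE session_id = ? AND agent_name = ? ORDER BY id",
        (session_id, agent_name),
    )
    return cur.fetchall()
```

With the LlamaIndex wrapper, the pairs could plausibly be built from `get_history()`, for example as `(m.role.value, m.content)` for each message. Treat that mapping as an assumption to check against the installed LlamaIndex version.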

---

## ✅ Simple Evaluation Logic

### Basic Evaluator Class

```python
# services/simple_evaluator.py

class SimpleEvaluator:
    """Basic evaluation logic for Brown's decision making"""

    MAX_ATTEMPTS = 3  # Original + 2 revisions

    def __init__(self):
        self.attempt_count = 0

    def evaluate(self, bayko_output: dict, original_prompt: str) -> dict:
        """Evaluate Bayko's output and decide: approve, reject, or refine"""
        self.attempt_count += 1

        print(f"🔍 Brown evaluating attempt {self.attempt_count}/{self.MAX_ATTEMPTS}")

        # Rule 1: Auto-reject if dialogue in images
        if self._has_dialogue_in_images(bayko_output):
            return {
                "decision": "reject",
                "reason": "Images contain dialogue text - use subtitles instead",
                "final": True
            }

        # Rule 2: Auto-reject if story is incoherent
        if not self._is_story_coherent(bayko_output):
            return {
                "decision": "reject",
                "reason": "Story panels don't follow logical sequence",
                "final": True
            }

        # Rule 3: Force approve if max attempts reached
        if self.attempt_count >= self.MAX_ATTEMPTS:
            return {
                "decision": "approve",
                "reason": f"Max attempts ({self.MAX_ATTEMPTS}) reached - accepting current quality",
                "final": True
            }

        # Rule 4: Check if output matches prompt intent
        if self._matches_prompt_intent(bayko_output, original_prompt):
            return {
                "decision": "approve",
                "reason": "Output matches prompt and quality is acceptable",
                "final": True
            }
        else:
            return {
                "decision": "refine",
                "reason": "Output needs improvement to better match prompt",
                "final": False
            }

    def _has_dialogue_in_images(self, output: dict) -> bool:
        """Check if panels mention dialogue in the image"""
        panels = output.get("panels", [])

        dialogue_keywords = [
            "speech bubble", "dialogue", "talking", "saying",
            "text in image", "speech", "conversation"
        ]

        for panel in panels:
            description = panel.get("description", "").lower()
            if any(keyword in description for keyword in dialogue_keywords):
                print(f"❌ Found dialogue in image: {description}")
                return True

        return False

    def _is_story_coherent(self, output: dict) -> bool:
        """Basic check for story coherence"""
        panels = output.get("panels", [])

        if len(panels) < 2:
            return True  # Zero or one panel: nothing to compare for coherence

        # Check 1: All panels should have descriptions
        descriptions = [p.get("description", "") for p in panels]
        if any(not desc.strip() for desc in descriptions):
            print("❌ Some panels missing descriptions")
            return False

        # Check 2: Panels shouldn't be identical (no progression)
        if len(set(descriptions)) == 1:
            print("❌ All panels are identical - no story progression")
            return False

        # Check 3: Look for obvious incoherence keywords
        incoherent_keywords = [
            "unrelated", "random", "doesn't make sense",
            "no connection", "contradictory"
        ]

        full_text = " ".join(descriptions).lower()
        if any(keyword in full_text for keyword in incoherent_keywords):
            print("❌ Story contains incoherent elements")
            return False

        return True

    def _matches_prompt_intent(self, output: dict, prompt: str) -> bool:
        """Check if output generally matches the original prompt"""
        panels = output.get("panels", [])

        if not panels:
            return False

        # Simple keyword matching
        prompt_words = set(prompt.lower().split())
        panel_text = " ".join([p.get("description", "") for p in panels]).lower()
        panel_words = set(panel_text.split())

        # At least 20% of prompt words should appear in panel descriptions
        overlap = len(prompt_words.intersection(panel_words))
        match_ratio = overlap / len(prompt_words) if prompt_words else 0

        print(f"📊 Prompt match ratio: {match_ratio:.2f}")
        return match_ratio >= 0.2

    def reset(self):
        """Reset for new session"""
        self.attempt_count = 0
```
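
One caveat with the keyword match in `_matches_prompt_intent`: `str.split()` keeps punctuation attached, so a prompt word like `surf.` will not match `surf` in a panel description. A possible refinement, sketched here as a standalone function, tokenizes with a regular expression and drops a few common stop words. The `prompt_match_ratio` name and the `STOP_WORDS` set are assumptions for illustration. The result would be compared against the same 0.2 threshold used above.

```python
import re
from typing import Iterable, Set

# Hypothetical stop-word list; tune or remove it for your prompts.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in", "on", "with"}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text, split on non-alphanumerics, and drop stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def prompt_match_ratio(prompt: str, descriptions: Iterable[str]) -> float:
    """Fraction of distinct prompt words that appear in any panel description."""
    prompt_words = tokenize(prompt)
    if not prompt_words:
        return 0.0
    panel_words = tokenize(" ".join(descriptions))
    return len(prompt_words & panel_words) / len(prompt_words)
```

Dropping stop words makes the ratio stricter for short prompts, since filler words such as "the" no longer count as matches. Whether 0.2 remains a sensible threshold after that change is something to check against real prompts.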

### Integration with Brown

```python
# Update Brown's review_output method
from typing import Any, Dict

from services.simple_evaluator import SimpleEvaluator

class AgentBrown:
    def __init__(self, max_iterations: int = 3):
        # ... existing code
        self.evaluator = SimpleEvaluator()

    def review_output(self, bayko_response: Dict[str, Any], original_request: StoryboardRequest):
        """Review Bayko's output using simple evaluation logic"""

        print("🤖 Brown reviewing Bayko's output...")

        # Use simple evaluator
        evaluation = self.evaluator.evaluate(
            bayko_response,
            original_request.prompt
        )

        # Log to memory (memory is only created once process_request has run)
        if self.memory is not None:
            self.memory.add_message(
                "assistant",
                f"Evaluation: {evaluation['decision']} - {evaluation['reason']}"
            )

        if evaluation["decision"] == "approve":
            print(f"✅ Brown approved: {evaluation['reason']}")
            return self._create_approval_message(bayko_response, evaluation)

        elif evaluation["decision"] == "reject":
            print(f"❌ Brown rejected: {evaluation['reason']}")
            return self._create_rejection_message(bayko_response, evaluation)

        else:  # refine
            print(f"🔄 Brown requesting refinement: {evaluation['reason']}")
            return self._create_refinement_message(bayko_response, evaluation)
```
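
The snippet above handles a single review. How a `refine` decision feeds back into another generation pass is not shown in this guide, so the following is only a sketch of one way a caller could drive the loop. `run_review_loop`, `generate`, and `evaluate` are hypothetical names. In the real system they would roughly correspond to Bayko's generation call and `SimpleEvaluator.evaluate`.

```python
from typing import Callable, Dict, Optional, Tuple


def run_review_loop(
    generate: Callable[[Optional[str]], dict],
    evaluate: Callable[[dict], Dict[str, object]],
    max_attempts: int = 3,
) -> Tuple[dict, Dict[str, object], int]:
    """Generate, evaluate, and retry until a final decision or the attempt cap.

    `generate` receives the previous refinement reason (None on the first pass).
    Returns (last_output, last_evaluation, attempts_used).
    """
    feedback: Optional[str] = None
    output: dict = {}
    evaluation: Dict[str, object] = {}
    for attempt in range(1, max_attempts + 1):
        output = generate(feedback)
        evaluation = evaluate(output)
        if evaluation.get("final"):
            return output, evaluation, attempt
        # Not final: pass the evaluator's reason to the next generation pass
        feedback = str(evaluation.get("reason", ""))
    return output, evaluation, max_attempts
```

Because `SimpleEvaluator` already forces an approval on the third attempt, the `max_attempts` cap here acts as a safety net, not as the primary stop condition.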

---

## 🚀 Implementation Steps

### Day 1: Memory Integration

1. **Install LlamaIndex**: `pip install llama-index`
2. **Create `services/agent_memory.py`** with the Memory wrapper above
3. **Update Brown and Bayko** to use real memory instead of stubs
4. **Test**: Verify agents can store and retrieve conversation history

### Day 2: Evaluation Logic

1. **Create `services/simple_evaluator.py`** with the evaluation class above
2. **Update Brown's `review_output` method** to use SimpleEvaluator
3. **Test**: Verify 3-attempt limit and rejection rules work
4. **Optional**: Add memory sync to SQLite for persistence

### Day 3: Testing & Polish

1. **End-to-end testing** with various prompts
2. **Console logging** to show evaluation decisions
3. **Bug fixes** and edge case handling
4. **Demo preparation**

---

## 📋 Success Criteria

- [ ] **Memory Works**: Agents store multi-turn conversations using LlamaIndex
- [ ] **Evaluation Works**: Brown makes approve/reject/refine decisions
- [ ] **3-Attempt Limit**: System stops after original + 2 revisions
- [ ] **Auto-Rejection**: Dialogue-in-images and incoherent stories are rejected
- [ ] **End-to-End**: Complete user prompt → comic generation → evaluation cycle

---

_Simple, real, shippable. Perfect for a hackathon demo._