# 🌊 Streaming System - Real-Time AI Updates

## Overview

The PyCatan AI system now supports **real-time streaming** of AI agent thoughts, actions, and tool calls! This provides immediate visibility into what the AI is thinking and doing as it plays.

## Architecture

```
┌─────────────┐         ┌──────────────┐         ┌─────────────┐
│             │ Stream  │              │  SSE    │             │
│  LLM Client ├────────►│  AI Manager  ├────────►│ Web Viewer  │
│             │ Chunks  │              │  Events │             │
└─────────────┘         └──────────────┘         └─────────────┘
                              │
                              │ HTTP POST
                              ▼
                        ┌──────────────┐
                        │Stream        │
                        │Broadcaster   │
                        └──────────────┘
```

## Components

### 1. LLM Client (`llm_client.py`)

**New:** `generate_stream()` method
- Uses `client.models.generate_content_stream()` for streaming
- Yields `StreamChunk` objects in real-time
- Supports `include_thoughts=True` in ThinkingConfig
- Handles three content chunk types (a final `done` chunk signals completion):
  - `thought` - AI reasoning/thinking
  - `text` - Regular response text
  - `function_call` - Tool/function calls

**StreamChunk dataclass:**
```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass
class StreamChunk:
    chunk_type: str  # 'thought', 'text', 'function_call', 'done'
    content: Optional[str] = None
    function_call: Optional[Dict[str, Any]] = None
    is_complete: bool = False
```
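As a rough illustration of how a consumer might handle the chunk types, the sketch below folds a stream of `StreamChunk` objects into accumulated thoughts, text, and tool calls. The helper name `collect_stream` and the `{"name": ..., "args": ...}` shape of `function_call` are assumptions made for this example, not part of `llm_client.py`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass
class StreamChunk:
    chunk_type: str  # 'thought', 'text', 'function_call', 'done'
    content: Optional[str] = None
    function_call: Optional[Dict[str, Any]] = None
    is_complete: bool = False


def collect_stream(chunks: Iterable[StreamChunk]) -> Dict[str, Any]:
    """Hypothetical helper: accumulate a chunk stream into one result dict."""
    result: Dict[str, Any] = {
        "thoughts": "",
        "text": "",
        "function_calls": [],
        "done": False,
    }
    for chunk in chunks:
        if chunk.chunk_type == "thought":
            result["thoughts"] += chunk.content or ""
        elif chunk.chunk_type == "text":
            result["text"] += chunk.content or ""
        elif chunk.chunk_type == "function_call" and chunk.function_call:
            result["function_calls"].append(chunk.function_call)
        elif chunk.chunk_type == "done":
            # A 'done' chunk ends the stream; anything after it is ignored.
            result["done"] = True
            break
    return result
```

In practice the chunks would come from `generate_stream()`; any iterable of `StreamChunk` objects works here, which keeps the logic easy to test without an API call.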

### 2. AI Manager (`ai_manager.py`)

**New:** `_send_to_llm_stream()` method
- Similar to `_send_to_llm()` but uses streaming
- Broadcasts chunks via `_broadcast_stream_chunk()`
- Supports tool calling loop with streaming
- Each iteration can stream thoughts and tool calls

**Configuration:**
- `config.llm.enable_streaming` - Enable/disable streaming (default: True)
- Falls back to regular mode if disabled

### 3. Stream Broadcaster (`stream_broadcaster.py`)

**New component** that pushes events to web viewer:
- Sends HTTP POST to `http://localhost:5001/api/stream/broadcast`
- Non-blocking with short timeout (0.5s)
- Automatically disables if web viewer not available
- Converts `StreamChunk` → JSON event
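The behaviour listed above could look roughly like the following. This is a minimal sketch, not the actual contents of `stream_broadcaster.py`: the class layout, the `post_json` helper, and the injectable `sender` argument are assumptions for illustration. The payload fields follow the Broadcast API described later in this document. Note that the sketch bounds each call with the 0.5s timeout rather than using a background thread.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional

BROADCAST_URL = "http://localhost:5001/api/stream/broadcast"


def post_json(url: str, payload: Dict[str, Any], timeout: float) -> None:
    """Send one JSON POST; raises an OSError subclass if the viewer is unreachable."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout):
        pass  # Fire and forget: the response body is not needed.


class StreamBroadcaster:
    """Hypothetical broadcaster that disables itself after the first failure."""

    def __init__(
        self,
        url: str = BROADCAST_URL,
        timeout: float = 0.5,
        sender: Callable[[str, Dict[str, Any], float], None] = post_json,
    ) -> None:
        self.url = url
        self.timeout = timeout
        self.sender = sender
        self.enabled = True

    def broadcast(
        self,
        player_name: str,
        chunk_type: str,
        content: Optional[str] = None,
        function_call: Optional[Dict[str, Any]] = None,
    ) -> bool:
        """Return True if the event was sent, False if skipped or failed."""
        if not self.enabled:
            return False
        payload: Dict[str, Any] = {
            "player_name": player_name,
            "chunk_type": chunk_type,
            "content": content,
        }
        if function_call is not None:
            payload["function_call"] = function_call
        try:
            self.sender(self.url, payload, self.timeout)
            return True
        except OSError:
            # URLError, timeouts, and refused connections are all OSError subclasses.
            self.enabled = False
            return False
```

Catching `OSError` keeps a missing web viewer from interrupting the AI loop, which matches the "viewer can be offline" goal; whether the real component ever re-enables itself is not covered by this sketch.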

### 4. Web Viewer (`web_viewer.py`)

**New endpoints:**

**`GET /api/stream/<player_name>`** - SSE endpoint
- Returns Server-Sent Events stream
- Clients connect and receive real-time updates
- Sends keepalive pings every 30s
- Clients auto-reconnect on error

**`POST /api/stream/broadcast`** - Broadcast endpoint
- Receives events from AI Manager
- Pushes to player-specific queue
- Queue is non-blocking (max 1000 events)

**Event format:**
```json
{
  "type": "thought|text|function_call|done",
  "timestamp": "ISO-8601",
  "content": "...",
  "function_call": {...}
}
```
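To make the queueing and event framing concrete, here is a small framework-free sketch. It assumes a `deque` with `maxlen` as the bounded per-player queue (the real `web_viewer.py` may use a different structure), and the names `push_event`, `format_sse`, and `KEEPALIVE` are hypothetical. The `data:` framing and `:`-prefixed comment lines follow the Server-Sent Events wire format.

```python
import json
from collections import deque
from datetime import datetime
from typing import Any, Deque, Dict, Optional

MAX_EVENTS = 1000  # Matches the queue limit mentioned above.
player_queues: Dict[str, Deque[Dict[str, Any]]] = {}

# SSE comment line; clients ignore it, but it keeps the connection alive.
KEEPALIVE = ": keepalive\n\n"


def push_event(
    player_name: str,
    chunk_type: str,
    content: Optional[str] = None,
    function_call: Optional[Dict[str, Any]] = None,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build an event in the format above and queue it for one player."""
    event: Dict[str, Any] = {
        "type": chunk_type,
        "timestamp": (now or datetime.now()).isoformat(timespec="seconds"),
        "content": content,
    }
    if function_call is not None:
        event["function_call"] = function_call
    queue = player_queues.setdefault(player_name, deque(maxlen=MAX_EVENTS))
    queue.append(event)  # When full, the oldest event is dropped silently.
    return event


def format_sse(event: Dict[str, Any]) -> str:
    """Serialize one event as an SSE frame: a data line plus a blank line."""
    return f"data: {json.dumps(event)}\n\n"
```

A Flask SSE route would then yield `format_sse(...)` for each queued event (and `KEEPALIVE` when idle) from a generator returned with the `text/event-stream` MIME type.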

### 5. Dynamic Viewer UI (`viewer_dynamic.html`)

**New features:**

**Streaming Container** - Shows live updates:
- Appears at top of page when streaming active
- Shows player name with blinking indicator
- Auto-scrolls as new chunks arrive
- Fades out after completion

**Visual feedback:**
- 💭 Purple border for thoughts
- 🔹 Green border for text
- 🔧 Orange border for function calls
- ✅ Done status with green indicator

**JavaScript functions:**
- `initStreaming()` - Connect to SSE for all players
- `connectPlayerStream(player)` - Create EventSource
- `handleStreamChunk(player, chunk)` - Process incoming chunk
- `addStreamChunk(container, type, content)` - Display chunk

## Configuration

### Enable Streaming

In `config_dev.yaml`:
```yaml
llm:
  enable_streaming: true  # Enable real-time streaming
  enable_thinking: true   # Required for thought summaries
  thinking_budget: 8000   # Budget for thinking tokens
```

### Disable Streaming

Set `enable_streaming: false` to use traditional request-response mode.

## Usage

### 1. Start the Game

Run `play_ai_auto.bat`, which starts:
- Web Viewer on port 5001 (with SSE support)
- Game with AI agents
- LLM Logger console

### 2. Watch Real-Time Updates

Open a browser to `http://localhost:5001`:
- Streaming boxes appear when AI is thinking
- See thoughts, tool calls, and responses as they happen
- Boxes disappear when complete

### 3. Review History

Completed requests are logged normally:
- Full prompt/response saved
- Tool iterations recorded
- All metadata preserved

## Technical Details

### Why SSE (Server-Sent Events)?

- One-way: Server → Client (perfect for our use case)
- Built-in reconnection
- Simple HTTP (no WebSocket complexity)
- Works with existing Flask app

### Why HTTP POST for Broadcasting?

- Decoupled architecture
- AI Manager doesn't need to know about SSE
- Non-blocking (fire and forget)
- Web viewer can be offline without breaking AI

### Token Budgets with Streaming

Streaming works with thinking budgets:
```yaml
# Single budget for all iterations
thinking_budget: 8000
thinking_budgets: []

# OR: Dynamic budgets per iteration
thinking_budgets: [8000, 4000, 2000]  # 3 iterations
```

Each iteration streams its own thoughts and results.
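One way the per-iteration budget could be resolved is sketched below. The function name `budget_for_iteration` is hypothetical, and the choice to reuse the last list entry once iterations exceed the list length is an assumption; the source does not say how that case is handled.

```python
from typing import List


def budget_for_iteration(
    iteration: int,
    thinking_budget: int,
    thinking_budgets: List[int],
) -> int:
    """Pick the thinking budget for a zero-based tool-calling iteration."""
    if not thinking_budgets:
        # An empty list means the single budget applies to every iteration.
        return thinking_budget
    # Assumption: iterations past the end of the list reuse the last budget.
    index = min(iteration, len(thinking_budgets) - 1)
    return thinking_budgets[index]
```

With `thinking_budgets: [8000, 4000, 2000]`, iterations 0, 1, and 2 would get 8000, 4000, and 2000 tokens respectively.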

## Benefits

### For Development
- **Immediate feedback** - See what AI is doing in real-time
- **Debug tool calls** - Watch function calling decisions
- **Monitor thinking** - Understand reasoning process
- **Better UX** - Know the system is working

### For Users
- **Transparency** - See AI decision-making
- **Engagement** - Watch the game unfold
- **Understanding** - Learn how AI plays Catan
- **Entertainment** - More interesting than waiting

## Future Enhancements

Possible additions:
- [ ] Stream to multiple viewers simultaneously
- [ ] Replay streaming for historical games
- [ ] Filter streams by type (thoughts only, tools only)
- [ ] Stream game state updates
- [ ] WebSocket option for bidirectional communication
- [ ] Stream compression for high-frequency updates

## Troubleshooting

**No streaming visible:**
- Check `enable_streaming: true` in config
- Verify web viewer is running on port 5001
- Check browser console for connection errors
- Ensure `enable_thinking: true` for thought summaries

**Connection drops:**
- SSE reconnects automatically after 5s
- Check network/firewall
- Verify that Flask is not blocking long-lived connections

**Missing chunks:**
- Queue size is 1000 - may drop old events
- Increase queue size in `web_viewer.py` if needed

## API Reference

### StreamChunk
```python
chunk = StreamChunk(
    chunk_type='thought',  # or 'text', 'function_call', 'done'
    content='Analyzing situation...',
    is_complete=False
)
```

### SSE Event
```javascript
{
  type: 'thought',
  timestamp: '2026-01-10T12:34:56',
  content: 'I should build a settlement...'
}
```

### Broadcast API
```http
POST http://localhost:5001/api/stream/broadcast
Content-Type: application/json

{
  "player_name": "Agent1",
  "chunk_type": "thought",
  "content": "Thinking..."
}
```

## Credits

Built on top of:
- **Google Gemini API** - Streaming support with thinking mode
- **Flask** - SSE server
- **Server-Sent Events** - Real-time browser updates
- **PyCatan** - Settlers of Catan implementation

---

**Happy Streaming! 🌊**