import sys
import json


def generate_plan():
    print("Sequential Thinking Phase:")
    print("1. Analyze the request: The user wants to add a 'vision' tab to the chat UI.")
    print(" - Currently: Chat UI exists for text interaction.")
    print(" - Goal: Add a second tab for 'vision', streaming screenshots/VNC/visual data seamlessly.")
    print(" - The process must keep running seamlessly while switching tabs.")
    print("2. Architecture & Functionality:")
    print(" - Frontend: Update ChatView to have tabs (Chat | Vision).")
    print(" - 'Vision' tab needs a component to display streamed images/VNC.")
    print(" - The WebSocket connection in `app.py`/`ws.py` currently handles run streams.")
    print(" - We need a way to receive visual data (base64 images or VNC URL) via WebSocket or a separate endpoint.")
    print(" - Given `FaraWebSurfer` and `VncDockerPlaywrightBrowser`, visual data might already be captured or accessible.")
    print(" - Backend:")
    print(" - FastAPI handles WebSocket for runs. Need to ensure visual data from `FaraWebSurfer` or similar agents is sent over WS.")
    print(" - Or expose an endpoint to fetch the latest screenshot / VNC stream for a run.")
    print(" - VNC streaming could be an iframe to a noVNC instance if the docker container exposes it.")
    print("3. APIs & Integrations:")
    print(" - Backend: Update WS manager to broadcast screenshots if agents yield them.")
    print(" - Frontend: Listen to WS messages of type 'screenshot' or 'visual_data', and update the Vision tab.")
    print(" - Alternatively, an endpoint `/api/runs/{run_id}/vision` could return the VNC URL or latest screenshot.")
    print("4. Iteration:")
    print(" - The simplest robust approach for 'streaming the screenshots, or no vnc' is:")
    print(" a) Frontend: Add Tabs to Chat UI (Tabs: Chat, Vision).")
    print(" b) Vision Tab: If VNC is available, show an iframe to the VNC URL. If screenshots are streaming, show an image tag updated via WS.")
    print(" c) Backend: Define an API to get the vision stream info for a session/run.")
    return """
1. **Project Description**
   - Vision: Add a seamlessly integrated 'Vision' tab in the Magentic-UI to observe agents' visual interactions (screenshots or VNC) in real-time.
   - Integration: The frontend `ChatView` will be updated to include tabs (Chat / Vision). The Vision tab will subscribe to visual data via WebSocket or display a VNC iframe.
   - FastAPI Setup:
     - Use existing `/api/ws/runs/{run_id}` for streaming screenshot events, or add `/api/runs/{run_id}/vision` to get VNC connection details.

2. **Tasks and Tests**
   - Task 1 (Backend): Expose VNC/Vision info endpoint.
     - Modify `src/magentic_ui/backend/web/routes/runs.py` to add a `GET /runs/{run_id}/vision` endpoint returning VNC URL or stream status.
     - Test: Add a unit test in `tests/` checking if the endpoint returns valid connection info.
   - Task 2 (Backend WS): Broadcast screenshots.
     - Update WebSocketManager in `src/magentic_ui/backend/web/managers/websocket.py` to relay `screenshot` type messages from agents like `FaraWebSurfer`.
     - Test: Unit test the WebSocketManager to ensure `screenshot` messages are broadcasted properly.
   - Task 3 (Frontend): Implement Tabs in Chat UI.
     - Update `frontend/src/components/views/chat/chat.tsx` to wrap the chat interface in an Ant Design `` component (Chat vs Vision).
     - Test: Write a Playwright test ensuring the Tabs render and clicking 'Vision' switches the view.
   - Task 4 (Frontend): Implement Vision Component.
     - Create `frontend/src/components/views/chat/vision.tsx` to render an `