
Maxun / plan.py


import sys
import json


def generate_plan():
    """Print the planning notes, then return the final plan text."""
    # Phase 1: reasoning notes, printed as they are worked through.
    print("Sequential Thinking Phase:")
    print("1. Analyze the request: The user wants to add a 'vision' tab to the chat UI.")
    print("   - Currently: Chat UI exists for text interaction.")
    print("   - Goal: Add a second tab for 'vision', streaming screenshots/VNC/visual data seamlessly.")
    print("   - The process must keep running seamlessly while switching tabs.")
    print("2. Architecture & Functionality:")
    print("   - Frontend: Update ChatView to have tabs (Chat | Vision).")
    print("     - 'Vision' tab needs a component to display streamed images/VNC.")
    print("     - The WebSocket connection in `app.py`/`ws.py` currently handles run streams.")
    print("     - We need a way to receive visual data (base64 images or VNC URL) via WebSocket or a separate endpoint.")
    print("     - Given `FaraWebSurfer` and `VncDockerPlaywrightBrowser`, visual data might already be captured or accessible.")
    print("   - Backend:")
    print("     - FastAPI handles WebSocket for runs. Need to ensure visual data from `FaraWebSurfer` or similar agents is sent over WS.")
    print("     - Or expose an endpoint to fetch the latest screenshot / VNC stream for a run.")
    print("     - VNC streaming could be an iframe to a noVNC instance if the docker container exposes it.")
    print("3. APIs & Integrations:")
    print("   - Backend: Update WS manager to broadcast screenshots if agents yield them.")
    print("   - Frontend: Listen to WS messages of type 'screenshot' or 'visual_data', and update the Vision tab.")
    print("   - Alternatively, an endpoint `/api/runs/{run_id}/vision` could return the VNC URL or latest screenshot.")
    print("4. Iteration:")
    print("   - The simplest robust approach for 'streaming the screenshots, or no vnc' is:")
    print("     a) Frontend: Add Tabs to Chat UI (Tabs: Chat, Vision).")
    print("     b) Vision Tab: If VNC is available, show an iframe to the VNC URL. If screenshots are streaming, show an image tag updated via WS.")
    print("     c) Backend: Define an API to get the vision stream info for a session/run.")

    # Phase 2: the plan itself, returned as one block of text ending in READY.
    return """
1. Project Description
- Vision: Add a seamlessly integrated 'Vision' tab in the Magentic-UI to observe agents' visual interactions (screenshots or VNC) in real time.
- Integration: The frontend `ChatView` will be updated to include tabs (Chat / Vision). The Vision tab will subscribe to visual data via WebSocket or display a VNC iframe.
- FastAPI Setup:
  - Use existing `/api/ws/runs/{run_id}` for streaming screenshot events, or add `/api/runs/{run_id}/vision` to get VNC connection details.

2. Tasks and Tests
- Task 1 (Backend): Expose VNC/Vision info endpoint.
  - Modify `src/magentic_ui/backend/web/routes/runs.py` to add a `GET /runs/{run_id}/vision` endpoint returning VNC URL or stream status.
  - Test: Add a unit test in `tests/` checking that the endpoint returns valid connection info.
- Task 2 (Backend WS): Broadcast screenshots.
  - Update WebSocketManager in `src/magentic_ui/backend/web/managers/websocket.py` to relay `screenshot` type messages from agents like `FaraWebSurfer`.
  - Test: Unit test the WebSocketManager to ensure `screenshot` messages are broadcast properly.
- Task 3 (Frontend): Implement Tabs in Chat UI.
  - Update `frontend/src/components/views/chat/chat.tsx` to wrap the chat interface in an Ant Design `<Tabs>` component (Chat vs Vision).
  - Test: Write a Playwright test ensuring the Tabs render and clicking 'Vision' switches the view.
- Task 4 (Frontend): Implement Vision Component.
  - Create `frontend/src/components/views/chat/vision.tsx` to render an `<iframe>` for VNC or an `<img>` that updates when a `screenshot` WS message arrives.
  - Test: Write a Playwright test simulating a `screenshot` WS message and verifying the image source updates.

3. Functionality Expectations
- User perspective: User clicks the 'Vision' tab and sees exactly what the agent sees (browser viewport, desktop) via VNC or updating screenshots. Switching tabs doesn't interrupt the run.
- Technical perspective: Agents emit visual state. Backend routes it to the frontend via WS or provides a VNC endpoint. Frontend maintains the connection regardless of the active tab.
- Constraints: VNC requires Docker configuration exposing the noVNC port. Fall back to screenshots if VNC is unavailable.

4. API Endpoints to be Exposed
- `GET /api/runs/{run_id}/vision`
  - Request: None
  - Response: `{ "status": true, "vnc_url": "ws://localhost:5900", "has_vnc": true }`
  - Auth: Inherits existing run access auth.
- WebSocket `/api/ws/runs/{run_id}` (Existing, modified)
  - New message type from server: `{ "type": "screenshot", "data": "base64_encoded_image_string" }`

READY
"""


if __name__ == "__main__":
    print(generate_plan())
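
The two payload shapes named in section 4 of the plan can be sketched with the standard library alone. This is a minimal illustration, not the Magentic-UI implementation: the helper names `build_screenshot_message`, `read_screenshot_message`, and `build_vision_info` are hypothetical, and the real backend would send these payloads through its own WebSocket manager and FastAPI route.

```python
import base64
import json
from typing import Optional


def build_screenshot_message(image_bytes: bytes) -> str:
    """Serialize raw image bytes into the plan's `screenshot` WS message.

    Hypothetical helper: the plan only fixes the message shape
    {"type": "screenshot", "data": "<base64 string>"}.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"type": "screenshot", "data": encoded})


def read_screenshot_message(raw: str) -> Optional[bytes]:
    """Return the decoded image bytes, or None for any other message type."""
    message = json.loads(raw)
    if message.get("type") != "screenshot":
        return None
    return base64.b64decode(message["data"])


def build_vision_info(vnc_url: Optional[str]) -> dict:
    """Build the response body sketched for GET /api/runs/{run_id}/vision.

    Assumption: `has_vnc` is false whenever no VNC URL is known, which is
    the case where the frontend would fall back to screenshots.
    """
    return {"status": True, "vnc_url": vnc_url, "has_vnc": vnc_url is not None}
```

A receiver that only cares about screenshots can call `read_screenshot_message` on every incoming frame and ignore the `None` results, which keeps the existing chat messages on the same socket untouched.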