import sys
def get_plan():
    return """
1. **Project Description**
- Vision: Add a seamlessly integrated 'Vision' tab to Magentic-UI for observing agents' visual interactions (screenshots or VNC) in real time. Make the application deployable on Hugging Face Spaces using the Docker SDK.
- Integration: The frontend `ChatView` will be updated to include tabs (Chat / Vision). The Vision tab will subscribe to visual data via WebSocket or display a VNC iframe.
- FastAPI Setup:
  - Use the existing `/api/ws/runs/{run_id}` WebSocket for streaming screenshot events, or add `/api/runs/{run_id}/vision` to return VNC connection details. Ensure `/health` and `/api-docs` are functional for Hugging Face Spaces.
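
The screenshot streaming path described above can be sketched as a small per-run fan-out. This is only an illustration, not the project's `WebSocketManager`: `ScreenshotBroadcaster`, `subscribe`, and `broadcast_screenshot` are hypothetical names, and each `asyncio.Queue` stands in for one WebSocket connection.

```python
import asyncio
import base64
import json
from collections import defaultdict


class ScreenshotBroadcaster:
    # Hypothetical stand-in for the real connection manager:
    # each subscriber queue plays the role of one WebSocket client.

    def __init__(self):
        self._subscribers = defaultdict(set)  # run_id -> set of queues

    def subscribe(self, run_id):
        queue = asyncio.Queue()
        self._subscribers[run_id].add(queue)
        return queue

    def unsubscribe(self, run_id, queue):
        self._subscribers[run_id].discard(queue)

    async def broadcast_screenshot(self, run_id, png_bytes):
        # Base64-encode the image and wrap it in the `screenshot` message shape.
        message = json.dumps({
            "type": "screenshot",
            "data": base64.b64encode(png_bytes).decode("ascii"),
        })
        targets = list(self._subscribers[run_id])
        for queue in targets:
            await queue.put(message)
        return len(targets)


async def demo():
    broadcaster = ScreenshotBroadcaster()
    first = broadcaster.subscribe(1)
    second = broadcaster.subscribe(1)
    other_run = broadcaster.subscribe(2)
    delivered = await broadcaster.broadcast_screenshot(1, b"fake-png-bytes")
    received = json.loads(await first.get())
    return delivered, received, second.qsize(), other_run.qsize()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Only subscribers of the same run receive the message, so a Vision tab for one run never sees frames from another.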
2. **Tasks and Tests**
- Task 1 (Backend): Expose VNC/Vision info endpoint.
  - Modify `src/magentic_ui/backend/web/routes/runs.py` using `replace_with_git_merge_diff` to add a `GET /runs/{run_id}/vision` endpoint returning the VNC URL or stream status.
  - Test: Add a unit test in `tests/` checking that the endpoint returns valid connection info.
- Task 2 (Backend WS): Broadcast screenshots.
  - Update `WebSocketManager` in `src/magentic_ui/backend/web/managers/connection.py` using `replace_with_git_merge_diff` to relay `screenshot` type messages from agents like `FaraWebSurfer`.
  - Test: Unit test the `WebSocketManager` to ensure `screenshot` messages are broadcast properly.
- Task 3 (Frontend): Implement Tabs in Chat UI.
  - Update `frontend/src/components/views/chat/chat.tsx` using `replace_with_git_merge_diff` to wrap the chat interface in an Ant Design `<Tabs>` component (Chat vs Vision).
  - Test: Write a Playwright test ensuring the Tabs render and clicking 'Vision' switches the view.
- Task 4 (Frontend): Implement Vision Component.
  - Create `frontend/src/components/views/chat/vision.tsx` using `write_file` to render an `<iframe>` for VNC or an `<img>` that updates when a `screenshot` WS message arrives. Verify the file contents using `read_file`.
  - Test: Write a Playwright test simulating a `screenshot` WS message and verifying the image source updates.
- Task 5 (HF Deploy Setup): Prepare Hugging Face Deployment Configuration.
  - Create/Update `Dockerfile` to expose port 7860 and run both backend and frontend correctly.
  - Create/Update `README.md` to include Hugging Face YAML frontmatter (app_port: 7860, sdk: docker).
  - Ensure `/health` and `/api-docs` endpoints are properly exposed in the FastAPI backend (`app.py`).
  - Create `Agent.md` capturing the deployment configuration, API documentation, and test cases as requested.
- Task 6 (Test Verification): Run all tests.
  - Use `run_in_bash_session` to run the unit tests and frontend Playwright tests (`pytest` and `npm test`, or equivalent) and confirm there are no regressions.
- Task 7 (Pre-commit): Complete pre-commit steps.
  - Complete the pre-commit steps so that testing, verification, review, and reflection are all done.
- Task 8 (Submission): Submit code.
  - Once all tests pass, submit the change.
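
For Task 5, the README frontmatter can be generated and checked with a few lines of Python. This is a minimal sketch: `build_space_frontmatter` and `parse_frontmatter` are hypothetical helpers and the `title` value is a placeholder; only `sdk: docker` and `app_port: 7860` come from this plan.

```python
def build_space_frontmatter(title, app_port=7860):
    # Return the YAML frontmatter lines for a Docker-based Space README.
    # `title` is a placeholder chosen by the caller.
    return [
        "---",
        "title: {}".format(title),
        "sdk: docker",
        "app_port: {}".format(app_port),
        "---",
    ]


def parse_frontmatter(lines):
    # Parse simple `key: value` pairs between the two `---` markers.
    if not lines or lines[0] != "---":
        raise ValueError("frontmatter must start with ---")
    end = lines.index("---", 1)
    pairs = (line.split(":", 1) for line in lines[1:end])
    return {key.strip(): value.strip() for key, value in pairs}


if __name__ == "__main__":
    for line in build_space_frontmatter("Magentic-UI Vision"):
        print(line)
```

A check like `parse_frontmatter(...)["app_port"] == "7860"` could run alongside the other tests in Task 6 so that a mismatch with the `Dockerfile` port is caught early.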
3. **Functionality Expectations**
- User perspective: User clicks 'Vision' tab and sees exactly what the agent sees (browser viewport, desktop) via VNC or updating screenshots. Switching tabs doesn't interrupt the run. The app is accessible on Hugging Face Spaces.
- Technical perspective: Agents emit visual state. Backend routes it to the frontend via WS or provides a VNC endpoint. Frontend maintains connection regardless of active tab. Docker container serves the app on port 7860.
- Constraints: VNC requires Docker configuration exposing the noVNC port. Fallback to screenshots if VNC is unavailable. HF Space deployment constraints apply.
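
The fallback constraint above could be implemented as a simple reachability probe. The sketch below is an assumption about how that might look, not existing project code: `is_port_open` and `choose_vision_mode` are hypothetical names, and port 6080 in the usage line is only the port noVNC conventionally uses.

```python
import socket


def is_port_open(host, port, timeout=0.5):
    # Try a plain TCP connect; any OSError (refused, timed out) means unavailable.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_vision_mode(host, novnc_port, probe=is_port_open):
    # Prefer the live VNC view; fall back to the screenshot stream otherwise.
    return "vnc" if probe(host, novnc_port) else "screenshots"


if __name__ == "__main__":
    # 6080 is an assumed noVNC port, used here only for illustration.
    print(choose_vision_mode("localhost", 6080))
```

Passing the probe as a parameter keeps the decision testable without a running VNC server.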
4. **API Endpoints to be Exposed**
- `GET /health` (Required by HF)
- `GET /api-docs` (Required by HF)
- `GET /api/runs/{run_id}/vision`
  - Request: None
  - Response: `{ "status": true, "vnc_url": "ws://localhost:5900", "has_vnc": true }`
  - Auth: Inherits existing run access auth.
- WebSocket `/api/ws/runs/{run_id}` (Existing, modified)
  - New message type from server: `{ "type": "screenshot", "data": "base64_encoded_image_string" }`
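
The two payload shapes above can be pinned down with a small typed sketch. `VisionInfo`, `build_vision_info`, and `decode_screenshot_message` are hypothetical names, and returning `vnc_url: null` with `has_vnc: false` when VNC is unavailable is an assumption, since this plan only shows the VNC-available response.

```python
import base64
import json
from typing import Optional, TypedDict


class VisionInfo(TypedDict):
    status: bool
    vnc_url: Optional[str]
    has_vnc: bool


def build_vision_info(vnc_url: Optional[str]) -> VisionInfo:
    # `has_vnc` is derived from whether a VNC URL is configured for the run.
    # The no-VNC shape (vnc_url set to None) is an assumption of this sketch.
    return {"status": True, "vnc_url": vnc_url, "has_vnc": vnc_url is not None}


def decode_screenshot_message(raw: str) -> bytes:
    # Validate the message type and return the decoded image bytes.
    message = json.loads(raw)
    if message.get("type") != "screenshot":
        raise ValueError("not a screenshot message")
    return base64.b64decode(message["data"], validate=True)


if __name__ == "__main__":
    print(json.dumps(build_vision_info("ws://localhost:5900")))
```

The unit tests in Tasks 1 and 2 could assert against these shapes so that the backend and the Vision component agree on field names.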
"""
print(get_plan())