---
title: "Enhanced Computer-Using Agent with VNC"
emoji: "🖥️"
colorFrom: "green"
colorTo: "blue"
sdk: "docker"
sdk_version: "3.12.0"
app_file: "computer_agent_vnc.py"
pinned: false
---

# 🖥️ Enhanced Computer-Using Agent with VNC

🤖 **AI-powered browser automation with full desktop environment access**

This enhanced Hugging Face Space provides a comprehensive computer-using agent that combines browser automation with a full VNC-accessible desktop environment, similar to OpenAI's Operator but with enhanced GUI capabilities.

## ✨ New Features

### Enhanced Browser Automation

- **Web Navigation**: Navigate to any URL with intelligent loading detection
- **Screenshot Capture**: Take high-quality screenshots of web pages
- **Element Interaction**: Click on elements, type text, and interact with forms
- **Page Analysis**: Extract content, links, forms, and page structure

### 🖥️ VNC Desktop Environment

- **Full GUI Access**: Complete XFCE4 desktop environment accessible via web
- **VNC Integration**: Direct VNC access through browser interface
- **Desktop Applications**: Run any Linux GUI applications
- **Web-based VNC**: Access desktop through noVNC web client

### 🔧 Advanced Controls

- **Dual Interface**: Browser automation + full desktop environment
- **CSS Selector Support**: Target specific elements using CSS selectors
- **Scrolling**: Navigate up and down pages with customizable scroll amounts
- **Content Extraction**: Get page text, HTML, and structural information
- **Action History**: Track all actions performed by the agent

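
The action history listed above could be kept as a small capped log. The sketch below is only an illustration: the `Action` and `ActionHistory` names, their fields, and the `max_entries` cap are hypothetical and are not taken from `computer_agent_vnc.py`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Action:
    """One step performed by the agent (hypothetical record shape)."""
    kind: str          # e.g. "navigate", "click", "scroll"
    target: str = ""   # URL or CSS selector, if any
    detail: str = ""   # free-form note such as typed text or scroll amount
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ActionHistory:
    """Append-only log of actions, oldest first, capped at max_entries."""

    def __init__(self, max_entries: int = 100) -> None:
        self.max_entries = max_entries
        self._entries: list[Action] = []

    def record(self, kind: str, target: str = "", detail: str = "") -> Action:
        action = Action(kind, target, detail)
        self._entries.append(action)
        # Drop the oldest entries once the cap is exceeded.
        overflow = len(self._entries) - self.max_entries
        if overflow > 0:
            del self._entries[:overflow]
        return action

    def summary(self) -> list[str]:
        """One short line per action, suitable for a status panel."""
        return [f"{a.kind} {a.target}".strip() for a in self._entries]


if __name__ == "__main__":
    history = ActionHistory(max_entries=2)
    history.record("navigate", "https://example.com")
    history.record("click", "#submit")
    history.record("scroll", detail="500px")
    print(history.summary())
```

Capping the log keeps memory use bounded during long sessions; a real agent might persist entries instead.
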
## Usage

### Browser Automation Tab

1. Click "Initialize Browser" to start the browser automation
2. Enter a URL and click "Navigate" to visit the page
3. Use "Take Screenshot" to capture the current page
4. Monitor status and action history

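
For readers who want to script the same initialize, navigate, and screenshot flow outside the UI, here is a minimal sketch using Playwright's synchronous Python API. It is not the Space's actual implementation: the `ensure_scheme` and `capture` helpers, the output path, and the 30-second timeout are assumptions made for illustration.

```python
from urllib.parse import urlparse


def ensure_scheme(url: str) -> str:
    """Prefix bare hosts such as 'example.com' with https:// (hypothetical helper)."""
    url = url.strip()
    return url if urlparse(url).scheme in ("http", "https") else f"https://{url}"


def capture(url: str, out_path: str = "page.png", headless: bool = True) -> str:
    """Open url in Chromium, save a full-page screenshot, and return the page title."""
    # Imported lazily so ensure_scheme() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            # Viewport matches the 1920x1080 display size mentioned in this README.
            page = browser.new_page(viewport={"width": 1920, "height": 1080})
            page.goto(ensure_scheme(url), wait_until="load", timeout=30_000)
            page.screenshot(path=out_path, full_page=True)
            return page.title()
        finally:
            browser.close()
```

Calling `capture("example.com")` requires Playwright and its Chromium build to be installed (`pip install playwright` followed by `playwright install chromium`).
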
### VNC Desktop Tab

1. Click "Check VNC Status" to verify desktop environment
2. Click "Open VNC Viewer" to access full desktop in new tab
3. Use the desktop environment for any GUI applications
4. **VNC Access Details:**
   - **Port**: 5901
   - **Password**: computer-agent
   - **Web Interface**: Available through the VNC tab

### System Info Tab

1. Get detailed system information
2. Monitor agent status and capabilities
3. View feature availability

## 🏗️ Architecture

### Desktop Environment

- **GUI Framework**: XFCE4 (lightweight desktop environment)
- **VNC Server**: TigerVNC standalone server
- **Web Bridge**: noVNC + websockify for web access
- **Display Resolution**: 1920x1080 (configurable)

### Browser Integration

- **Automation Engine**: Playwright with Chromium
- **Screenshot Capability**: Real-time page capture
- **Element Interaction**: Advanced DOM manipulation
- **Headless/Headed**: Configurable browser mode

### Web Interface

- **Framework**: Gradio 4.21.0 with enhanced features
- **Tabs**: Browser automation, VNC desktop, system info
- **Real-time Updates**: Live status monitoring
- **Action History**: Complete interaction logging

## 🛠️ Technical Specifications

### Hardware Requirements

- **CPU**: 2 vCPU (included in CPU basic tier)
- **RAM**: 16 GB (adequate for desktop environment)
- **Storage**: Standard Hugging Face Space allocation

### Software Stack

- **Base**: Ubuntu 22.04 LTS
- **Python**: 3.10 with optimized dependencies
- **Desktop**: XFCE4 + X11
- **VNC**: TigerVNC + noVNC
- **Browser**: Chromium with Playwright automation

### Network Configuration

- **Web Interface**: Port 7860 (Gradio)
- **VNC Server**: Port 5901 (TigerVNC)
- **Web Bridge**: websockify, proxying noVNC connections to the VNC server on port 5901

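
To make the port relationships concrete, the sketch below builds the two launch commands such a setup typically needs. By VNC convention, display `:N` listens on TCP port 5900 + N, so port 5901 corresponds to display `:1`. Everything else is an assumption for illustration: websockify needs its own listening port separate from the VNC server's (6080 is used here), and `/usr/share/novnc` is assumed as the noVNC web root. The Space's actual startup script may differ.

```python
import shlex

VNC_BASE_PORT = 5900  # VNC convention: display :N listens on TCP port 5900 + N


def vnc_port(display: int) -> int:
    """TCP port for a given X display number (display 1 -> 5901)."""
    return VNC_BASE_PORT + display


def vnc_server_cmd(display: int = 1, geometry: str = "1920x1080") -> list[str]:
    """TigerVNC launch command for the given display and resolution."""
    return ["vncserver", f":{display}", "-geometry", geometry, "-depth", "24"]


def websockify_cmd(listen_port: int, display: int = 1,
                   web_root: str = "/usr/share/novnc") -> list[str]:
    """websockify command serving noVNC and proxying to the VNC server.

    listen_port and web_root are assumptions, not values from this README.
    """
    return ["websockify", f"--web={web_root}", str(listen_port),
            f"localhost:{vnc_port(display)}"]


if __name__ == "__main__":
    print(shlex.join(vnc_server_cmd()))
    print(shlex.join(websockify_cmd(6080)))
```

Passing a different `geometry` string is one way the "configurable" display resolution could be exposed.
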
## Security Considerations

### VNC Security

- **Password Protection**: VNC server requires authentication
- **Local Access**: VNC accessible only within the Space environment
- **No External Access**: Desktop environment isolated to container

### Browser Security

- **Headless Mode**: Browser runs without visible interface
- **Web Security Disabled**: Same-origin policy is relaxed for automation compatibility
- **Sandboxed**: Browser runs in containerized environment

## 🎯 Use Cases

### Enhanced Automation

- **GUI Testing**: Test applications requiring desktop environment
- **Visual Regression**: Compare screenshots with desktop applications
- **Multi-app Workflows**: Coordinate between browser and desktop apps
- **Development**: Develop and test GUI applications

### Research & Development

- **AI Research**: Run AI models with GUI interfaces
- **Data Analysis**: Use desktop tools for data visualization
- **Prototyping**: Rapid GUI application development
- **Education**: Interactive learning environments

## 🔮 Advanced Features

### VNC Integration Benefits

- **Full Desktop**: Access to complete Linux desktop environment
- **GUI Applications**: Run any X11-based applications
- **File Management**: Native file explorer and management tools
- **Development Tools**: IDEs, debuggers, and development utilities

### Enhanced Browser Automation

- **Visual Testing**: Compare automated browser actions with desktop
- **Complex Workflows**: Combine browser automation with desktop apps
- **Screenshots**: Capture both browser and desktop content
- **Monitoring**: Real-time view of all automated activities

## System Requirements

### For Users

- **Web Browser**: Any modern browser with JavaScript enabled
- **Network**: Stable internet connection for Space access
- **VNC Viewer**: Built-in web VNC client (no installation required)

### For Development

- **Docker**: For local testing and development
- **Linux**: Ubuntu 22.04 or compatible distribution
- **Python 3.10+**: For running enhanced agent locally

## 🚨 Important Notes

### Performance Considerations

- **Resource Usage**: Desktop environment uses additional memory
- **Startup Time**: VNC server adds ~10-15 seconds to startup
- **Network**: VNC traffic uses bandwidth for remote desktop access

### Best Practices

- **Close VNC**: Close VNC viewer when not in use to save resources
- **Monitor Usage**: Check Space logs for resource consumption
- **Test Locally**: Develop and test locally before deploying

## 🔧 Troubleshooting

### VNC Issues

- **Connection Failed**: Check VNC status in the interface
- **Black Screen**: Wait 30 seconds for desktop to fully initialize
- **Slow Performance**: Normal for remote desktop over web

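
Instead of waiting a fixed 30 seconds, a script can poll until the VNC port accepts connections. The sketch below uses only the Python standard library. The `wait_for_port` name and its defaults are hypothetical, and an open port only shows that the VNC server is listening, not that the XFCE4 desktop has finished drawing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds before `timeout` seconds elapse.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is bounded by `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # 5901 is the VNC server port listed in this README.
    ready = wait_for_port("localhost", 5901)
    print("VNC server reachable" if ready else "VNC server not reachable")
```
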
### Browser Automation

- **Elements Not Found**: Ensure page has fully loaded
- **Screenshots Fail**: Check browser initialization status
- **Navigation Timeout**: Verify URL accessibility

---

**Experience the future of web automation with full desktop capabilities! ✨**

Built with ❤️ using Hugging Face Spaces, Gradio, Playwright, TigerVNC, and XFCE4