metadata
title: Enhanced Computer-Using Agent with VNC
emoji: 🖥️
colorFrom: green
colorTo: blue
sdk: docker
sdk_version: 3.12.0
app_file: computer_agent_vnc.py
pinned: false
🖥️ Enhanced Computer-Using Agent with VNC
🤖 AI-powered browser automation with full desktop environment access
This enhanced Hugging Face Space provides a comprehensive computer-using agent that combines browser automation with a full VNC-accessible desktop environment, similar to OpenAI's Operator but with enhanced GUI capabilities.
✨ New Features
Enhanced Browser Automation
- Web Navigation: Navigate to any URL with intelligent loading detection
- Screenshot Capture: Take high-quality screenshots of web pages
- Element Interaction: Click on elements, type text, and interact with forms
- Page Analysis: Extract content, links, forms, and page structure
🖥️ VNC Desktop Environment
- Full GUI Access: Complete XFCE4 desktop environment accessible via web
- VNC Integration: Direct VNC access through browser interface
- Desktop Applications: Run any Linux GUI applications
- Web-based VNC: Access desktop through noVNC web client
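As a rough illustration of web-based access, the sketch below builds a link to noVNC's bundled `vnc.html` page. The helper name `novnc_url` and the base URL on port 6080 are assumptions for illustration, not values taken from this Space. The `autoconnect`, `resize`, and `password` query parameters are standard noVNC options.

```python
from typing import Optional
from urllib.parse import urlencode


def novnc_url(base_url: str, password: Optional[str] = None,
              autoconnect: bool = True) -> str:
    """Build a link to noVNC's vnc.html page (hypothetical helper)."""
    params = {
        "autoconnect": "true" if autoconnect else "false",
        "resize": "scale",  # scale the remote desktop to fit the browser window
    }
    if password is not None:
        # Convenient for demos only: a password in a URL ends up in browser history.
        params["password"] = password
    return f"{base_url.rstrip('/')}/vnc.html?{urlencode(params)}"


# Assumed base URL; substitute wherever websockify actually serves noVNC.
print(novnc_url("http://localhost:6080"))
```

Leaving `password` unset makes noVNC prompt for it, which is the safer default.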
🔧 Advanced Controls
- Dual Interface: Browser automation + full desktop environment
- CSS Selector Support: Target specific elements using CSS selectors
- Scrolling: Navigate up and down pages with customizable scroll amounts
- Content Extraction: Get page text, HTML, and structural information
- Action History: Track all actions performed by the agent
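One way the action history could be kept is a small bounded log. This is a minimal sketch, not the Space's actual implementation: the `Action` and `ActionHistory` names, their fields, and the 100-entry default are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


@dataclass
class Action:
    kind: str    # e.g. "navigate", "click", "scroll"
    detail: str  # URL, CSS selector, or scroll amount
    ok: bool = True
    timestamp: str = field(default_factory=_now)


class ActionHistory:
    """Bounded log that drops the oldest entry once max_entries is reached."""

    def __init__(self, max_entries: int = 100) -> None:
        self._entries = deque(maxlen=max_entries)

    def record(self, kind: str, detail: str, ok: bool = True) -> Action:
        action = Action(kind, detail, ok)
        self._entries.append(action)
        return action

    def recent(self, n: int = 10) -> list:
        """Return up to n most recent actions, oldest first."""
        return list(self._entries)[-n:] if n > 0 else []

    def as_text(self) -> str:
        return "\n".join(
            f"[{a.timestamp}] {'OK' if a.ok else 'FAIL'} {a.kind}: {a.detail}"
            for a in self._entries
        )


history = ActionHistory(max_entries=3)
history.record("navigate", "https://example.com")
history.record("click", "button#submit")
history.record("scroll", "down 500px")
history.record("screenshot", "page.png")  # evicts the "navigate" entry
print(history.as_text())
```

A bounded `deque` keeps memory use flat during long sessions, at the cost of losing the oldest actions.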
Usage
Browser Automation Tab
- Click "Initialize Browser" to start the browser automation
- Enter a URL and click "Navigate" to visit the page
- Use "Take Screenshot" to capture the current page
- Monitor status and action history
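The steps above roughly correspond to the following Playwright calls. This is a sketch under stated assumptions: `normalize_url` and `capture_page` are hypothetical helpers, and the viewport size, timeout, and output path are illustrative defaults. Playwright is imported inside the function so the module still loads on machines where it is not installed.

```python
def normalize_url(url: str) -> str:
    """Add an https:// scheme when the user types a bare host name."""
    url = url.strip()
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    return url


def capture_page(url: str, out_path: str = "screenshot.png",
                 headless: bool = True) -> str:
    """Open a page in Chromium, save a screenshot, and return the page title."""
    # Lazy import: requires `pip install playwright` and `playwright install chromium`.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            page = browser.new_page(viewport={"width": 1280, "height": 720})
            # "networkidle" waits until network activity settles before continuing.
            page.goto(normalize_url(url), wait_until="networkidle", timeout=30_000)
            page.screenshot(path=out_path, full_page=True)
            return page.title()
        finally:
            browser.close()


# Example call (needs network access and an installed Chromium):
# print(capture_page("example.com"))
```

Passing `headless=False` with `DISPLAY` pointing at the VNC desktop would make the browser window visible in the VNC viewer.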
VNC Desktop Tab
- Click "Check VNC Status" to verify desktop environment
- Click "Open VNC Viewer" to access full desktop in new tab
- Use the desktop environment for any GUI applications
- VNC Access Details:
  - Port: 5901
  - Password: computer-agent
  - Web Interface: Available through the VNC tab
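A status check like "Check VNC Status" could be approximated by connecting to the VNC port and reading the protocol banner. VNC (RFB) servers send a 12-byte version string such as `RFB 003.008` as soon as a client connects. The function name `check_vnc` and the returned dictionary shape are assumptions for illustration.

```python
import socket


def check_vnc(host: str = "127.0.0.1", port: int = 5901,
              timeout: float = 3.0) -> dict:
    """Return whether something that speaks RFB is listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            banner = sock.recv(12)  # e.g. b"RFB 003.008\n"
    except OSError as exc:  # refused, unreachable, or timed out
        return {"running": False, "detail": str(exc)}
    text = banner.decode("ascii", errors="replace").strip()
    return {"running": text.startswith("RFB "), "detail": text or "no banner"}


if __name__ == "__main__":
    print(check_vnc())
```

The check never sends the password, so it only confirms that the server is up, not that authentication works.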
System Info Tab
- Get detailed system information
- Monitor agent status and capabilities
- View feature availability
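The kind of data a System Info tab might show can be gathered with the standard library alone. The sketch below is illustrative: the `system_info` name, the dictionary keys, and the choice of features to probe are assumptions rather than the Space's actual output.

```python
import importlib.util
import os
import platform
import shutil


def system_info() -> dict:
    """Collect basic host details and check which optional features are present."""
    disk = shutil.disk_usage("/")
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),
        "disk_free_gb": round(disk.free / 1e9, 1),
        "display": os.environ.get("DISPLAY", "not set"),
        "features": {
            # True when the package can be imported in this environment.
            "playwright": importlib.util.find_spec("playwright") is not None,
            "gradio": importlib.util.find_spec("gradio") is not None,
            # True when the executable is on PATH.
            "vncserver": shutil.which("vncserver") is not None,
            "websockify": shutil.which("websockify") is not None,
        },
    }


for key, value in system_info().items():
    print(f"{key}: {value}")
```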
🏗️ Architecture
Desktop Environment
- GUI Framework: XFCE4 (lightweight desktop environment)
- VNC Server: TigerVNC standalone server
- Web Bridge: noVNC + websockify for web access
- Display Resolution: 1920x1080 (configurable)
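To make the pieces concrete, the sketch below assembles the two commands such a stack typically runs: a VNC server on an X display and a websockify bridge that serves noVNC. Everything here is an assumption about a typical setup, not this Space's startup script: the display number, the listen port 6080, and the `/usr/share/novnc` path are illustrative. The one firm rule is that display `:N` maps to TCP port `5900 + N`.

```python
import shlex


def vnc_port(display: int) -> int:
    """VNC convention: display :N listens on TCP port 5900 + N."""
    return 5900 + display


def build_commands(display: int = 1, geometry: str = "1920x1080",
                   web_port: int = 6080,
                   novnc_dir: str = "/usr/share/novnc") -> dict:
    """Return argument lists for the VNC server and the WebSocket bridge."""
    return {
        # Starts an X session on the given display at the requested resolution.
        "vncserver": ["vncserver", f":{display}",
                      "-geometry", geometry, "-depth", "24"],
        # Serves the noVNC files and forwards WebSocket traffic to the VNC port.
        "websockify": ["websockify", "--web", novnc_dir,
                       str(web_port), f"localhost:{vnc_port(display)}"],
    }


for name, argv in build_commands().items():
    print(f"{name}: {shlex.join(argv)}")
```

The commands are only printed here. Running them, for example with `subprocess.Popen`, requires TigerVNC, websockify, and noVNC to be installed.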
Browser Integration
- Automation Engine: Playwright with Chromium
- Screenshot Capability: Real-time page capture
- Element Interaction: Advanced DOM manipulation
- Headless/Headed: Configurable browser mode
Web Interface
- Framework: Gradio 4.21.0 with enhanced features
- Tabs: Browser automation, VNC desktop, system info
- Real-time Updates: Live status monitoring
- Action History: Complete interaction logging
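A three-tab layout of this kind can be sketched with Gradio's Blocks API. This is a simplified outline rather than the contents of `computer_agent_vnc.py`: the `TABS` tuple, `describe_navigation`, and `build_ui` are hypothetical names, and the button handler only echoes the request instead of driving a browser. Gradio is imported inside `build_ui` so the module loads without it.

```python
TABS = ("Browser Automation", "VNC Desktop", "System Info")


def describe_navigation(url: str) -> str:
    """Placeholder handler; a real agent would drive the browser here."""
    url = url.strip()
    return f"Requested navigation to {url}" if url else "Enter a URL first"


def build_ui():
    """Assemble the tabbed interface and return the Blocks app."""
    import gradio as gr  # lazy import: requires `pip install gradio`

    with gr.Blocks(title="Computer-Using Agent") as demo:
        with gr.Tab(TABS[0]):
            url = gr.Textbox(label="URL")
            go = gr.Button("Navigate")
            status = gr.Textbox(label="Status", interactive=False)
            go.click(fn=describe_navigation, inputs=url, outputs=status)
        with gr.Tab(TABS[1]):
            gr.Markdown("Desktop access via the noVNC web client.")
        with gr.Tab(TABS[2]):
            gr.JSON(value={"tabs": list(TABS)})
    return demo


# To serve on the port Spaces expects:
# build_ui().launch(server_name="0.0.0.0", server_port=7860)
```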
🛠️ Technical Specifications
Hardware Requirements
- CPU: 2 vCPU (included in CPU basic tier)
- RAM: 16 GB (adequate for desktop environment)
- Storage: Standard Hugging Face Space allocation
Software Stack
- Base: Ubuntu 22.04 LTS
- Python: 3.10 with optimized dependencies
- Desktop: XFCE4 + X11
- VNC: TigerVNC + noVNC
- Browser: Chromium with Playwright automation
Network Configuration
- Web Interface: Port 7860 (Gradio)
- VNC Server: Port 5901 (TigerVNC)
- Web Bridge: websockify, forwarding browser WebSocket traffic to the VNC server on port 5901
Security Considerations
VNC Security
- Password Protection: VNC server requires authentication
- Local Access: VNC accessible only within the Space environment
- No External Access: Desktop environment isolated to container
Browser Security
- Headless Mode: Browser runs without visible interface
- Web Security Disabled: The browser's same-origin policy is relaxed for automation compatibility
- Sandboxed: Browser runs in containerized environment
🎯 Use Cases
Enhanced Automation
- GUI Testing: Test applications requiring desktop environment
- Visual Regression: Compare screenshots with desktop applications
- Multi-app Workflows: Coordinate between browser and desktop apps
- Development: Develop and test GUI applications
Research & Development
- AI Research: Run AI models with GUI interfaces
- Data Analysis: Use desktop tools for data visualization
- Prototyping: Rapid GUI application development
- Education: Interactive learning environments
🔮 Advanced Features
VNC Integration Benefits
- Full Desktop: Access to complete Linux desktop environment
- GUI Applications: Run any X11-based applications
- File Management: Native file explorer and management tools
- Development Tools: IDEs, debuggers, and development utilities
Enhanced Browser Automation
- Visual Testing: Compare automated browser actions with desktop
- Complex Workflows: Combine browser automation with desktop apps
- Screenshots: Capture both browser and desktop content
- Monitoring: Real-time view of all automated activities
System Requirements
For Users
- Web Browser: Any modern browser with JavaScript enabled
- Network: Stable internet connection for Space access
- VNC Viewer: Built-in web VNC client (no installation required)
For Development
- Docker: For local testing and development
- Linux: Ubuntu 22.04 or compatible distribution
- Python 3.10+: For running enhanced agent locally
🚨 Important Notes
Performance Considerations
- Resource Usage: Desktop environment uses additional memory
- Startup Time: VNC server adds ~10-15 seconds to startup
- Network: VNC traffic uses bandwidth for remote desktop access
Best Practices
- Close VNC: Close VNC viewer when not in use to save resources
- Monitor Usage: Check Space logs for resource consumption
- Test Locally: Develop and test locally before deploying
🔧 Troubleshooting
VNC Issues
- Connection Failed: Check VNC status in the interface
- Black Screen: Wait 30 seconds for desktop to fully initialize
- Slow Performance: Normal for remote desktop over web
Browser Automation
- Elements Not Found: Ensure page has fully loaded
- Screenshots Fail: Check browser initialization status
- Navigation Timeout: Verify URL accessibility
Experience the future of web automation with full desktop capabilities! ✨
Built with ❤️ using Hugging Face Spaces, Gradio, Playwright, TigerVNC, and XFCE4