NullVoider committed
Commit 50a8a77 · verified · 1 Parent(s): fd6f2e8

Update README.md

  1. README.md +2051 -3
README.md CHANGED
@@ -1,3 +1,2051 @@
1
- ---
2
- license: gpl-3.0
3
- ---
1
+ ---
2
+ license: gpl-3.0
3
+ tags:
4
+ - Machine learning & AI
5
+ - Operating systems
6
+ ---
7
+
8
+ # Windows 11
9
+
10
+ **AI Agent Training & Evaluation Environment**
11
+ **Version:** 1
12
+ **Base System:** Windows 11 Pro
13
+ **Architecture:** x86_64
14
+ **Last Updated:** May 2026
15
+ **Developer:** Kartik (NullVoider)
16
+
17
+ ---
18
+
19
+ ## Table of Contents
20
+
21
+ 1. [Overview](#overview)
22
+ 2. [Key Features](#key-features)
23
+ 3. [Container Capabilities](#container-capabilities)
24
+    - [Operating System](#operating-system)
25
+    - [Development Tools](#development-tools)
26
+    - [Remote Access](#remote-access)
27
+ 4. [Technical Specifications](#technical-specifications)
28
+    - [System Requirements](#system-requirements)
29
+    - [Container Resource Usage](#container-resource-usage)
30
+    - [Performance Metrics](#performance-metrics)
31
+ 5. [Installation & Deployment](#installation--deployment)
32
+    - [Prerequisites](#prerequisites)
33
+    - [Docker Compose Deployment](#docker-compose-deployment)
34
+    - [Testing the Container](#testing-the-container)
35
+ 6. [Installed Software](#installed-software)
36
+ 7. [Development Environments](#development-environments)
37
+ 8. [The-Eye Integration](#the-eye-integration)
38
+ 9. [Task Executor API](#task-executor-api)
39
+ 10. [Remote Access Methods](#remote-access-methods)
40
+     - [RDP (Recommended)](#rdp-recommended)
41
+     - [SSH Access](#ssh-access)
42
+ 11. [Troubleshooting](#troubleshooting)
43
+ 12. [CI/CD Integration](#cicd-integration)
44
+ 13. [Reporting Issues](#reporting-issues)
45
+ 14. [FAQ](#faq)
46
+ 15. [License](#license)
47
+ 16. [About This Project](#about-this-project)
48
+
49
+ ---
50
+
51
+ ## Overview
52
+
53
+ The **Windows 11 Container** is a complete Windows development environment designed for AI agent training, testing, evaluation, and deployment. It provides a full Windows desktop experience with pre-configured development tools, an integrated Task Executor REST API for coding agent evaluation, and screen capture monitoring, all within a single self-contained Docker container.
54
+
55
+ ### Purpose
56
+
57
+ This container is designed for:
58
+
59
+ - **Computer Use Agent Development**: Pre-configured environment for building and testing CUA applications
60
+ - **Coding Agent Evaluation**: Integrated Task Executor REST API (port 9090) for programmatic task submission, multi-framework test scoring, lint analysis, diff capture, and ground-truth patch similarity scoring.
61
+ - **Windows Development**: Native Windows environment for developing Windows-specific applications
62
+ - **Automated Testing**: Consistent, reproducible Windows environment for CI/CD pipelines
63
+ - **Remote Development**: Full-featured Windows desktop accessible via RDP, with SSH for terminal access
64
+ - **Multi-Language Development**: Support for 10+ programming languages out of the box
65
+ - **Visual Monitoring**: Integrated Eye tool for screen capture and agent training data collection
66
+
67
+ ### What Makes This Unique
68
+
69
+ - **Single Container Design**: Complete Windows 11 system with no external file dependencies
70
+ - **Ephemeral State**: Everything is isolated inside the container, providing clean state management
71
+ - **Virtual Disk**: Growable virtual disk with a 2 TB cap
72
+ - **RAM**: Customizable memory allocation (4 GB minimum for a smooth experience)
73
+ - **Optimized Performance**: Significantly smoother than existing Windows container alternatives
74
+ - **Fully Customizable**: Configuration can be modified to improve performance based on hardware
75
+ - **Zero External Files**: Everything is self-contained
76
+ - **Developer-Ready**: Pre-installed IDEs, tools, and language runtimes
77
+ - **Task Executor API**: REST API for programmatic coding agent evaluation (port 9090)
78
+ - **Multi-Framework Scoring**: pytest, cargo, go test, jest, dotnet, JUnit (auto-detected and scored)
79
+
80
+ ---
81
+
82
+ ## Key Features
83
+
84
+ ### Operating System
85
+ ✅ **Windows 11 Pro** - Latest releases
86
+ ✅ **Virtual Disk** - Growable virtual disk with a 2 TB cap
87
+ ✅ **RAM** - Customizable memory allocation (4 GB minimum for a smooth experience)
88
+ ✅ **Ephemeral State** - Clean isolation with no external dependencies
89
+
90
+ **Note**: The host does not need 2 TB of free storage. The virtual disk grows on demand, and 2 TB is only its upper limit.
91
+
92
+ ### Development Tools
93
+ ✅ **10+ Languages** - Python, Go, Rust, Java, C#, C++, Node.js, TypeScript, Kotlin, Scala
94
+ ✅ **VS Code** - Pre-installed with essential extensions
95
+ ✅ **Visual Studio Build Tools** - Windows development tools
96
+ ✅ **Git & Git LFS** - Version control with large file support
97
+ ✅ **PowerShell & Terminal** - Modern shell utilities
98
+
99
+ ### Applications
100
+ ✅ **Edge Browser** - Default web browser
101
+ ✅ **VS Code** - Feature-rich code editor
102
+ ✅ **Windows Terminal** - Modern terminal experience
103
+
104
+ ### Remote Access
105
+ ✅ **RDP** - Native Windows Remote Desktop (3389/TCP) - **Recommended**
106
+ ✅ **SSH** - Secure shell access (2222/TCP)
107
+ ✅ **Eye Server** - Screen capture endpoint (8080/HTTP)
108
+ ✅ **Task Executor API** - Coding agent eval REST API (9090/HTTP)
109
+
110
+ ### Coding Agent Evaluation
111
+ ✅ **Task Executor REST API** - Submit tasks, run tests, retrieve structured results
112
+ ✅ **Multi-Framework Test Scoring** - pytest, cargo test, go test, jest, dotnet test, JUnit/Maven/Gradle/sbt
113
+ ✅ **Lint Integration** - Soft-score linting via ruff, mypy, flake8, clippy, eslint, and more
114
+ ✅ **Diff Capture** - Records agent-produced diffs after each task run
115
+ ✅ **Reference Patch Scoring** - Ground-truth patch similarity (0.0–1.0) for patch-apply evals
116
+ ✅ **API Authentication** - Optional bearer token auth via `API_TOKEN` env variable
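
The README does not say how the reference patch similarity is computed. As a rough illustration only, one common way to get a score between 0.0 and 1.0 is a line-level sequence comparison of the two diffs. The function name `patch_similarity` is hypothetical, and the metric shown here (Python's `difflib` ratio) is an assumption, not necessarily what the Task Executor uses.

```python
import difflib


def patch_similarity(candidate: str, reference: str) -> float:
    """Return a line-level similarity in [0.0, 1.0] between two diffs.

    Illustrative stand-in: the real scoring method is not documented here.
    """
    # Compare line by line, ignoring trailing whitespace differences.
    cand_lines = [line.rstrip() for line in candidate.splitlines()]
    ref_lines = [line.rstrip() for line in reference.splitlines()]
    if not cand_lines and not ref_lines:
        return 1.0  # two empty patches are trivially identical
    return difflib.SequenceMatcher(None, cand_lines, ref_lines).ratio()
```

A score of 1.0 means the agent's diff matches the reference line for line; lower values indicate fewer shared lines.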
117
+
118
+ ### Performance & Stability
119
+ ✅ **Fast Boot Time** - Container ready in ~25 seconds
120
+ ✅ **Low CPU Usage** - 10-20% under normal workload
121
+ ✅ **Smooth Performance** - Optimized for regular development tasks
122
+ ✅ **Single Container** - No external files or dependencies
123
+ ✅ **KVM Acceleration** - Hardware virtualization for optimal performance
124
+
125
+ ---
126
+
127
+ ## Container Capabilities
128
+
129
+ ### Operating System
130
+
131
+ **Windows 11 Pro**
132
+ - Complete Windows desktop experience
133
+ - Native Windows applications support
134
+ - Standard NTFS file system
135
+ - Windows security features
136
+ - Native Windows APIs and frameworks
137
+
138
+ **Storage Configuration**:
139
+ - **Virtual Disk**: 2TB capacity
140
+ - **Format**: NTFS
141
+ - **RAM**: Customizable as needed (minimum 4 GB for smooth experience).
142
+ - **CPU**: Host CPU
143
+
144
+ **Pre-installed Applications**:
145
+ - **Browser**: Brave
146
+ - **Editor**: Visual Studio Code
147
+ - **Terminal**: Windows Terminal with PowerShell
148
+ - **File Manager**: Windows Explorer
149
+ - **System Utilities**: Standard Windows utilities
150
+
151
+ ### Development Tools
152
+
153
+ #### Programming Languages & Runtimes
154
+
155
+ | Language | Version | Package Manager | Notes |
156
+ |----------|---------|----------------|-------|
157
+ | **Python** | 3.14.4 | pip 26.0.1 | Default `python` command |
158
+ | **Go** | 1.26.1 | go modules | Full Go development environment |
159
+ | **Rust** | stable | cargo | System-wide installation |
160
+ | **Node.js** | 24.14.0 | npm 11.9.0 | TypeScript & tsx included |
161
+ | **Java** | 25 (latest) | - | Oracle JDK |
162
+ | **C#/.NET** | 10.0 SDK | dotnet | LTS version |
163
+ | **C/C++** | MSVC/clang | - | Visual Studio Build Tools |
164
+ | **Kotlin** | 2.3.0 | - | Compiler installed |
165
+ | **Scala** | 3.8.2 | coursier | Latest stable |
166
+ | **PowerShell** | latest | - | Pre-installed |
167
+
168
+ #### IDEs & Editors
169
+
170
+ **Visual Studio Code** (latest)
171
+
172
+ Pre-installed extensions:
173
+ - C++ Tools Extension Pack
174
+ - Docker Extension
175
+ - Java Extension Pack
176
+ - Oracle Java Extension
177
+ - .NET Runtime & C# DevKit
178
+ - GitLab Workflow & GitLens
179
+ - Go Extension
180
+ - Python Extension Pack (Pylance, debugpy, environment manager)
181
+ - Rust Analyzer
182
+ - Scala Language Server
183
+
184
+ #### Build Tools & Utilities
185
+
186
+ - **Git** (latest) - Version control with LFS support
187
+ - **Visual Studio Build Tools** - Essential development tools
188
+ - **CMake** - Cross-platform build system
189
+ - **Windows Debugger** - Debugging tools
190
+
191
+ ### Remote Access
192
+
193
+ #### RDP (Port 3389) - **Recommended**
194
+
195
+ **Why RDP?**
196
+ - **Best Performance**: Native Windows protocol with hardware acceleration
197
+ - **Low Latency**: Minimal input lag for smooth development experience
198
+ - **High Quality**: Superior video quality with efficient compression
199
+ - **Full Features**: Clipboard sharing, file transfer, audio support
200
+ - **Native Integration**: Built into Windows, no client installation needed (Windows hosts)
201
+
202
+ **Configuration**:
203
+ - Port: 3389 (TCP)
204
+ - Default remote access method
205
+ - Pre-configured for optimal performance
206
+ - Audio support enabled
207
+
208
+ **Use Cases**:
209
+ - Primary development interface
210
+ - Extended coding sessions
211
+ - Full desktop interaction
212
+ - Multi-window workflows
213
+
214
+ #### SSH (Port 2222)
215
+
216
+ **Configuration**:
217
+ - Port: 2222 (TCP)
218
+ - Secure shell access via OpenSSH
219
+ - Terminal-based access to Windows
220
+
221
+ **Use Cases**:
222
+ - Command-line operations
223
+ - File transfers via SCP/SFTP
224
+ - Remote script execution
225
+ - System administration
226
+
227
+ ---
228
+
229
+ ## Technical Specifications
230
+
231
+ ### System Requirements
232
+
233
+ #### Minimum Requirements
234
+
235
+ | Component | Requirement | Notes |
236
+ |-----------|-------------|-------|
237
+ | **RAM** | 4 GB | Absolute minimum for container operation |
238
+ | **Disk Space** | 100 GB free | For container image and virtual disk |
239
+ | **CPU** | 4 cores | x86_64 architecture with KVM support |
240
+ | **Virtualization** | KVM enabled | Hardware virtualization must be enabled in BIOS |
241
+ | **Host OS** | Linux | Ubuntu 20.04+, Debian 11+, or similar |
242
+ | **Docker** | 24.0+ | Recent Docker version required |
243
+ | **Kernel** | 5.10+ | For proper KVM support |
244
+
245
+ #### Recommended Requirements
246
+
247
+ | Component | Recommendation | Benefit |
248
+ |-----------|---------------|---------|
249
+ | **RAM** | 8 GB | Better performance and headroom |
250
+ | **Disk Space** | 256 GB free | Ample space for projects and data |
251
+ | **CPU** | 4+ cores | Improved responsiveness |
252
+ | **Storage Type** | SSD/NVMe | Faster disk I/O operations |
253
+ | **Network** | 100 Mbps+ | Better remote access experience |
254
+
255
+ ### Container Resource Usage
256
+
257
+ **Runtime Allocations**:
258
+ - **Virtual RAM**: 8 GB (allocated to Windows)
259
+ - **Virtual Disk**: 2 TB (NTFS filesystem)
260
+ - **Virtual CPU**: Host CPU
261
+ - **Network**: Bridged networking with port forwarding
262
+
263
+ **Host Resource Impact**:
264
+ - **CPU Usage**: 10-20% under normal workload
265
+ - **Memory Overhead**: ~2-3 GB for container management
266
+ - **Disk I/O**: Moderate (depends on workload)
267
+ - **Network**: Minimal overhead
268
+
269
+ ### Performance Metrics
270
+
271
+ **Boot Performance**:
272
+ - **Cold Boot**: 25 seconds
273
+ - **Container Start**: Same as cold boot
274
+ - **Desktop Ready**: Immediate after boot completion
275
+
276
+ **Runtime Performance**:
277
+ - **Idle CPU**: 5-10%
278
+ - **Normal Workload CPU**: 10-20%
279
+ - **Memory Usage**: Stable at allocated 4GB
280
+ - **Disk Performance**: Depends on host storage type
281
+
282
+ **Comparison to Alternatives**:
283
+ - **Better Performance**: 10-20% CPU usage vs higher overhead in alternatives
284
+ - **Smoother Operation**: Optimized for stability and responsiveness
285
+ - **External Files**: None required (vs. multiple external files in alternatives)
286
+ - **Customization**: Fully customizable configuration
287
+ - **State Management**: Clean ephemeral state
288
+
289
+ **Optimization Notes**:
290
+ - Current configuration is optimized for compatibility and stability
291
+ - Configuration is based on tested and confirmed safe settings
292
+ - Performance can be improved by adjusting CPU configuration to match host hardware
293
+ - Animations may cause slight performance impact
294
+ - Regular development workflows run smoothly without issues
295
+
296
+ ---
297
+
298
+ ## Installation & Deployment
299
+
300
+ ### Prerequisites
301
+
302
+ #### 1. Install Docker
303
+
304
+ **For Ubuntu/Debian**:
305
+ ```bash
306
+ # Update package index
307
+ sudo apt-get update
308
+
309
+ # Install dependencies
310
+ sudo apt-get install -y \
311
+     ca-certificates \
312
+     curl \
313
+     gnupg \
314
+     lsb-release
315
+
316
+ # Add Docker's official GPG key (on Debian, replace "ubuntu" with "debian" in the two download.docker.com URLs below)
317
+ sudo mkdir -p /etc/apt/keyrings
318
+ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
319
+
320
+ # Set up the repository
321
+ echo \
322
+ "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
323
+ $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
324
+
325
+ # Install Docker Engine
326
+ sudo apt-get update
327
+ sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
328
+
329
+ # Verify installation
330
+ docker --version
331
+ docker compose version
332
+ ```
333
+
334
+ **For Other Linux Distributions**:
335
+ ```bash
336
+ # Fedora/RHEL/CentOS (Docker's official package repository must be configured first)
337
+ sudo dnf install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
338
+
339
+ # Arch Linux
340
+ sudo pacman -S docker docker-compose
341
+ ```
342
+
343
+ **Post-Installation Steps**:
344
+ ```bash
345
+ # Add your user to docker group (to run docker without sudo)
346
+ sudo usermod -aG docker $USER
347
+
348
+ # Enable Docker service
349
+ sudo systemctl enable docker
350
+ sudo systemctl start docker
351
+
352
+ # Log out and log back in for group changes to take effect
353
+ ```
354
+
355
+ #### 2. Enable KVM
356
+
357
+ **Check KVM Support**:
358
+ ```bash
359
+ # Check if KVM is supported
360
+ lscpu | grep Virtualization
361
+
362
+ # Check if KVM modules are loaded
363
+ lsmod | grep kvm
364
+
365
+ # Expected output:
366
+ # kvm_intel (for Intel CPUs) or kvm_amd (for AMD CPUs)
367
+ # kvm
368
+ ```
369
+
370
+ **Enable KVM**:
371
+ ```bash
372
+ # Install KVM packages (Ubuntu/Debian); cpu-checker provides kvm-ok
373
+ sudo apt-get install -y qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils cpu-checker
374
+
375
+ # For Fedora/RHEL/CentOS
376
+ sudo dnf install -y qemu-kvm libvirt virt-install bridge-utils
377
+
378
+ # Verify KVM is working (Ubuntu/Debian)
379
+ sudo kvm-ok
380
+
381
+ # Expected output:
382
+ # INFO: /dev/kvm exists
383
+ # KVM acceleration can be used
384
+ ```
385
+
386
+ **Set KVM Permissions**:
387
+ ```bash
388
+ # Add user to kvm group
389
+ sudo usermod -aG kvm $USER
390
+
391
+ # Verify /dev/kvm permissions
392
+ ls -l /dev/kvm
393
+
394
+ # Should show: crw-rw---- 1 root kvm
395
+
396
+ # Log out and log back in for group changes to take effect
397
+ ```
398
+
399
+ **Verify KVM Access**:
400
+ ```bash
401
+ # After logging back in, verify you can access KVM
402
+ groups | grep kvm
403
+
404
+ # Test KVM device access
405
+ test -r /dev/kvm && test -w /dev/kvm && echo "KVM is accessible" || echo "KVM access denied"
406
+ ```
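
The same access check can be scripted, for example as a pre-flight step before starting the container. This is a minimal sketch; the helper name `kvm_accessible` is made up for illustration.

```python
import os


def kvm_accessible(device: str = "/dev/kvm") -> bool:
    """Return True if the device node exists and is readable and writable by the current user."""
    return os.path.exists(device) and os.access(device, os.R_OK | os.W_OK)


if __name__ == "__main__":
    print("KVM is accessible" if kvm_accessible() else "KVM access denied")
```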
407
+
408
+ **If KVM is Not Enabled in BIOS**:
409
+ 1. Restart your computer
410
+ 2. Enter BIOS/UEFI settings (usually F2, F10, F12, or Del key during boot)
411
+ 3. Look for virtualization settings:
412
+    - Intel: "Intel VT-x" or "Intel Virtualization Technology"
413
+    - AMD: "AMD-V" or "SVM Mode"
414
+ 4. Enable the setting
415
+ 5. Save and exit BIOS
416
+ 6. Boot into Linux and verify with `kvm-ok`
417
+
418
+ ### Docker Compose Deployment
419
+
420
+ **Recommended Deployment Method**: The **ONLY** recommended way to run this container is using Docker Compose. This ensures proper configuration and port mappings.
421
+
422
+ #### 1. Create Docker Compose File
423
+
424
+ Create a file named `deploy-windows.yaml`:
425
+
426
+ ```yaml
427
+ services:
428
+   win-agent:
429
+     image: nullvoider/win11-base:v1
430
+     container_name: win_agent
431
+     restart: unless-stopped
432
+     tty: true
433
+     stdin_open: true
434
+     ports:
435
+       - 3389:3389 # RDP (recommended remote access)
436
+       - 4444:4445 # I/O
437
+       - 8080:8080 # Eye server
438
+       - 9090:9090 # Task Executor API
439
+       - 2222:2222 # SSH
440
+     environment:
441
+       - API_TOKEN=your-secret-token
442
+       - TASK_MAX_AGE=3600
443
+     devices:
444
+       - /dev/kvm:/dev/kvm
445
+     cap_add:
446
+       - NET_ADMIN
447
+     extra_hosts:
448
+       - "host.docker.internal:host-gateway"
449
+ ```
450
+
451
+ #### 2. Deploy the Container
452
+
453
+ ```bash
454
+ # Start the container
455
+ docker compose -f deploy-windows.yaml up -d
456
+
457
+ # View logs
458
+ docker compose -f deploy-windows.yaml logs -f
459
+
460
+ # Check container status
461
+ docker compose -f deploy-windows.yaml ps
462
+ ```
463
+
464
+ #### 3. Container Management
465
+
466
+ ```bash
467
+ # Stop the container
468
+ docker compose -f deploy-windows.yaml stop
469
+
470
+ # Start the container
471
+ docker compose -f deploy-windows.yaml start
472
+
473
+ # Restart the container
474
+ docker compose -f deploy-windows.yaml restart
475
+
476
+ # Remove the container
477
+ docker compose -f deploy-windows.yaml down
478
+
479
+ # Remove container and volumes
480
+ docker compose -f deploy-windows.yaml down -v
481
+ ```
482
+
483
+ ### Testing the Container
484
+
485
+ #### 1. Verify Container is Running
486
+
487
+ ```bash
488
+ # Check container status
489
+ docker ps | grep win_agent
490
+
491
+ # Expected output:
492
+ # CONTAINER ID IMAGE STATUS PORTS
493
+ # abc123def456 nullvoider/win11-base:v1 Up 2 minutes 0.0.0.0:3389->3389/tcp, ...
494
+ ```
495
+
496
+ #### 2. Check Boot Progress
497
+
498
+ ```bash
499
+ # Monitor container logs
500
+ docker logs -f win_agent
501
+
502
+ # Look for successful boot messages indicating:
503
+ # - Windows boot sequence completed
504
+ # - Services started
505
+ # - RDP server ready
506
+ ```
507
+
508
+ #### 3. Test Remote Access
509
+
510
+ **RDP (Recommended)**:
511
+ ```bash
512
+ # From Windows host:
513
+ # Press Win+R, type: mstsc
514
+ # Connect to: your-server-ip:3389
515
+
516
+ # From Linux host:
517
+ # Use Remmina, xfreerdp, or rdesktop
518
+ xfreerdp /v:your-server-ip:3389 /u:AgentUser
519
+ ```
520
+
521
+ **SSH**:
522
+ ```bash
523
+ # Test SSH connection
524
+ ssh -p 2222 AgentUser@your-server-ip
525
+ ```
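
Before opening an RDP or SSH client, it can help to confirm that the published ports accept TCP connections. The sketch below uses only the Python standard library; the port numbers come from the compose file, while the names `SERVICE_PORTS`, `port_open`, and `check_services` are illustrative.

```python
import socket

# Host ports published by the compose file in this README.
SERVICE_PORTS = {
    "RDP": 3389,
    "SSH": 2222,
    "Eye server": 8080,
    "Task Executor API": 9090,
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str) -> dict:
    """Map each service name to whether its port is currently reachable."""
    return {name: port_open(host, port) for name, port in SERVICE_PORTS.items()}
```

An open port only shows that something is listening. Windows may still be booting, so a failed RDP login shortly after start-up does not necessarily indicate a misconfiguration.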
526
+
527
+ #### 4. Verify Services
528
+
529
+ Once connected via RDP:
530
+ 1. Open PowerShell or Command Prompt
531
+ 2. Check system information: `systeminfo`
532
+ 3. Verify development tools: `python --version`, `node --version`, etc.
533
+ 4. Open VS Code to verify it's installed
534
+
535
+ #### 5. Health Check
536
+
537
+ ```bash
538
+ # Check container resource usage
539
+ docker stats win_agent
540
+
541
+ # Expected metrics:
542
+ # CPU: 10-20% (normal workload)
543
+ # MEM: ~4GB allocated
544
+ # NET I/O: Varies based on remote access usage
545
+ ```
546
+
547
+ ---
548
+
549
+ ## Installed Software
550
+
551
+ ### Pre-installed Applications
552
+
553
+ #### Productivity & Development
554
+ - **Brave** - Default web browser
555
+ - **Visual Studio Code** - Feature-rich code editor with extensions
556
+ - **Windows Terminal** - Modern terminal with PowerShell
557
+
558
+ #### System Utilities
559
+ - **Windows Explorer** - File manager
560
+ - **Settings** - Windows settings
561
+ - **Task Manager** - System resource monitoring
562
+ - **Event Viewer** - System log viewer
563
+
564
+ ### Command Line Tools
565
+
566
+ #### Package Managers
567
+ - **pip** - Python package manager
568
+ - **npm** - Node.js package manager
569
+ - **cargo** - Rust package manager
570
+ - **go modules** - Go dependency management
571
+
572
+ #### Development Utilities
573
+ - **git** - Version control (with Git LFS)
574
+ - **PowerShell** - Default shell
575
+ - **Windows Terminal** - Modern terminal experience
576
+
577
+ #### Build Tools
578
+ - **Visual Studio Build Tools** - Essential development tools
579
+ - **MSVC** - Microsoft C/C++ compiler
580
+ - **make** / **nmake** - Build automation
581
+ - **cmake** - Cross-platform build system
582
+
583
+ ---
584
+
585
+ ## Development Environments
586
+
587
+ ### Python Development
588
+ ```powershell
589
+ # Python 3.14.4 pre-installed
590
+ python --version
591
+
592
+ # Install packages
593
+ pip install numpy pandas tensorflow
594
+
595
+ # Virtual environments
596
+ python -m venv myenv
597
+ myenv\Scripts\activate
598
+ ```
599
+
600
+ ### Node.js Development
601
+ ```powershell
602
+ # Node.js 24.14.0 pre-installed
603
+ node --version
604
+ npm --version
605
+
606
+ # Install packages
607
+ npm install -g typescript tsx
608
+
609
+ # Project setup
610
+ npm init -y
611
+ npm install express
612
+ ```
613
+
614
+ ### Go Development
615
+ ```powershell
616
+ # Go 1.26.1 pre-installed
617
+ go version
618
+
619
+ # Initialize module
620
+ go mod init myproject
621
+
622
+ # Install dependencies
623
+ go get github.com/gin-gonic/gin
624
+ ```
625
+
626
+ ### Rust Development
627
+ ```powershell
628
+ # Rust stable pre-installed
629
+ rustc --version
630
+ cargo --version
631
+
632
+ # Create new project
633
+ cargo new myproject
634
+ cd myproject
635
+ cargo build
636
+ ```
637
+
638
+ ### Java Development
639
+ ```powershell
640
+ # Java 25 pre-installed
641
+ java --version
642
+ javac --version
643
+
644
+ # Compile and run
645
+ javac HelloWorld.java
646
+ java HelloWorld
647
+ ```
648
+
649
+ ### C#/.NET Development
650
+ ```powershell
651
+ # .NET 10.0 SDK pre-installed
652
+ dotnet --version
653
+
654
+ # Create new project
655
+ dotnet new console -n MyApp
656
+ cd MyApp
657
+ dotnet run
658
+ ```
659
+
660
+ ### Windows Development
661
+ ```powershell
662
+ # Visual Studio Build Tools available
663
+ msbuild -version
664
+
665
+ # Build tools
666
+ cl.exe # MSVC compiler
667
+ link.exe # Linker
668
+ ```
669
+
670
+ ---
671
+
672
+ ## The-Eye Integration
673
+
674
+ The Eye is an AI-native vision capture tool integrated into the Windows container, providing automated screen capture capabilities for Computer Use Agent training, monitoring, and debugging.
675
+
676
+ ### Overview
677
+
678
+ The Eye captures screen content at configurable intervals for:
679
+ - **Agent Training**: Collect visual data for training CUAs
680
+ - **Debugging**: Record agent interactions for troubleshooting
681
+ - **Monitoring**: Track agent behavior during execution
682
+ - **Dataset Creation**: Build machine learning datasets from screen captures
683
+
684
+ ### Configuration
685
+
686
+ **Eye Server Port**: 8080 (HTTP)
687
+ **Architecture**: Client-server model with RESTful API
688
+ **Storage**: In-memory circular buffer (configurable capacity)
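
To illustrate what an in-memory circular buffer with a configurable capacity means in practice, here is a minimal sketch built on `collections.deque`. The class `FrameBuffer` is hypothetical and is not taken from The Eye's source; it only shows the general behavior, where the oldest frame is dropped once capacity is reached.

```python
from collections import deque


class FrameBuffer:
    """Fixed-capacity frame store: once full, the oldest frame is evicted."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # deque with maxlen discards items from the left when it overflows.
        self._frames = deque(maxlen=capacity)

    def push(self, frame_id: int, data: bytes) -> None:
        self._frames.append((frame_id, data))

    def latest(self):
        """Return the newest (frame_id, data) pair, or None if empty."""
        return self._frames[-1] if self._frames else None

    def __len__(self) -> int:
        return len(self._frames)
```

Memory use stays bounded by the capacity, which fits the figure quoted for the buffer under Performance Impact.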
689
+
690
+ ### Connection & Endpoints
691
+
692
+ **Eye Server Base URL**:
693
+ ```
694
+ http://your-server-ip:8080
695
+ ```
696
+
697
+ **Available Endpoints**:
698
+ - `GET /health` - Server health status and metrics
699
+ - `GET /snapshot.png` - Retrieve latest captured frame
700
+ - `POST /upload` - Upload captured frames (for external agents)
701
+ - `POST /admin/config` - Update capture configuration
702
+ - `GET /debug` - Server runtime statistics
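
The read-only endpoints can be called from any HTTP client. The sketch below uses only the Python standard library. The helper names are illustrative, and it assumes `/health` returns a JSON body, which this README implies but does not state.

```python
import json
import urllib.request


def eye_url(base_url: str, path: str) -> str:
    """Join the Eye server base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_health(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch GET /health and decode it (assumed to be JSON)."""
    with urllib.request.urlopen(eye_url(base_url, "/health"), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def save_snapshot(base_url: str, out_path: str, timeout: float = 5.0) -> int:
    """Download GET /snapshot.png to out_path and return the number of bytes written."""
    with urllib.request.urlopen(eye_url(base_url, "/snapshot.png"), timeout=timeout) as resp:
        data = resp.read()
    with open(out_path, "wb") as f:
        f.write(data)
    return len(data)
```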
703
+
704
+ ### Python SDK
705
+
706
+ The Eye includes a Python SDK for programmatic access:
707
+
708
+ **Installation** (if not using container's built-in Eye):
709
+ ```bash
710
+ pip install eye-capture
711
+ ```
712
+
713
+ **Basic Usage**:
714
+ ```python
715
+ from eye.core import EyeClient
716
+
717
+ # Connect to Eye server
718
+ client = EyeClient("http://localhost:8080", token="your-token")
719
+
720
+ # Health check
721
+ health = client.health_check()
722
+
723
+ # Get latest screenshot
724
+ image_data = client.get_snapshot()
725
+ with open("screenshot.png", "wb") as f:
726
+     f.write(image_data)
727
+
728
+ # Get frame metadata
729
+ metadata = client.get_snapshot_metadata()
730
+ print(f"Frame ID: {metadata['frame_id']}")
731
+
732
+ # Get debug info
733
+ debug = client.get_debug_info()
734
+ print(f"Uptime: {debug['uptime_sec']}s")
735
+ ```
736
+
737
+ **Advanced Features**:
738
+ ```python
739
+ import time
+ from eye.core import EyeClient, SessionManager
740
+ from eye.integrations import DatasetExporter
741
+
742
+ # Initialize components
743
+ client = EyeClient("http://localhost:8080", token="TOKEN")
744
+ exporter = DatasetExporter()
745
+
746
+ # Capture session
747
+ for i in range(100):
748
+     frame = client.get_snapshot()
749
+     metadata = client.get_snapshot_metadata()
750
+     exporter.add_frame(frame, i, metadata)
751
+     time.sleep(1.5)
752
+
753
+ # Export dataset
754
+ exporter.export_json("training_data.json")
755
+ exporter.export_csv("training_data.csv")
756
+ ```
757
+
758
+ ### Key Features
759
+
760
+ **Capture Capabilities**:
761
+ - Multiple image formats (PNG, JPEG, WebP, BMP, TIFF)
762
+ - Configurable quality (1-100)
763
+ - Adjustable capture interval (0.1s minimum)
764
+ - Automatic retries with exponential backoff
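
For readers unfamiliar with the term, exponential backoff means waiting longer after each failed attempt. The sketch below shows the general pattern. The function names, default values, and the 30-second cap are assumptions for illustration and are not The Eye's actual retry settings.

```python
import time


def backoff_delays(retries: int, base: float = 0.5, factor: float = 2.0, cap: float = 30.0) -> list:
    """Delays before each retry: base, base*factor, base*factor^2, ... limited to cap."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


def call_with_retries(func, retries: int = 4, base: float = 0.5, sleep=time.sleep):
    """Call func(); on failure wait with growing delays and retry up to `retries` times."""
    delays = backoff_delays(retries, base)
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(delays[attempt])
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.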
765
+
766
+ **API Features**:
767
+ - RESTful HTTP endpoints
768
+ - Token authentication
769
+ - Dynamic configuration updates
770
+ - Health monitoring
771
+ - Debug statistics
772
+
773
+ **Integration Options**:
774
+ - Python SDK for programmatic access
775
+ - REST API for any language
776
+ - Dataset export (JSON, JSONL, CSV)
777
+ - Webhook support for event notifications
778
+ - Cloud storage integration patterns
779
+
780
+ ### Quick Usage Examples
781
+
782
+ **REST API (PowerShell)**:
783
+ ```powershell
784
+ # Get latest screenshot
785
+ Invoke-WebRequest -Uri http://localhost:8080/snapshot.png -OutFile screenshot.png
786
+
787
+ # Check health
788
+ Invoke-RestMethod -Uri http://localhost:8080/health
789
+
790
+ # Update configuration
791
+ $body = @{
792
+     interval = 2.0
793
+     format = "jpeg"
794
+     quality = 85
795
+ } | ConvertTo-Json
796
+
797
+ Invoke-RestMethod -Uri http://localhost:8080/admin/config `
798
+     -Method Post `
799
+     -Headers @{"Authorization"="Bearer your-token"} `
800
+     -Body $body `
801
+     -ContentType "application/json"
802
+ ```
803
+
804
+ **Python SDK**:
805
+ ```python
806
+ import time
+ from eye.core import EyeClient
807
+
808
+ client = EyeClient("http://localhost:8080")
809
+
810
+ # Continuous monitoring
811
+ while True:
812
+     snapshot = client.get_snapshot()
813
+     # Process snapshot for agent training (process_for_training is a user-supplied function)
814
+     process_for_training(snapshot)
815
+     time.sleep(1.5)
816
+ ```
817
+
818
+ ### Performance Impact
819
+
820
+ - **CPU Overhead**: <3% during capture
821
+ - **Memory Usage**: 50-150 MB (in-memory buffer)
822
+ - **Network Bandwidth**: 0.5-2 MB/s @ 1.5s interval
823
+ - **Capture Latency**: 10-50ms (platform dependent)
824
+ - **Display Performance**: No noticeable impact on Windows GUI
825
+
826
+ ### Configuration Options
827
+
828
+ The Eye service runs automatically when the container starts. Configure via API:
829
+
830
+ ```python
831
+ import requests
832
+
833
+ # Update capture settings
834
+ response = requests.post(
835
+     "http://localhost:8080/admin/config",
836
+     headers={"Authorization": "Bearer your-token"},
837
+     json={
838
+         "interval": 2.0, # Capture every 2 seconds
839
+         "format": "jpeg", # Use JPEG format
840
+         "quality": 85 # 85% quality
841
+     }
842
+ )
843
+ ```
844
+
845
+ **For more details**, refer to The Eye documentation: https://github.com/nullvoider07/the-eyes
846
+
847
+ ---
848
+
849
+ ## Task Executor API
850
+
851
+ ### Overview
852
+
853
+ The Task Executor (`task_executor_windows.py`, port 9090) is the evaluation harness for frontier coding agents running on the Windows environment. It provides a REST API for submitting coding tasks, running test suites inside isolated workspaces, optionally linting the result, capturing the agent's diff, and returning structured scores, all without requiring a human operator.
854
+
855
+ Each task lifecycle: clone a repository, check out a base commit, apply the agent's patch, run the test command, lint (optional), capture the diff, score against a reference patch (optional), clean up. Results are retrievable at any time via task ID.
856
+
857
+ **Windows-specific implementation details:**
858
+ - Task workspace root: `C:\Users\AgentUser\tasks\`
859
+ - Process tree termination on timeout: `taskkill /F /T /PID` terminates all child processes, the Windows equivalent of sending POSIX `SIGKILL` to a process group
860
+ - All git operations use list-form args (no shell interpolation) to prevent command injection
861
+ - `test_command` and `lint_command` run with `shell=True` inside the container, which is expected for Windows command strings
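The timeout behaviour described above can be sketched as follows. This is a minimal illustration, not the code in `task_executor_windows.py`: the function names `build_kill_command` and `run_with_timeout` are hypothetical, and the non-Windows fallback exists only so the sketch runs elsewhere.

```python
from __future__ import annotations

import subprocess
import sys


def build_kill_command(pid: int) -> list[str]:
    """Return the taskkill invocation that force-kills a process tree."""
    # /F = force, /T = include child processes, /PID = target process id
    return ["taskkill", "/F", "/T", "/PID", str(pid)]


def run_with_timeout(command: str, cwd: str, timeout: float) -> tuple[int | None, str, str]:
    """Run a shell command string; on timeout, kill its whole process tree.

    Returns (exit_code, stdout, stderr); exit_code is None on timeout.
    """
    proc = subprocess.Popen(
        command,
        cwd=cwd,
        shell=True,  # command strings are passed through the shell, as in the README
        text=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    try:
        out, err = proc.communicate(timeout=timeout)
        return proc.returncode, out, err
    except subprocess.TimeoutExpired:
        if sys.platform == "win32":
            # Kill the shell and every child it spawned
            subprocess.run(build_kill_command(proc.pid), capture_output=True)
        else:
            proc.kill()  # assumption: simple fallback so the sketch runs off Windows
        out, err = proc.communicate()
        return None, out, err + f"\ntimed out after {timeout}s"
```

Killing by tree matters because `shell=True` makes the direct child a shell process; terminating only that PID would leave the test runner it spawned still running.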
862
+
863
+ ---
864
+
865
+ ### Starting the Task Executor
866
+
867
+ Start the executor from PowerShell inside the container (via RDP or SSH):
868
+
869
+ ```powershell
870
+ # With auth token and custom port
871
+ $env:API_TOKEN = "your-secret-token"
872
+ $env:API_PORT = "9090"
873
+ python C:\Users\AgentUser\task_executor_windows.py
874
+ ```
875
+
876
+ Verify from inside the container:
877
+ ```powershell
878
+ Invoke-RestMethod -Uri http://localhost:9090/task/submit `
879
+ -Method Post `
880
+ -Headers @{"Authorization"="Bearer your-secret-token"; "Content-Type"="application/json"} `
881
+ -Body '{"repo_url":"invalid","test_command":"exit 0"}'
882
+ ```
883
+
884
+ Verify from host or remote orchestrator:
885
+ ```bash
886
+ curl -X POST http://your-server-ip:9090/task/submit \
887
+ -H "Authorization: Bearer your-secret-token" \
888
+ -H "Content-Type: application/json" \
889
+ -d '{"repo_url":"invalid","test_command":"exit 0"}'
890
+ ```
891
+
892
+ ---
893
+
894
+ ### Environment Variables
895
+
896
+ | Variable | Default | Description |
897
+ |----------|---------|-------------|
898
+ | `TASK_BASE_DIR` | `C:\Users\AgentUser\tasks` | Root directory for task workspaces and the executor log |
899
+ | `API_PORT` | `9090` | Port the Task Executor binds to |
900
+ | `API_TOKEN` | *(unset)* | Bearer token for all requests; auth disabled when unset |
901
+ | `TASK_MAX_AGE` | `3600` | Seconds after completion before task records are evicted from memory |
902
+
903
+ Set these in the Docker Compose file under `environment:` or export them in the shell before starting the executor.
904
+
905
+ ---
906
+
907
+ ### Authentication
908
+
909
+ When `API_TOKEN` is set, every request must include:
910
+
911
+ ```
912
+ Authorization: Bearer <token>
913
+ ```
914
+
915
+ Requests without a valid token return `401 Unauthorized`. For isolated k8s pods that already have network-level access control, `API_TOKEN` can be left unset to disable auth.
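A bearer-token check of this kind could look like the sketch below. It is an assumption about how such a check is typically written, not the executor's actual code; `is_authorized` is a hypothetical helper name.

```python
import hmac
from typing import Optional


def is_authorized(auth_header: Optional[str], api_token: Optional[str]) -> bool:
    """Return True if a request with this Authorization header may proceed."""
    if not api_token:
        return True  # auth is disabled when API_TOKEN is unset
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    presented = auth_header[len("Bearer "):]
    # Constant-time comparison avoids leaking token contents through timing
    return hmac.compare_digest(presented.encode(), api_token.encode())
```

A caller would answer `401 Unauthorized` whenever this returns `False`.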
916
+
917
+ ---
918
+
919
+ ### REST API Reference
920
+
921
+ #### POST /task/submit
922
+
923
+ | Field | Type | Required | Description |
924
+ |-------|------|----------|-------------|
925
+ | `repo_url` | string | Yes | Git-clonable URL |
926
+ | `test_command` | string | Yes | Shell command run from repo root |
927
+ | `base_commit` | string | No | Commit/tag/branch to check out (default: `HEAD`) |
928
+ | `patch` | string | No | Unified diff applied via `git apply` |
929
+ | `timeout` | int | No | Seconds before process tree is killed (default: `300`) |
930
+ | `lint_command` | string | No | CLI lint command; result is a soft score only |
931
+ | `capture_diff` | bool | No | Capture `git diff <base_commit>` after tests (default: `false`) |
932
+ | `reference_patch` | string | No | Ground-truth diff for similarity scoring |
933
+
934
+ **Example: pytest with lint (PowerShell)**:
935
+ ```powershell
936
+ $body = @{
937
+ repo_url = "https://github.com/psf/requests"
938
+ base_commit = "v2.31.0"
939
+ patch = "<agent unified diff>"
940
+ test_command = "python -m pytest tests -x --tb=short"
941
+ timeout = 300
942
+ lint_command = "ruff check . --output-format json"
943
+ capture_diff = $true
944
+ } | ConvertTo-Json
945
+
946
+ Invoke-RestMethod -Uri http://your-server-ip:9090/task/submit `
947
+ -Method Post `
948
+ -Headers @{"Authorization"="Bearer your-secret-token"; "Content-Type"="application/json"} `
949
+ -Body $body
950
+ ```
951
+
952
+ **Example: SWE-bench style with reference patch (curl)**:
953
+ ```bash
954
+ curl -X POST http://your-server-ip:9090/task/submit \
955
+ -H "Authorization: Bearer your-secret-token" \
956
+ -H "Content-Type: application/json" \
957
+ -d '{
958
+ "repo_url": "https://github.com/example/repo",
959
+ "base_commit": "abc123",
960
+ "patch": "<agent patch>",
961
+ "test_command": "python -m pytest tests\\test_feature.py",
962
+ "reference_patch": "<ground truth patch>",
963
+ "capture_diff": true
964
+ }'
965
+ ```
966
+
967
+ Returns `202 Accepted`: `{"task_id": "<uuid>", "status": "pending"}`
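On the client side, the request body can be assembled with a small helper that enforces the two required fields and omits unset optional ones so the server-side defaults apply. `build_submit_payload` is a hypothetical name used only for this sketch; the field names come from the table above.

```python
from typing import Any, Dict, Optional


def build_submit_payload(
    repo_url: str,
    test_command: str,
    base_commit: Optional[str] = None,
    patch: Optional[str] = None,
    timeout: Optional[int] = None,
    lint_command: Optional[str] = None,
    capture_diff: Optional[bool] = None,
    reference_patch: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /task/submit."""
    if not repo_url or not test_command:
        raise ValueError("repo_url and test_command are required")
    if timeout is not None and timeout <= 0:
        raise ValueError("timeout must be a positive number of seconds")

    optional = {
        "base_commit": base_commit,
        "patch": patch,
        "timeout": timeout,
        "lint_command": lint_command,
        "capture_diff": capture_diff,
        "reference_patch": reference_patch,
    }
    payload: Dict[str, Any] = {"repo_url": repo_url, "test_command": test_command}
    # Leave out anything the caller did not set so documented defaults are used
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

The resulting dictionary can be passed as `json=` to `requests.post(f"{BASE}/task/submit", ...)`.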
968
+
969
+ ---
970
+
971
+ #### GET /task/\<task_id\>
972
+
973
+ Lightweight status poll. Returns `task_id` and `status` only (`pending` → `running` → `completed` | `failed`).
974
+
975
+ ```bash
976
+ curl http://your-server-ip:9090/task/<task_id> \
977
+ -H "Authorization: Bearer your-secret-token"
978
+ ```
979
+
980
+ ---
981
+
982
+ #### GET /task/\<task_id\>/result
983
+
984
+ Returns `202` while running. Returns `200` on completion with the full result record:
985
+
986
+ | Field | Type | Description |
987
+ |-------|------|-------------|
988
+ | `exit_code` | int | Exit code of the test command |
989
+ | `stdout` | string | Combined stdout from all steps |
990
+ | `stderr` | string | Combined stderr from all steps |
991
+ | `tests_passed` | int | Passing test count |
992
+ | `tests_failed` | int | Failing/errored test count |
993
+ | `lint_errors` | int or null | Lint error count; `null` if no `lint_command` |
994
+ | `lint_output` | string or null | Raw linter stdout+stderr |
995
+ | `patch_diff` | string or null | `git diff <base_commit>` output; `null` if not requested |
996
+ | `patch_similarity` | float or null | 0.0–1.0 vs `reference_patch`; `null` if no reference provided |
997
+ | `execution_time` | float | Wall-clock seconds from start to finish |
998
+
999
+ ```bash
1000
+ curl http://your-server-ip:9090/task/<task_id>/result \
1001
+ -H "Authorization: Bearer your-secret-token" | jq .
1002
+ ```
1003
+
1004
+ ---
1005
+
1006
+ #### DELETE /task/\<task_id\>
1007
+
1008
+ Removes the task record from memory. It does not cancel a running task; to bound how long a task can run, submit it with a short `timeout` value.
1009
+
1010
+ ---
1011
+
1012
+ ### Supported Test Frameworks
1013
+
1014
+ | `test_command` contains | Framework |
1015
+ |------------------------|-----------|
1016
+ | `pytest`, `py.test` | pytest |
1017
+ | `cargo` | cargo test |
1018
+ | `go test` | go test |
1019
+ | `jest`, `npm test`, `yarn test`, `pnpm test` | Jest |
1020
+ | `dotnet` | dotnet test |
1021
+ | `mvn`, `gradle`, `sbt`, `junit` | JUnit/Surefire |
1022
+
1023
+ For unrecognised commands, all parsers are tried in order and the first non-zero result is used.
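The substring matching in the table above can be illustrated with a short sketch. The lookup order and the `detect_framework` name are assumptions made for this example; the executor's real detection logic may differ.

```python
from typing import Optional

# (substrings to look for, framework label), checked in table order
FRAMEWORK_MARKERS = [
    (("pytest", "py.test"), "pytest"),
    (("cargo",), "cargo test"),
    (("go test",), "go test"),
    (("jest", "npm test", "yarn test", "pnpm test"), "Jest"),
    (("dotnet",), "dotnet test"),
    (("mvn", "gradle", "sbt", "junit"), "JUnit/Surefire"),
]


def detect_framework(test_command: str) -> Optional[str]:
    """Return the framework whose marker appears in test_command, else None."""
    lowered = test_command.lower()
    for markers, framework in FRAMEWORK_MARKERS:
        if any(marker in lowered for marker in markers):
            return framework
    return None  # unrecognised: a caller would fall back to trying every parser
```

For example, `detect_framework("python -m pytest tests -x")` yields `"pytest"`, while a command such as `make check` yields `None` and would take the fallback path.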
1024
+
1025
+ ---
1026
+
1027
+ ### Supported Linters (Soft Score)
1028
+
1029
+ | Linter | Language | Example `lint_command` |
1030
+ |--------|----------|------------------------|
1031
+ | `ruff` | Python | `ruff check . --output-format json` |
1032
+ | `flake8` | Python | `flake8 src` |
1033
+ | `mypy` | Python | `mypy src --ignore-missing-imports` |
1034
+ | `pylint` | Python | `pylint src` |
1035
+ | `cargo clippy` | Rust | `cargo clippy -- -D warnings` |
1036
+ | `eslint` | JS/TS | `eslint src --format json` |
1037
+ | `go vet` | Go | `go vet ./...` |
1038
+ | `dotnet build` | C# | `dotnet build --no-restore` |
1039
+
1040
+ Lint results are always soft: `lint_errors` is recorded but never changes `status` or `exit_code`. This is consistent with the convention used by SWE-bench, HumanEval, and LiveCodeBench.
1041
+
1042
+ ---
1043
+
1044
+ ### Remote Polling Pattern
1045
+
1046
+ ```python
1047
+ import time, requests
1048
+
1049
+ BASE = "http://your-server-ip:9090"
1050
+ HEADERS = {"Authorization": "Bearer your-secret-token"}
1051
+
1052
+ # Submit
1053
+ r = requests.post(f"{BASE}/task/submit", headers=HEADERS, json={
1054
+ "repo_url": "https://github.com/example/repo",
1055
+ "test_command": "python -m pytest tests -x",
1056
+ "lint_command": "ruff check .",
1057
+ "capture_diff": True,
1058
+ })
1059
+ task_id = r.json()["task_id"]
1060
+
1061
+ # Poll (5s interval is reasonable given Windows boot latency)
1062
+ while True:
1062
+     s = requests.get(f"{BASE}/task/{task_id}", headers=HEADERS).json()
1063
+     if s["status"] not in ("pending", "running"):
1064
+         break
1065
+     time.sleep(5)
1067
+
1068
+ # Retrieve full result
1069
+ result = requests.get(f"{BASE}/task/{task_id}/result", headers=HEADERS).json()
1070
+ print(f"Passed: {result['tests_passed']} Failed: {result['tests_failed']} "
1071
+ f"Lint: {result['lint_errors']} Similarity: {result['patch_similarity']}")
1072
+
1073
+ # Clean up
1074
+ requests.delete(f"{BASE}/task/{task_id}", headers=HEADERS)
1075
+ ```
1076
+
1077
+
1078
+ ---
1079
+
1080
+ ## Remote Access Methods
1081
+
1082
+ ### RDP (Recommended)
1083
+
1084
+ **Primary Remote Access Method**: RDP provides the best performance and native Windows integration.
1085
+
1086
+ #### Why RDP?
1087
+
1088
+ **Performance Benefits**:
1089
+ - Native Windows protocol
1090
+ - Hardware-accelerated rendering
1091
+ - Optimized for Windows GUI
1092
+ - Low latency input handling
1093
+ - Efficient bandwidth usage
1094
+ - Superior video quality
1095
+
1096
+ **Features**:
1097
+ - Full desktop experience
1098
+ - Audio support
1099
+ - Multi-session support
1100
+ - Printer redirection
1101
+ - Drive mapping
1102
+
1103
+ #### Connection Setup
1104
+
1105
+ **From Windows Host**:
1106
+ 1. Press `Win + R`
1107
+ 2. Type `mstsc`
1108
+ 3. Enter: `your-server-ip:3389`
1109
+ 4. Click Connect
1110
+
1111
+ **From Linux Host**:
1112
+ ```bash
1113
+ # Using xfreerdp
1114
+ xfreerdp /v:your-server-ip:3389 /u:AgentUser /smart-sizing
1115
+
1116
+ # Using Remmina (Recommended)
1117
+ remmina
1118
+
1119
+ # Using rdesktop
1120
+ rdesktop your-server-ip:3389
1121
+ ```
1122
+
1123
+ **From macOS Host**:
1124
+ - Download Microsoft Remote Desktop from App Store
1125
+ - Add PC: `your-server-ip:3389`
1126
+ - Connect
1127
+
1128
+ #### Best Practices
1129
+
1130
+ **For Best Performance**:
1131
+ - Use wired network connection when possible
1132
+ - Close unused applications in the container
1133
+ - Disable unnecessary visual effects in Windows settings
1134
+ - Use RemoteFX for enhanced graphics (if supported)
1135
+
1136
+ **Network Requirements**:
1137
+ - Minimum: 10 Mbps
1138
+ - Recommended: 100 Mbps+
1139
+ - Latency: <50ms for best experience
1140
+
1141
+ #### Use Cases
1142
+
1143
+ **Primary Development**:
1144
+ - Extended coding sessions
1145
+ - Full IDE usage (VS Code, Visual Studio)
1146
+ - Multi-window workflows
1147
+ - Windows application development
1148
+
1149
+ **Testing & Debugging**:
1150
+ - Interactive debugging
1151
+ - Visual testing
1152
+ - GUI automation development
1153
+ - Screen recording
1154
+
1155
+ ### SSH Access
1156
+
1157
+ **Port**: 2222
1158
+
1159
+ #### Connection
1160
+
1161
+ ```bash
1162
+ # Basic SSH connection
1163
+ ssh -p 2222 AgentUser@your-server-ip
1164
+
1165
+ # With key authentication
1166
+ ssh -i ~/.ssh/id_rsa -p 2222 AgentUser@your-server-ip
1167
+
1168
+ # Port forwarding example
1169
+ ssh -L 8080:localhost:8080 -p 2222 AgentUser@your-server-ip
1170
+ ```
1171
+
1172
+ #### Use Cases
1173
+
1174
+ **Command-Line Operations**:
1175
+ - PowerShell script execution
1176
+ - Package installation
1177
+ - System administration
1178
+ - Log viewing
1179
+
1180
+ **File Transfer**:
1181
+ ```bash
1182
+ # Copy files to container
1183
+ scp -P 2222 file.txt AgentUser@your-server-ip:C:/Users/AgentUser/
1184
+
1185
+ # Copy files from container
1186
+ scp -P 2222 AgentUser@your-server-ip:C:/file.txt ./
1187
+
1188
+ # Using rsync (with WSL or Cygwin)
1189
+ rsync -avz -e "ssh -p 2222" ./local-dir AgentUser@your-server-ip:/cygdrive/c/remote-dir
1190
+ ```
1191
+
1192
+ **Remote Script Execution**:
1193
+ ```bash
1194
+ # Execute single command
1195
+ ssh -p 2222 AgentUser@your-server-ip "python script.py"
1196
+
1197
+ # Execute PowerShell script
1198
+ ssh -p 2222 AgentUser@your-server-ip "powershell -File script.ps1"
1199
+ ```
1200
+
1201
+ ---
1202
+
1203
+ ## Troubleshooting
1204
+
1205
+ ### Common Issues
1206
+
1207
+ #### 1. Windows Update Interference
1208
+
1209
+ **Symptoms**:
1210
+ - Unexpected reboots
1211
+ - Performance degradation during updates
1212
+ - Services stopped after boot
1213
+
1214
+ **Solutions**:
1215
+
1216
+ **Disable Automatic Updates**:
1217
+ ```powershell
1218
+ # Open PowerShell as Administrator
1219
+ Set-ItemProperty -Path "HKLM:\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU" -Name "NoAutoUpdate" -Value 1
1220
+
1221
+ # Or use Services.msc
1222
+ # Disable "Windows Update" service
1223
+ ```
1224
+
1225
+ #### 2. Slow Performance
1226
+
1227
+ **Symptoms**:
1228
+ - Lag during window operations
1229
+ - Slow application response
1230
+ - High CPU usage
1231
+
1232
+ **Cause**:
1233
+ - Windows background services
1234
+
1235
+ **Solutions**:
1236
+
1237
+ **Option 1: Optimize Visual Effects**:
1238
+ 1. Open System Properties (Win + Pause)
1239
+ 2. Advanced system settings → Performance Settings
1240
+ 3. Select "Adjust for best performance"
1241
+ 4. Or manually disable animations
1242
+
1243
+ **Option 2: Disable Background Services**:
1244
+ ```powershell
1245
+ # Disable Windows Search
1246
+ Stop-Service -Name "WSearch" -Force
1247
+ Set-Service -Name "WSearch" -StartupType Disabled
1248
+
1249
+ # Disable Superfetch
1250
+ Stop-Service -Name "SysMain" -Force
1251
+ Set-Service -Name "SysMain" -StartupType Disabled
1252
+ ```
1253
+
1254
+ **Note: Do not disable the services and scheduled tasks for AutoHotkey and the-eyes tool. They are essential for CUA actuation and capture.**
1255
+
1256
+ **Option 3: Configuration Adjustment** (Advanced):
1257
+ - Configuration can be customized for better performance
1258
+ - Requires understanding of system limits and testing
1259
+
1260
+ #### 3. Container Won't Start
1261
+
1262
+ **Symptoms**:
1263
+ - Container exits immediately after start
1264
+ - Error messages in logs
1265
+ - Container status shows "Exited"
1266
+
1267
+ **Diagnostic Steps**:
1268
+ ```bash
1269
+ # Check container logs
1270
+ docker logs win_agent
1271
+
1272
+ # Check container status
1273
+ docker ps -a | grep win_agent
1274
+
1275
+ # Inspect container
1276
+ docker inspect win_agent
1277
+ ```
1278
+
1279
+ **Common Solutions**:
1280
+
1281
+ **KVM Not Available**:
1282
+ ```bash
1283
+ # Verify KVM is accessible
1284
+ ls -l /dev/kvm
1285
+
1286
+ # Check if you're in kvm group
1287
+ groups | grep kvm
1288
+
1289
+ # Add user to kvm group if missing
1290
+ sudo usermod -aG kvm $USER
1291
+ # Log out and back in
1292
+ ```
1293
+
1294
+ **Insufficient Resources**:
1295
+ ```bash
1296
+ # Check available RAM
1297
+ free -h
1298
+
1299
+ # Check disk space
1300
+ df -h
1301
+
1302
+ # Verify at least 4GB RAM available
1303
+ ```
1304
+
1305
+ **Port Conflicts**:
1306
+ ```bash
1307
+ # Check if ports are already in use
1308
+ sudo netstat -tlnp | grep -E '3389|4000|8080|9090|2222'
1309
+
1310
+ # Stop conflicting services or change ports in docker-compose.yaml
1311
+ ```
1312
+
1313
+ #### 4. Remote Access Connection Issues
1314
+
1315
+ **RDP Won't Connect**:
1316
+ ```bash
1317
+ # Verify port is exposed
1318
+ docker port win_agent 3389
1319
+
1320
+ # Check if service is listening
1321
+ docker exec win_agent powershell "netstat -an | findstr 3389"
1322
+
1323
+ # Test connectivity from host
1324
+ telnet localhost 3389
1325
+ ```
1326
+
1327
+ **SSH Connection Refused**:
1328
+ ```bash
1329
+ # Check SSH port mapping
1330
+ docker port win_agent 2222
1331
+
1332
+ # Verify SSH service
1333
+ docker exec win_agent powershell "Get-Service sshd"
1334
+ ```
1335
+
1336
+ #### 5. Windows-Specific Issues
1337
+
1338
+ **Standard Windows Troubleshooting Applies**:
1339
+
1340
+ Most Windows-related issues can be resolved using standard Windows troubleshooting methods:
1341
+
1342
+ 1. **System Settings Reset**:
1343
+ - Open Settings
1344
+ - Reset specific settings causing issues
1345
+ - Restart affected applications
1346
+
1347
+ 2. **Application Issues**:
1348
+ - Use Task Manager to end unresponsive programs
1349
+ - Clear application caches
1350
+
1351
+ 3. **Disk Issues**:
1352
+ - Run `chkdsk`
1353
+ - Check available storage space
1354
+ - Defragment if needed (not necessary on SSDs)
1355
+
1356
+ 4. **Permission Issues**:
1357
+ - Run applications as Administrator
1358
+ - Check file/folder permissions
1359
+ - Use `icacls` to fix permissions
1360
+
1361
+ **These are standard Windows issues, not container-specific problems**.
1362
+
1363
+ ### Getting Help
1364
+
1365
+ If you encounter issues not covered here:
1366
+
1367
+ 1. **Check container logs**: `docker logs win_agent`
1368
+ 2. **Review system resources**: Ensure minimum requirements are met
1369
+ 3. **Verify KVM access**: Confirm `/dev/kvm` is accessible
1370
+ 4. **Test connectivity**: Check network and port accessibility
1371
+ 5. **See Reporting Issues section** for how to get support
1372
+
1373
+ ---
1374
+
1375
+ ## CI/CD Integration
1376
+
1377
+ The Windows container is designed for seamless integration into CI/CD pipelines, particularly for Computer Use Agent development and deployment.
1378
+
1379
+ ### Supported Platforms
1380
+
1381
+ **Container Orchestration**:
1382
+ - ✅ **Docker** - Native Docker deployment
1383
+ - ✅ **Kubernetes** - K8s pod deployment
1384
+ - ✅ **Docker Compose** - Multi-container orchestration
1385
+ - ✅ **Docker Swarm** - Swarm service deployment
1386
+
1387
+ **CI/CD Systems**:
1388
+ - GitHub Actions
1389
+ - GitLab CI/CD
1390
+ - Jenkins
1391
+ - CircleCI
1392
+ - Travis CI
1393
+ - Azure DevOps
1394
+ - Any system supporting Docker
1395
+
1396
+ ### Docker-Based CI/CD
1397
+
1398
+ #### GitHub Actions Example
1399
+
1400
+ ```yaml
1401
+ name: Windows Tests
1402
+
1403
+ on: [push, pull_request]
1404
+
1405
+ jobs:
1406
+ test:
1407
+ runs-on: ubuntu-latest
1408
+
1409
+ steps:
1410
+ - uses: actions/checkout@v3
1411
+
1412
+ - name: Set up KVM
1413
+ run: |
1414
+ sudo apt-get update
1415
+ sudo apt-get install -y qemu-kvm libvirt-daemon-system
1416
+ sudo usermod -aG kvm $USER
1417
+
1418
+ - name: Start Windows Container
1419
+ run: |
1420
+ docker compose -f deploy-windows.yaml up -d
1421
+ sleep 25 # Wait for boot
1422
+
1423
+ - name: Run Tests
1424
+ run: |
1425
+ docker exec win_agent powershell -File tests/test_agent.ps1
1426
+
1427
+ - name: Cleanup
1428
+ if: always()
1429
+ run: docker compose -f deploy-windows.yaml down
1430
+ ```
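A fixed `sleep 25` can be too short on a slow runner. One alternative is to poll the Eye service's `/health` endpoint (shown earlier on port 8080) until it answers. The sketch below assumes that endpoint returns HTTP 200 once the guest is ready; `wait_for_health` is a hypothetical helper, not part of the project.

```python
import http.client
import time
import urllib.error
import urllib.request

# Bypass any proxy settings from the environment: this is a local health check
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_health(url: str, timeout: float = 180.0, interval: float = 2.0) -> bool:
    """Poll url until it returns HTTP 200 or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with _OPENER.open(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            pass  # service not reachable yet; retry after a pause
        time.sleep(interval)
    return False


if __name__ == "__main__":
    ok = wait_for_health("http://localhost:8080/health")
    raise SystemExit(0 if ok else 1)
```

An HTTP check is used in place of a bare TCP connect because a published Docker port can accept connections before the Windows guest behind it has finished booting.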
1431
+
1432
+ #### GitLab CI Example
1433
+
1434
+ ```yaml
1435
+ stages:
1436
+ - test
1437
+
1438
+ windows_tests:
1439
+ stage: test
1440
+ image: docker:latest
1441
+ services:
1442
+ - docker:dind
1443
+ variables:
1444
+ DOCKER_DRIVER: overlay2
1445
+ before_script:
1446
+ - docker info
1447
+ script:
1448
+ - docker compose -f deploy-windows.yaml up -d
1449
+ - sleep 25
1450
+ - docker exec win_agent powershell -File tests/test_agent.ps1
1451
+ after_script:
1452
+ - docker compose -f deploy-windows.yaml down
1453
+ tags:
1454
+ - kvm
1455
+ ```
1456
+
1457
+ ### Kubernetes Deployment
1458
+
1459
+ #### Pod Specification
1460
+
1461
+ ```yaml
1462
+ apiVersion: v1
1463
+ kind: Pod
1464
+ metadata:
1465
+ name: windows-agent
1466
+ labels:
1467
+ app: windows
1468
+ spec:
1469
+ containers:
1470
+ - name: win-agent
1471
+ image: nullvoider/win11-base:v1
1472
+ ports:
1473
+ - containerPort: 3389
1474
+ name: rdp
1475
+ - containerPort: 4444
1476
+ name: io
1477
+ - containerPort: 8080
1478
+ name: eye-server
1479
+ - containerPort: 9090
1480
+ name: task-executor
1481
+ - containerPort: 2222
1482
+ name: ssh
1483
+ securityContext:
1484
+ capabilities:
1485
+ add:
1486
+ - NET_ADMIN
1487
+ volumeMounts:
1488
+ - name: kvm
1489
+ mountPath: /dev/kvm
1490
+ volumes:
1491
+ - name: kvm
1492
+ hostPath:
1493
+ path: /dev/kvm
1494
+ type: CharDevice
1495
+ restartPolicy: Always
1496
+ ```
1497
+
1498
+ #### Deployment with Service
1499
+
1500
+ ```yaml
1501
+ apiVersion: apps/v1
1502
+ kind: Deployment
1503
+ metadata:
1504
+ name: windows-agent-deployment
1505
+ spec:
1506
+ replicas: 1
1507
+ selector:
1508
+ matchLabels:
1509
+ app: windows
1510
+ template:
1511
+ metadata:
1512
+ labels:
1513
+ app: windows
1514
+ spec:
1515
+ containers:
1516
+ - name: win-agent
1517
+ image: nullvoider/win11-base:v1
1518
+ ports:
1519
+ - containerPort: 3389
1520
+ - containerPort: 4444
1521
+ - containerPort: 8080
1522
+ - containerPort: 9090
1523
+ - containerPort: 2222
1524
+ ---
1525
+ apiVersion: v1
1526
+ kind: Service
1527
+ metadata:
1528
+ name: windows-agent-service
1529
+ spec:
1530
+ selector:
1531
+ app: windows
1532
+ ports:
1533
+ - name: rdp
1534
+ port: 3389
1535
+ targetPort: 3389
1536
+ - name: io
1537
+ port: 4444
1538
+ targetPort: 4444
1539
+ - name: eye
1540
+ port: 8080
1541
+ targetPort: 8080
1542
+ - name: task-executor
1543
+ port: 9090
1544
+ targetPort: 9090
1545
+ - name: ssh
1546
+ port: 2222
1547
+ targetPort: 2222
1548
+ type: LoadBalancer
1549
+ ```
1550
+
1551
+ ### Use Cases
1552
+
1553
+ **AI Agent Development**:
1554
+ - Automated testing of CUA implementations
1555
+ - Training data collection in reproducible environments
1556
+ - Performance benchmarking
1557
+ - Benchmarking of coding agent capabilities
1558
+ - Integration testing
1559
+
1560
+ **Windows Application Testing**:
1561
+ - Cross-platform application testing
1562
+ - Windows-specific feature validation
1563
+ - GUI automation testing
1564
+ - Compatibility verification
1565
+
1566
+ **Continuous Integration**:
1567
+ - Automated builds on Windows environment
1568
+ - Unit testing with Windows dependencies
1569
+ - Integration testing with Windows services
1570
+ - End-to-end testing workflows
1571
+
1572
+ ### Best Practices
1573
+
1574
+ **Resource Management**:
1575
+ ```yaml
1576
+ # Kubernetes resource limits
1577
+ resources:
1578
+   requests:
1579
+     memory: "4Gi"
1580
+     cpu: "4"
1581
+   limits:
1582
+     memory: "8Gi"
1583
+     cpu: "4"
1584
+ ```
1585
+
1586
+ **Health Checks**:
1587
+ ```yaml
1588
+ # Kubernetes liveness probe
1589
+ livenessProbe:
1590
+   tcpSocket:
1591
+     port: 3389
1592
+   initialDelaySeconds: 120
1593
+   periodSeconds: 30
1594
+ ```
1595
+
1596
+ **Cleanup Strategy**:
1597
+ - Always use `docker compose down` or equivalent cleanup
1598
+ - Implement timeout for long-running tests
1599
+ - Monitor resource usage during CI runs
1600
+ - Use ephemeral runners when possible
1601
+
1602
+ ---
1603
+
1604
+ ## Reporting Issues
1605
+
1606
+ When reporting issues, please provide comprehensive information to help diagnose and resolve problems quickly.
1607
+
1608
+ ### Bug Reports
1609
+
1610
+ **Required Information**:
1611
+
1612
+ 1. **Environment Details**:
1613
+ ```bash
1614
+ # Docker version
1615
+ docker --version
1616
+ docker compose version
1617
+
1618
+ # Host OS information
1619
+ cat /etc/os-release
1620
+ uname -a
1621
+
1622
+ # KVM information
1623
+ kvm-ok
1624
+ ls -l /dev/kvm
1625
+ ```
1626
+
1627
+ 2. **System Resources**:
1628
+ ```bash
1629
+ # Available RAM
1630
+ free -h
1631
+
1632
+ # Disk space
1633
+ df -h
1634
+
1635
+ # CPU information
1636
+ lscpu
1637
+ ```
1638
+
1639
+ 3. **Container Logs**:
1640
+ ```bash
1641
+ # Full container logs
1642
+ docker logs win_agent > container-logs.txt
1643
+
1644
+ # Last 200 lines
1645
+ docker logs --tail 200 win_agent
1646
+
1647
+ # Real-time logs
1648
+ docker logs -f win_agent
1649
+ ```
1650
+
1651
+ 4. **Container Status**:
1652
+ ```bash
1653
+ # Container details
1654
+ docker ps -a | grep win_agent
1655
+ docker inspect win_agent
1656
+
1657
+ # Resource usage
1658
+ docker stats win_agent --no-stream
1659
+ ```
1660
+
1661
+ 5. **Steps to Reproduce**:
1662
+ - Detailed steps to reproduce the issue
1663
+ - Expected behavior
1664
+ - Actual behavior
1665
+ - Screenshots or screen recordings if applicable
1666
+
1667
+ 6. **Configuration**:
1668
+ - Docker Compose file contents
1669
+ - Any custom modifications
1670
+ - Environment variables used
1671
+
1672
+ ### Feature Requests
1673
+
1674
+ **Required Information**:
1675
+
1676
+ 1. **Use Case Description**:
1677
+ - What problem does this feature solve?
1678
+ - Who would benefit from this feature?
1679
+ - How urgent is this feature?
1680
+
1681
+ 2. **Proposed Implementation**:
1682
+ - How should the feature work?
1683
+ - What configuration options should it have?
1684
+ - Any technical considerations?
1685
+
1686
+ 3. **Impact Assessment**:
1687
+ - How would this affect existing functionality?
1688
+ - Resource implications (CPU, RAM, disk)?
1689
+ - Compatibility considerations?
1690
+
1691
+ 4. **Alternatives Considered**:
1692
+ - What alternatives have you considered?
1693
+ - Why is this approach preferred?
1694
+
1695
+ ### Contact Information
1696
+
1697
+ **For Direct Support**:
1698
+ - **X (formerly Twitter)**: [@nullvoider07](https://x.com/nullvoider07)
1699
+
1700
+ **When Reporting**:
1701
+ - Be specific and detailed
1702
+ - Include all requested information
1703
+ - Attach logs and screenshots
1704
+ - Describe impact and urgency
1705
+
1706
+ ---
1707
+
1708
+ ## Security Considerations
1709
+
1710
+ ### Default Configuration
1711
+
1712
+ - Runs with `NET_ADMIN` capability and KVM device access
1713
+ - Auto-login enabled for development convenience
1714
+ - Remote services (RDP, SSH) with configurable credentials
1715
+ - KVM passthrough requires direct device access
1716
+
1717
+ For production deployments, review the hardening notes below.
1718
+
1719
+ ### Task Executor API Security
1720
+
1721
+ - Set `API_TOKEN` for all non-isolated deployments
1722
+ - Bind port 9090 to localhost when the orchestrator is on the same host:
1723
+ ```yaml
1724
+ ports:
1725
+ - "127.0.0.1:9090:9090"
1726
+ ```
1727
+ - `test_command` and `lint_command` run with `shell=True`, so ensure the submitting agent or orchestrator is trusted
1728
+ - Access the Task Executor from external networks via SSH tunnel:
1729
+ ```bash
1730
+ ssh -L 9090:localhost:9090 -p 2222 AgentUser@your-server-ip
1731
+ ```
1732
+ Then submit tasks to `http://localhost:9090`
1733
+ - Pass `API_TOKEN` as a k8s Secret; never hardcode it in Compose files
1734
+
1735
+ ### General Hardening
1736
+
1737
+ 1. Grant only the `NET_ADMIN` capability
1738
+ 2. Create a dedicated Docker network for agent containers
1739
+ 3. Use environment files or k8s Secrets for all tokens
1740
+ 4. Rebuild the image periodically to incorporate Windows updates
1741
+ 5. Enable Docker json-file logging with rotation
1742
+ 6. Only grant necessary permissions
1743
+
1744
+
1745
+ ---
1746
+
1747
+ ## FAQ
1748
+
1749
+
1750
+ ### Coding Agent Evaluation Questions
1751
+
1752
+ **Q: What is the Task Executor API?**
1753
+ A: It is a REST API (`task_executor_windows.py`) running on port 9090 that provides programmatic task submission, multi-framework test scoring, lint analysis, diff capture, and ground-truth patch similarity scoring. It is the primary eval harness for coding agents running on Windows.
1754
+
1755
+ **Q: How do I start the Task Executor?**
1756
+ A: From PowerShell inside the container (via RDP or SSH): set `API_TOKEN` and `API_PORT` environment variables, then run `python C:\Users\AgentUser\task_executor_windows.py`. See the Task Executor API section for details.
1757
+
1758
+ **Q: Why is lint scoring soft? Why does it not fail the task?**
1759
+ A: The majority of established coding benchmarks (SWE-bench, HumanEval, LiveCodeBench) use test pass/fail as the primary correctness signal. Lint errors reflect code quality but not functional correctness. Keeping lint soft lets you track quality trends without invalidating otherwise correct solutions.
1760
+
1761
+ **Q: What is patch_similarity and when is it useful?**
1762
+ A: It is a 0.0–1.0 similarity ratio between the agent's actual diff and a ground-truth reference patch, computed after stripping all unified diff metadata. It is most useful for patch-apply evals where a canonical solution exists. Always interpret it alongside `tests_passed`: a lower similarity score does not mean the solution is wrong.
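The answer above does not say which similarity measure is used. As an illustration only, a ratio of this kind can be computed with Python's `difflib.SequenceMatcher` over the added and removed lines; the metadata prefixes and both function names below are assumptions for this sketch.

```python
import difflib
from typing import List

# Line prefixes treated as unified-diff metadata (assumption for this sketch).
# Caveat: a removed line whose content starts with "-- " also begins with "--- ".
METADATA_PREFIXES = ("diff --git", "index ", "--- ", "+++ ", "@@")


def strip_diff_metadata(patch: str) -> List[str]:
    """Keep only content lines of a unified diff, dropping headers and hunk markers."""
    return [line for line in patch.splitlines() if not line.startswith(METADATA_PREFIXES)]


def patch_similarity(agent_patch: str, reference_patch: str) -> float:
    """Return a 0.0-1.0 ratio comparing two patches line by line."""
    a = strip_diff_metadata(agent_patch)
    b = strip_diff_metadata(reference_patch)
    return difflib.SequenceMatcher(None, a, b).ratio()
```

Because hunk headers are stripped, two patches that make the same change at different line offsets still score 1.0, while a functionally equivalent fix written differently scores lower.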
1763
+
1764
+ **Q: Can the Task Executor run tasks in parallel?**
1765
+ A: Yes. Each submitted task runs in an independent background thread with its own isolated workspace under `TASK_BASE_DIR`. For large-scale parallelism, deploy multiple container replicas via k8s; each replica maintains its own in-memory task store.
1766
+
1767
+ **Q: What happens if a task times out?**
1768
+ A: The executor runs `taskkill /F /T /PID`, which forcefully terminates the entire process tree rooted at the test process. The task is marked `failed` with the timeout error recorded in `stderr`.
1769
+
1770
+ **Q: How do I access the Task Executor remotely?**
1771
+ A: The Task Executor binds to `0.0.0.0:9090`. In a k8s deployment, expose it via a `ClusterIP` service for internal orchestrator access, or `NodePort`/`LoadBalancer` for external access. Always set `API_TOKEN` when the port is reachable outside a trusted network boundary.
1772
+
1773
+ **Q: In a k8s deployment with many replicas, how does an orchestrator route tasks to a specific container?**
1774
+ A: Each replica runs its own Task Executor with its own in-memory task store. Track the pod IP (or headless service DNS entry) at submission time and send all status/result polls to the same pod. A load-balanced service may route requests to different replicas and return `404 Task not found`.
1775
+
1776
+ **Q: What happens to in-flight tasks if a pod is evicted or restarted?**
1777
+ A: In-flight tasks are lost because the in-memory store does not survive a restart. Implement retry logic in your orchestrator and treat `404 Task not found` as a signal to resubmit. The Windows container's ~25-second boot adds latency to recovery; account for this in orchestrator timeout settings.
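The resubmit-on-404 advice can be sketched as a small polling loop. The transport is injected as two callables so the logic stays independent of any HTTP client; `TaskNotFound`, `run_with_resubmit`, and the parameter names are hypothetical and not part of the Task Executor.

```python
import time
from typing import Callable, Tuple


class TaskNotFound(Exception):
    """Raised by get_status when the executor answers 404 Task not found."""


def run_with_resubmit(
    submit: Callable[[], str],
    get_status: Callable[[str], str],
    max_resubmits: int = 2,
    poll_interval: float = 5.0,
) -> Tuple[str, str]:
    """Submit a task and poll it, resubmitting if its record disappears.

    Returns (task_id, final_status). Re-raises TaskNotFound once
    max_resubmits has been exhausted.
    """
    task_id = submit()
    resubmits = 0
    while True:
        try:
            status = get_status(task_id)
        except TaskNotFound:
            # The pod restarted or the poll reached a different replica
            if resubmits >= max_resubmits:
                raise
            resubmits += 1
            task_id = submit()
            continue
        if status not in ("pending", "running"):
            return task_id, status
        time.sleep(poll_interval)
```

In practice `submit` would wrap `POST /task/submit` and `get_status` would wrap `GET /task/<task_id>`, raising `TaskNotFound` on a 404 response.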
1778
+
1779
+ **Q: How do I pass API_TOKEN securely across a k8s cluster?**
1780
+ A: Mount it as a k8s `Secret`:
1781
+ ```yaml
1782
+ env:
1783
+   - name: API_TOKEN
1784
+     valueFrom:
1785
+       secretKeyRef:
1786
+         name: task-executor-secret
1787
+         key: api-token
1788
+ ```
1789
+ Never hardcode tokens in the Compose file or Dockerfile.
1790
+
1791
+ ### General Questions
1792
+
1793
+ **Q: How is the entire Windows system running in a single container?**
1794
+ A: This container uses advanced virtualization techniques with KVM acceleration to run a complete Windows system. The implementation has everything self-contained within the container image. The result is a fully functional Windows 11 environment that's completely isolated and ephemeral.
1795
+
1796
+ **Q: Why doesn't this container need external files?**
1797
+ A: The container architecture was designed from the ground up to be self-contained. All necessary components, including the Windows system files, bootloader, and configuration, are embedded within the container image itself. This provides significant advantages: easier deployment, cleaner state management, no external file dependencies, and true ephemeral operation.
1798
+
1799
+ **Q: Can I run multiple instances of this container?**
1800
+ A: Yes, but each instance requires 8GB of RAM. Ensure your host has sufficient resources (e.g., 16GB+ RAM free for 2 instances).
1801
+
1802
+ **Q: How much disk space does it need?**
1803
+ A: The container image requires approximately 100GB of host disk space. The Windows system inside has a 2TB virtual disk.
1804
+
1805
+ **Q: Is this suitable for production use?**
1806
+ A: Yes, it's specifically designed for Computer Use Agent development, coding agents, and deployment in production environments. The container provides a stable, reproducible Windows environment ideal for CI/CD pipelines and automated testing.
1807
+
1808
+ ### Performance Questions
1809
+
1810
+ **Q: Why is the boot time 25 seconds?**
1811
+ A: This includes the complete Windows boot sequence, service initialization, and remote access server setup. This is normal for a full Windows system and is competitive with bare-metal Windows boot times.
1812
+
1813
+ **Q: Can I improve the performance?**
1814
+ A: Yes. The default host CPU configuration prioritizes stability and compatibility. You can tune it for your hardware, but test any change on the specific machine you deploy to.
1815
+
1816
+ **Q: Why does RDP perform better on Windows?**
1817
+ A: RDP is the native Windows remote desktop protocol and is optimized specifically for Windows GUI rendering. It uses hardware acceleration and efficient protocols designed for Windows systems.
1818
+
1819
+ **Q: What's the CPU usage under heavy load?**
1820
+ A: Under normal development workloads (coding, browsing, terminal work), expect 20-30% CPU. Heavy compilation or resource-intensive applications may increase this to 40-50%.
1821
+
1822
+ ### Compatibility Questions
1823
+
1824
+ **Q: Does it work on Windows/macOS hosts?**
1825
+ A: It requires a Linux host with KVM support. Windows (WSL2 with nested virtualization) and macOS hosts are not officially supported due to KVM requirements.
1826
+
1827
+ **Q: What Linux distributions are supported?**
1828
+ A: Any modern Linux distribution with Docker 24.0+ and KVM support:
1829
+ - Ubuntu 20.04+
1830
+ - Debian 11+
1831
+ - Fedora 36+
1832
+ - CentOS 8+
1833
+ - Arch Linux
1834
+
1835
+ **Q: Can I use AMD CPUs?**
1836
+ A: Yes, as long as AMD-V (SVM) is enabled in BIOS and the KVM kernel modules are loaded.
1837
+
1838
+ **Q: What about ARM processors (Apple Silicon)?**
1839
+ A: Not supported. This is an x86_64 container designed for Intel/AMD processors only.
1840
+
1841
+ ### Configuration Questions
1842
+
1843
+ **Q: Can I change the RAM allocation?**
1844
+ A: Yes, but currently the container is configured for 8GB RAM. Changing this requires rebuilding the container image with modified configuration.
1845
+
1846
+ **Q: Can I use this for .NET development?**
1847
+ A: Yes, the .NET SDK and Visual Studio Build Tools are pre-installed. The container is optimized for Computer Use Agent and coding agent development but fully supports .NET workflows.
1848
+
1849
+ **Q: How do I persist data across container restarts?**
1850
+ A: Use Docker volumes to mount directories from the host:
1851
+ ```yaml
1852
+ volumes:
1853
+ - ./my-projects:C:\Users\AgentUser\projects
1854
+ ```
1855
+
1856
+ ### Remote Access Questions
1857
+
1858
+ **Q: Which remote access method should I use?**
1859
+ A: Use RDP for best performance: it is the native Windows protocol with hardware acceleration and full clipboard/audio support. Use SSH for headless command-line operations, script execution, and file transfers.
1860
+
1861
+ **Q: Can I use other remote desktop solutions?**
1862
+ A: The container is pre-configured with RDP and VNC. Adding other solutions would require custom configuration.
1863
+
1864
+ **Q: What's the bandwidth requirement for RDP?**
1865
+ A: Minimum 10 Mbps, recommended 100 Mbps+ for best experience. Less bandwidth will work but may impact video quality.
1866
+
1867
+ ### Troubleshooting Questions
1868
+
1869
+ **Q: Windows Updates are interfering. What should I do?**
1870
+ A: Disable automatic updates via Group Policy or Services. See the Troubleshooting section for detailed steps.
1871
+
1872
+ **Q: Why is performance slow?**
1873
+ A: The host CPU configuration prioritizes stability. You can disable visual effects and unnecessary services, or customize the CPU configuration for better performance.
1874
+
1875
+ **Q: How do I access container logs?**
1876
+ A:
1877
+ ```bash
1878
+ docker logs win_agent
1879
+ docker logs -f win_agent # Follow mode
1880
+ ```
1881
+
1882
+ **Q: The container won't start. What's wrong?**
1883
+ A: Check:
1884
+ 1. KVM is accessible (`ls -l /dev/kvm`)
1885
+ 2. Sufficient RAM available (8GB free)
1886
+ 3. Ports aren't conflicting
1887
+ 4. Docker service is running
1888
+ 5. Container logs for specific errors
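
The first four checks above can be automated with a small preflight script. The sketch below is one possible approach, not part of the project: all function names are hypothetical, the 8 GB threshold comes from the checklist, and the ports passed to `preflight()` are whatever your deployment maps.

```python
import os
import shutil
import socket
import subprocess
from typing import Iterable, Optional


def kvm_accessible(path: str = "/dev/kvm") -> bool:
    """Check 1: the KVM device exists and this user can read and write it."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


def mem_available_gb(meminfo_text: str) -> Optional[float]:
    """Check 2: parse MemAvailable (reported in kB) from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / (1024 * 1024)
    return None


def port_free(port: int, host: str = "0.0.0.0") -> bool:
    """Check 3: try to bind the port; a failed bind means something holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def docker_running() -> bool:
    """Check 4: `docker info` exits non-zero when the daemon is unreachable."""
    if shutil.which("docker") is None:
        return False
    try:
        proc = subprocess.run(["docker", "info"], capture_output=True, timeout=15)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return proc.returncode == 0


def preflight(ports: Iterable[int], min_ram_gb: float = 8.0) -> None:
    """Print one line per check; Linux only, since it reads /proc/meminfo."""
    with open("/proc/meminfo") as f:
        free_gb = mem_available_gb(f.read())
    print("KVM accessible:", kvm_accessible())
    print(f"RAM available >= {min_ram_gb} GB:", free_gb is not None and free_gb >= min_ram_gb)
    for port in ports:
        print(f"Port {port} free:", port_free(port))
    print("Docker daemon reachable:", docker_running())
```

For example, `preflight([3389, 2222])` would check the two ports used in this README's SSH tunnel example. The fifth check, reading the container logs, still needs a human eye.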
1889
+
1890
+ ### Security Questions
1891
+
1892
+ **Q: Is this container secure?**
1893
+ A: The container runs with NET_ADMIN capability and requires KVM access. It's designed for development environments. For production, review security considerations and implement appropriate network isolation.
1894
+
1895
+ **Q: Can I run this in a public cloud?**
1896
+ A: Only on infrastructure that exposes hardware virtualization extensions to the guest. Bare-metal instances work universally. Standard VM instances require the cloud provider to explicitly enable nested virtualization: AWS Nitro, Google Cloud, and Azure support it on select instance types, but it must be enabled per-instance and is not on by default. The limiting factor is the hypervisor configuration, not the host OS.
1897
+
1898
+ **Q: How do I secure remote access?**
1899
+ A: Use VPN or SSH tunneling to access the container:
1900
+ ```bash
1901
+ ssh -L 3389:localhost:3389 -p 2222 host-server
1902
+ ```
1903
+ Then connect RDP to `localhost:3389`.
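
Before launching the RDP client, it can help to confirm that the local end of the tunnel is accepting connections. The helper below is a hypothetical sketch using only the Python standard library; `tunnel_listening` is not part of this project.

```python
import socket


def tunnel_listening(host: str = "localhost", port: int = 3389, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to the forwarded local port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Connection refused or timed out: the tunnel is not up on this port.
        return False
```

A `True` result only shows that `ssh` is listening locally. It does not prove that the RDP server on the far side is ready, which may still be booting.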
1904
+
1905
+ ---
1906
+
1907
+ ## License
1908
+
1909
+ This project is licensed under the **GNU General Public License v3.0 (GPL-3.0)**.
1910
+
1911
+ ### What GPL-3.0 Covers
1912
+
1913
+ The GPL-3.0 license applies to:
1914
+ - Container configuration files and Docker Compose setup
1915
+ - Custom scripts and automation tools created by the developer
1916
+ - Integration code and custom components
1917
+ - Documentation and setup instructions
1918
+ - Any modifications you make to these components
1919
+
1920
+ ### GPL-3.0 License Summary
1921
+
1922
+ **Permissions**:
1923
+ - ✅ Commercial use
1924
+ - ✅ Modification
1925
+ - ✅ Distribution
1926
+ - ✅ Patent use
1927
+ - ✅ Private use
1928
+
1929
+ **Conditions**:
1930
+ - 📋 License and copyright notice
1931
+ - 📋 State changes
1932
+ - 📋 Disclose source
1933
+ - 📋 Same license (copyleft)
1934
+
1935
+ **Limitations**:
1936
+ - ❌ Liability
1937
+ - ❌ Warranty
1938
+
1939
+ ### What This Means
1940
+
1941
+ **For the Container Infrastructure** (GPL-3.0):
1942
+ - You can use, modify, and distribute the container configuration
1943
+ - You can create derivative works of the setup scripts
1944
+ - If you distribute modified versions, you must:
1945
+ - Include the GPL-3.0 license
1946
+ - Make your source code modifications available
1947
+ - License your modifications under GPL-3.0
1948
+ - Document any changes made
1949
+
1950
+ ### Full License
1951
+
1952
+ For the complete license text, see: https://www.gnu.org/licenses/gpl-3.0.en.html
1953
+
1954
+ ### Disclaimer
1955
+
1956
+ This container is provided "as is" without warranty of any kind.
1957
+
1958
+ ---
1959
+
1960
+ ## About This Project
1961
+
1962
+ The **Windows 11 Container** represents a significant advancement in containerized Windows environments. Built for Computer Use Agent development and frontier coding agent evaluation, this project addresses the key challenges faced by developers working with Windows-based automation and AI agents.
1963
+
1964
+ Version 2 extends the original CUA environment into a full coding agent evaluation platform. The Task Executor API (covering multi-framework test scoring, programmatic lint integration, diff capture, and ground-truth patch similarity scoring) was built to support rigorous coding agent benchmarking on a native Windows runtime, a capability absent from Linux-only eval frameworks.
1965
+
1966
+ ### Project Goals
1967
+
1968
+ **Primary Objectives**:
1969
+ - Provide a reproducible Windows environment for AI coding agents and CUA development
1970
+ - Eliminate external file dependencies for cleaner deployments
1971
+ - Optimize performance while maintaining stability
1972
+ - Enable seamless CI/CD integration for Windows workflows
1973
+ - Support scalable agent training and testing
1974
+
1975
+ **Design Philosophy**:
1976
+ - **Self-Contained**: Everything in one container, no external files
1977
+ - **Ephemeral**: Clean state management with proper isolation
1978
+ - **Performant**: Optimized for real-world development workflows
1979
+ - **Tested**: Based on confirmed safe and stable configurations
1980
+ - **Accessible**: Simple deployment with Docker Compose
1981
+
1982
+ ### Development Journey
1983
+
1984
+ This container was built from the ground up through:
1985
+ - Extensive testing on real hardware
1986
+ - Iterative performance optimization
1987
+ - Configuration tuning for stability
1988
+ - Integration of development tools
1989
+ - Refinement of remote access methods
1990
+
1991
+ Every configuration choice, from the host CPU setting to the 8GB RAM allocation, is based on tested and confirmed performance characteristics. The current configuration represents what can be safely delivered and has been verified to work reliably.
1992
+
1993
+ ### Why This Matters
1994
+
1995
+ **For Developers**:
1996
+ - Consistent Windows environment across team members
1997
+ - No "works on my machine" issues
1998
+ - Fast setup and deployment
1999
+ - Integrated development tools
2000
+ - Built-in monitoring capabilities
2001
+
2002
+ **For Organizations**:
2003
+ - Reproducible testing environments
2004
+ - CI/CD pipeline integration
2005
+ - Scalable agent deployment
2006
+ - Cost-effective Windows access
2007
+ - Clean resource management
2008
+
2009
+ ### Future Direction
2010
+
2011
+ While the current configuration is optimized for compatibility and stability, the container is designed to be customizable. As hardware capabilities evolve and use cases expand, configurations can be adjusted to leverage more powerful systems while maintaining the core benefits of containerization.
2012
+
2013
+ ### Acknowledgments
2014
+
2015
+ This project builds on the containerization ecosystem and the work of many in the Docker and virtualization communities. Special recognition to:
2016
+ - The Docker team for container technology
2017
+ - The KVM project for virtualization
2018
+ - The open-source community for tools and libraries
2019
+
2020
+ ### Get Involved
2021
+
2022
+ **Feedback & Contact**:
2023
+ - **X (Twitter)**: [@nullvoider07](https://x.com/nullvoider07)
2024
+ - Report issues with detailed information
2025
+ - Share your use cases and experiences
2026
+ - Suggest improvements and features
2027
+
2028
+ **Contributing**:
2029
+ The core implementation is open-source. Feedback and contributions are always welcome and appreciated, whether on the topics below or on anything not listed:
2030
+ - Performance optimization suggestions
2031
+ - Use case requirements
2032
+ - Bug reports and fixes
2033
+ - Documentation improvements
2034
+
2035
+ **Key files:**
2036
+ - `task_executor_windows.py`: Task Executor REST API server
2037
+ - `deploy-windows.yaml`: Docker Compose deployment file
2038
+ - `README.md`: This documentation
2041
+
2042
+ ---
2043
+
2044
+ **Last Updated:** May 2026
2045
+ **Version:** 1
2046
+ **Developer:** Kartik (NullVoider)
2047
+ **License:** GPL-3.0
2048
+
2049
+ ---
2050
+
2051
+ **Windows 11** - Full Windows in one self-contained container. AI agent training and evaluation, no compromises, no external files. 🚀