| # UI Element Detection API |
|
|
| A server-based solution for detecting and locating UI elements in screenshots using OmniParser and template matching. |
|
|
| ## Features |
|
|
| ✅ **Automatic UI Detection** - Uses OmniParser to detect all UI elements (buttons, text, icons, etc.) |
| ✅ **Precise Coordinates** - Returns pixel-perfect coordinates for each element |
| ✅ **Multiple Export Formats** - JSON, CSV, and visualization PNG |
| ✅ **Fast Processing** - ~15 seconds per screenshot on CPU |
| ✅ **Server-Side Storage** - Cropped images stored on server, not sent to clients |
| ✅ **Multiple Endpoints** - Flexible request/response options |
|
|
| ## Start the Server |
|
|
| ```bash |
| cd /workspaces/omoi-v2 |
| python ui_element_api_server.py --port 8001 |
| ``` |
|
|
| The server starts at `http://127.0.0.1:8001`. |
|
|
| ## API Endpoints |
|
|
| ### 1. Health Check |
| ```bash |
| GET /health |
| ``` |
|
|
| **Response:** |
| ```json |
| {"status": "ok", "service": "UI Element Detection API"} |
| ``` |
|
|
| ### 2. Analyze Image (Full Response) |
| ```bash |
| POST /analyze |
| Content-Type: multipart/form-data |
| |
| file: <PNG image file> |
| ``` |
|
|
| **Response:** |
| ```json |
| { |
|   "status": "success", |
|   "processing_time_seconds": 15.4, |
|   "timing": { |
|     "omniparser_seconds": 9.88, |
|     "template_matching_seconds": 5.48 |
|   }, |
|   "image_info": { |
|     "filename": "Screenshot.png", |
|     "size": {"width": 1365, "height": 767} |
|   }, |
|   "analysis": { |
|     "total_elements_detected": 120, |
|     "elements": [ |
|       { |
|         "template_id": "crop_0000", |
|         "template_file": "crop_0000.png", |
|         "confidence": 1.0, |
|         "bbox": { |
|           "x1": 71, "y1": 13, "x2": 161, "y2": 29, |
|           "width": 90, "height": 16 |
|         }, |
|         "center": {"x": 116, "y": 21}, |
|         "bbox_ratio": { |
|           "x1": 0.052, "y1": 0.017, "x2": 0.118, "y2": 0.038 |
|         } |
|       }, |
|       // ... 119 more elements |
|     ] |
|   }, |
|   "exports": { |
|     "csv_data": "Element_ID,Template_File,Confidence,X1,Y1,...\n", |
|     "visualization_png_base64": "iVBORw0KGgoAAAANSUhEUgAAA..." |
|   } |
| } |
| ``` |
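
As a rough illustration of consuming this payload, the sketch below pulls the headline numbers out of an already-parsed `/analyze` response. The helper name `summarize_analysis` and the trimmed sample dict are assumptions made for this example; only the field names come from the response shown above.

```python
from typing import Any, Dict


def summarize_analysis(response: Dict[str, Any]) -> Dict[str, Any]:
    """Collect headline numbers from a parsed /analyze response (hypothetical helper)."""
    if response.get("status") != "success":
        raise ValueError(f"analysis failed: {response.get('status')!r}")

    timing = response["timing"]
    elements = response["analysis"]["elements"]
    return {
        "filename": response["image_info"]["filename"],
        "reported_total": response["analysis"]["total_elements_detected"],
        "elements_in_payload": len(elements),
        # Sum of the two stage timings; expected to be close to processing_time_seconds.
        "stage_seconds": round(
            timing["omniparser_seconds"] + timing["template_matching_seconds"], 2
        ),
        # Click targets, one (x, y) pair per element.
        "centers": [(e["center"]["x"], e["center"]["y"]) for e in elements],
    }


# Trimmed sample based on the response above (one element instead of 120).
sample = {
    "status": "success",
    "processing_time_seconds": 15.4,
    "timing": {"omniparser_seconds": 9.88, "template_matching_seconds": 5.48},
    "image_info": {"filename": "Screenshot.png", "size": {"width": 1365, "height": 767}},
    "analysis": {
        "total_elements_detected": 120,
        "elements": [
            {
                "template_id": "crop_0000",
                "confidence": 1.0,
                "bbox": {"x1": 71, "y1": 13, "x2": 161, "y2": 29, "width": 90, "height": 16},
                "center": {"x": 116, "y": 21},
            }
        ],
    },
}

summary = summarize_analysis(sample)
print(summary["filename"], summary["stage_seconds"], summary["centers"])
```

Comparing `stage_seconds` with `processing_time_seconds` is a cheap sanity check that no large untracked step sits between the two stages.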
|
|
| ### 3. Analyze Image (Structured Response) |
| ```bash |
| POST /analyze_batch |
| Content-Type: multipart/form-data |
| |
| file: <PNG image file> |
| ``` |
|
|
| **Response:** |
| ```json |
| { |
|   "metadata": { |
|     "filename": "Screenshot.png", |
|     "image_size": {"width": 1365, "height": 767}, |
|     "total_elements_detected": 120, |
|     "templates_loaded": 120 |
|   }, |
|   "coordinates_json": { |
|     "source_image": "Screenshot.png", |
|     "image_size": {"width": 1365, "height": 767}, |
|     "total_elements": 120, |
|     "elements": [...] |
|   }, |
|   "csv_data": "Element_ID,Template_File,...\n", |
|   "visualization_png_base64": "iVBORw0KGgo..." |
| } |
| ``` |
|
|
| ## Usage Examples |
|
|
| ### Python Client |
|
|
| ```python |
| from ui_element_client import UIElementDetectionClient |
| |
| # Initialize client |
| client = UIElementDetectionClient(api_url="http://127.0.0.1:8001") |
| |
| # Check API health |
| status = client.health_check() |
| print(status) |
| |
| # Analyze image and get all elements |
| result = client.analyze_image("screenshot.png") |
| print(f"Found {result['analysis']['total_elements_detected']} UI elements") |
| |
| # Get specific element |
| element = client.get_element_by_id("screenshot.png", "crop_0031") |
| print(f"Element at: ({element['center']['x']}, {element['center']['y']})") |
| |
| # Find elements in a region (top 100 pixels) |
| elements = client.find_elements_in_region("screenshot.png", 0, 0, 1365, 100) |
| print(f"Found {len(elements)} elements in top region") |
| ``` |
|
|
| ### Using curl |
|
|
| #### Analyze image and save outputs |
| ```bash |
| curl -X POST -F "file=@screenshot.png" http://127.0.0.1:8001/analyze > response.json |
| |
| # Extract CSV data |
| python -c "import json; d=json.load(open('response.json')); print(d['exports']['csv_data'])" > coordinates.csv |
| |
| # Extract visualization (base64 decode) |
| python -c " |
| import json, base64 |
| d = json.load(open('response.json')) |
| with open('visualization.png', 'wb') as f: |
|     f.write(base64.b64decode(d['exports']['visualization_png_base64'])) |
| " |
| ``` |
|
|
| ### JavaScript/Node.js |
|
|
| ```javascript |
| const FormData = require('form-data'); |
| const fs = require('fs'); |
| const axios = require('axios'); |
| |
| async function analyzeImage(imagePath) { |
|   const formData = new FormData(); |
|   formData.append('file', fs.createReadStream(imagePath)); |
| |
|   const response = await axios.post( |
|     'http://127.0.0.1:8001/analyze', |
|     formData, |
|     { headers: formData.getHeaders() } |
|   ); |
| |
|   const data = response.data; |
|   console.log(`Found ${data.analysis.total_elements_detected} UI elements`); |
| |
|   // Save CSV |
|   fs.writeFileSync('coordinates.csv', data.exports.csv_data); |
| |
|   // Save visualization |
|   const vizBuffer = Buffer.from(data.exports.visualization_png_base64, 'base64'); |
|   fs.writeFileSync('visualization.png', vizBuffer); |
| |
|   return data; |
| } |
| |
| analyzeImage('screenshot.png').catch(console.error); |
| ``` |
|
|
| ## Response Data Structure |
|
|
| Each UI element contains: |
|
|
| ```json |
| { |
|   "template_id": "crop_0031",        // Element identifier |
|   "template_file": "crop_0031.png",  // Source template file |
|   "confidence": 1.0,                 // Matching confidence (0-1) |
|   "bbox": { |
|     "x1": 587,                       // Top-left X |
|     "y1": 393,                       // Top-left Y |
|     "x2": 763,                       // Bottom-right X |
|     "y2": 441,                       // Bottom-right Y |
|     "width": 176,                    // Element width |
|     "height": 48                     // Element height |
|   }, |
|   "center": { |
|     "x": 675,                        // Center X (for clicking) |
|     "y": 417                         // Center Y (for clicking) |
|   }, |
|   "bbox_ratio": { |
|     "x1": 0.430,                     // Normalized X1 (0-1) |
|     "y1": 0.512,                     // Normalized Y1 (0-1) |
|     "x2": 0.559,                     // Normalized X2 (0-1) |
|     "y2": 0.575                      // Normalized Y2 (0-1) |
|   } |
| } |
| ``` |
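
The fields in an element are redundant with one another, which makes a quick consistency check possible. The sketch below assumes `width = x2 - x1`, `height = y2 - y1`, and a center at the integer midpoint of the box; the sample values above fit those rules, but the README does not state them, and `check_element_geometry` is a hypothetical helper name.

```python
from typing import Any, Dict


def check_element_geometry(element: Dict[str, Any], tolerance: int = 1) -> bool:
    """Return True if bbox, width/height and center agree with each other.

    Assumes width = x2 - x1, height = y2 - y1, and center = bbox midpoint.
    `tolerance` allows for off-by-one rounding of the center.
    """
    bbox = element["bbox"]
    center = element["center"]

    width_ok = bbox["x2"] - bbox["x1"] == bbox["width"]
    height_ok = bbox["y2"] - bbox["y1"] == bbox["height"]

    mid_x = (bbox["x1"] + bbox["x2"]) // 2
    mid_y = (bbox["y1"] + bbox["y2"]) // 2
    center_ok = (
        abs(mid_x - center["x"]) <= tolerance
        and abs(mid_y - center["y"]) <= tolerance
    )
    return width_ok and height_ok and center_ok


# The element shown above.
element = {
    "template_id": "crop_0031",
    "bbox": {"x1": 587, "y1": 393, "x2": 763, "y2": 441, "width": 176, "height": 48},
    "center": {"x": 675, "y": 417},
}
print(check_element_geometry(element))
```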
|
|
| ## Export Formats |
|
|
| ### JSON |
| Complete structured data with all coordinates, confidence scores, and metadata. |
|
|
| ### CSV |
| Spreadsheet-friendly format with columns: |
| - Element_ID |
| - Template_File |
| - Confidence |
| - X1, Y1, X2, Y2 (pixel coordinates) |
| - Width, Height |
| - Center_X, Center_Y |
| - Ratio_X1, Ratio_Y1, Ratio_X2, Ratio_Y2 |
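
A minimal sketch of loading the CSV export with the standard library, assuming the header uses exactly the column names listed above. The sample row is hand-built from the first element of the `/analyze` example, and `parse_csv_export` is a hypothetical helper, not part of the client library.

```python
import csv
import io
from typing import Any, Dict, List

INT_COLUMNS = ("X1", "Y1", "X2", "Y2", "Width", "Height", "Center_X", "Center_Y")
FLOAT_COLUMNS = ("Confidence", "Ratio_X1", "Ratio_Y1", "Ratio_X2", "Ratio_Y2")


def parse_csv_export(csv_data: str) -> List[Dict[str, Any]]:
    """Parse the csv_data string into typed rows (column names are assumed)."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_data)):
        parsed: Dict[str, Any] = dict(row)
        for name in INT_COLUMNS:
            # int(float(...)) tolerates values written as "71" or "71.0".
            parsed[name] = int(float(row[name]))
        for name in FLOAT_COLUMNS:
            parsed[name] = float(row[name])
        rows.append(parsed)
    return rows


# Hand-built sample; a real export would contain one row per detected element.
sample_csv = (
    "Element_ID,Template_File,Confidence,X1,Y1,X2,Y2,Width,Height,"
    "Center_X,Center_Y,Ratio_X1,Ratio_Y1,Ratio_X2,Ratio_Y2\n"
    "crop_0000,crop_0000.png,1.0,71,13,161,29,90,16,116,21,0.052,0.017,0.118,0.038\n"
)

rows = parse_csv_export(sample_csv)
print(rows[0]["Element_ID"], rows[0]["Center_X"], rows[0]["Center_Y"])
```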
|
|
| ### Visualization PNG |
| High-resolution image with: |
| - Green bounding boxes around each element |
| - Red center point marker |
| - Element ID and confidence label for each box |
|
|
| ## Server-Side File Storage |
|
|
| The server maintains a temporary cropped images directory: |
| ``` |
| /tmp/omoi_cropped_images/ |
| ├── crop_0000.png |
| ├── crop_0001.png |
| ├── crop_0002.png |
| └── ... (120+ images) |
| ``` |
|
|
| These files are: |
| - ✅ Used for template matching |
| - ✅ Kept on server for reference |
| - ❌ NOT sent to clients |
| - ❌ Cleared on server restart |
|
|
| ## Performance |
|
|
| Typical performance on CPU: |
| - OmniParser detection: ~10 seconds |
| - Template matching: ~5 seconds |
| - Total: ~15 seconds per screenshot |
|
|
| ## Architecture |
|
|
| ``` |
| Client Request (PNG) |
| ↓ |
| [API Server] |
| 1. Receives PNG |
| 2. Runs OmniParser |
|    ├─ Detects UI elements |
|    └─ Saves cropped images (server-side only) |
| 3. Template matches crops back to original |
| 4. Generates coordinates |
| 5. Creates visualization |
| 6. Exports to JSON/CSV |
| ↓ |
| Client Response (JSON, CSV, PNG) |
| - Coordinates metadata |
| - CSV data |
| - Visualization image |
| (NO cropped images to client) |
| ``` |
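
To make step 3 concrete, here is a pure-Python sketch of template matching on tiny grayscale grids. It is not the server's implementation, which this README does not show; it swaps in a brute-force sum-of-absolute-differences search, and the names `crop` and `match_template` are hypothetical. It also shows why a crop matched against the image it was cut from scores exactly 1.0.

```python
from typing import List, Tuple

Gray = List[List[int]]  # rows of 0-255 intensity values


def crop(image: Gray, x1: int, y1: int, x2: int, y2: int) -> Gray:
    """Cut out the region [x1, x2) x [y1, y2)."""
    return [row[x1:x2] for row in image[y1:y2]]


def match_template(image: Gray, template: Gray) -> Tuple[int, int, float]:
    """Slide the template over the image and return (x, y, score) of the best fit.

    score = 1 - normalized sum of absolute differences, so 1.0 means the
    template is pixel-identical to the image at that position.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    if th > ih or tw > iw:
        raise ValueError("template is larger than the image")

    max_diff = 255 * th * tw
    best = (0, 0, -1.0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            diff = sum(
                abs(image[y + dy][x + dx] - template[dy][dx])
                for dy in range(th)
                for dx in range(tw)
            )
            score = 1.0 - diff / max_diff
            if score > best[2]:
                best = (x, y, score)
    return best


# Synthetic 6x5 image in which every 3x2 window is distinct.
image = [[(y * 7 + x * 13) % 256 for x in range(6)] for y in range(5)]
template = crop(image, 2, 1, 5, 3)

x, y, score = match_template(image, template)
print(x, y, score)
```

The brute-force loop is far too slow for real screenshots; it is only meant to show the idea of recovering a crop's position from its pixels.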
|
|
| ## Coordinate Systems |
|
|
| ### Absolute Coordinates |
| Pixel coordinates in the original image: |
| - `bbox.x1, bbox.y1`: Top-left corner |
| - `bbox.x2, bbox.y2`: Bottom-right corner |
| - `center.x, center.y`: Center point (use for mouse clicks) |
|
|
| ### Normalized Coordinates |
| Values scaled to the 0-1 range relative to the image width and height: |
| - `bbox_ratio.x1, bbox_ratio.y1`: Top-left (normalized) |
| - `bbox_ratio.x2, bbox_ratio.y2`: Bottom-right (normalized) |
| - Useful for scaling to different screen sizes |
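
The conversion from normalized to pixel coordinates can be sketched as follows, assuming x ratios are relative to the image width and y ratios to the image height (consistent with the sample values in this README). `ratio_to_pixels` is a hypothetical helper; because the ratios are rounded to three decimals, results may in general be off by a pixel.

```python
from typing import Dict


def ratio_to_pixels(bbox_ratio: Dict[str, float], width: int, height: int) -> Dict[str, int]:
    """Scale a normalized bbox to pixel coordinates for a width x height target."""
    x1 = round(bbox_ratio["x1"] * width)
    y1 = round(bbox_ratio["y1"] * height)
    x2 = round(bbox_ratio["x2"] * width)
    y2 = round(bbox_ratio["y2"] * height)
    return {
        "x1": x1,
        "y1": y1,
        "x2": x2,
        "y2": y2,
        # Midpoint of the scaled box, usable as a click target.
        "center_x": (x1 + x2) // 2,
        "center_y": (y1 + y2) // 2,
    }


ratio = {"x1": 0.430, "y1": 0.512, "x2": 0.559, "y2": 0.575}

# Scaling back to the original 1365x767 screenshot recovers the pixel bbox.
original = ratio_to_pixels(ratio, 1365, 767)
print(original)

# The same element on a hypothetical 1920x1080 display.
print(ratio_to_pixels(ratio, 1920, 1080))
```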
|
|
| ## Tips |
|
|
| 1. **Clicking Elements**: Use `center.x` and `center.y` for mouse position |
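| 2. **Validation**: Each crop is matched back to the image it was cut from, so `confidence` is normally `1.0` (perfect match); treat a lower value as a sign of a bad match |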
| 3. **Filtering**: Use `bbox_ratio` for responsive element filtering |
| 4. **Region Queries**: Client library supports finding elements in bounding boxes |
| 5. **Batch Processing**: Queue multiple images for analysis |
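
Tips 2 and 4 can also be applied locally to the `elements` list of a response. The sketch below is a stand-alone filter, not the client library's `find_elements_in_region`; it assumes an element counts as inside the region only when its whole bounding box fits, and the name `elements_in_region` is hypothetical.

```python
from typing import Any, Dict, List


def elements_in_region(
    elements: List[Dict[str, Any]],
    x1: int,
    y1: int,
    x2: int,
    y2: int,
    min_confidence: float = 0.0,
) -> List[Dict[str, Any]]:
    """Return elements fully contained in the region with enough confidence."""
    selected = []
    for element in elements:
        bbox = element["bbox"]
        inside = (
            bbox["x1"] >= x1
            and bbox["y1"] >= y1
            and bbox["x2"] <= x2
            and bbox["y2"] <= y2
        )
        if inside and element["confidence"] >= min_confidence:
            selected.append(element)
    return selected


# Two elements taken from the examples in this README.
elements = [
    {"template_id": "crop_0000", "confidence": 1.0,
     "bbox": {"x1": 71, "y1": 13, "x2": 161, "y2": 29}},
    {"template_id": "crop_0031", "confidence": 1.0,
     "bbox": {"x1": 587, "y1": 393, "x2": 763, "y2": 441}},
]

# Top 100 pixels of a 1365-pixel-wide screenshot.
top = elements_in_region(elements, 0, 0, 1365, 100, min_confidence=0.9)
print([e["template_id"] for e in top])
```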
|
|
| ## Troubleshooting |
|
|
| **"OmniParser not initialized"** - The server failed to load its models; check the server logs |
|
|
| **"Failed to decode image"** - Make sure you are uploading a valid PNG/JPG file |
|
|
| **"Cropped images directory not found"** - OmniParser detection failed; check the input image |
|
|
| **Timeout** - Large images take longer to process; increase the client's request timeout |
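
For the timeout case, a client built only on the standard library might look like the sketch below. The 120-second default is an assumption chosen to sit well above the ~15 second typical processing time, the multipart field name `file` follows the `/analyze` request format above, and both function names are hypothetical.

```python
import json
import urllib.request
import uuid
from typing import Any, Dict


def build_analyze_request(api_url: str, filename: str, image_bytes: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST to /analyze with the image in field 'file'."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: image/png\r\n\r\n",
        image_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{api_url}/analyze",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def analyze_with_timeout(
    api_url: str, filename: str, image_bytes: bytes, timeout_seconds: float = 120.0
) -> Dict[str, Any]:
    """Send the request; urlopen raises an error if the server takes too long."""
    request = build_analyze_request(api_url, filename, image_bytes)
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.load(response)


# Building the request needs no running server; sending it does.
request = build_analyze_request("http://127.0.0.1:8001", "screenshot.png", b"\x89PNG")
print(request.get_method(), request.full_url)
```

With a server running, `analyze_with_timeout("http://127.0.0.1:8001", "screenshot.png", open("screenshot.png", "rb").read())` would return the parsed JSON response.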
|
|
| ## Files |
|
|
| - `ui_element_api_server.py` - Main API server |
| - `ui_element_client.py` - Python client library |
| - `ui_element_locator.py` - Template matching utility |
| - `ui_element_analyzer.py` - Analysis and export utilities |
|
|
| --- |
|
|
| **Status**: ✅ Production Ready |
| **Last Updated**: April 17, 2026 |
|
|