
UI Element Detection API

Complete server-based solution for detecting and locating all UI elements in screenshots using OmniParser and template matching.

Features

  • ✅ Automatic UI Detection - Uses OmniParser to detect all UI elements (buttons, text, icons, etc.)
  • ✅ Precise Coordinates - Returns pixel-perfect coordinates for each element
  • ✅ Multiple Export Formats - JSON, CSV, and visualization PNG
  • ✅ Fast Processing - ~15 seconds per screenshot on CPU
  • ✅ Server-Side Storage - Cropped images stored on server, not sent to clients
  • ✅ Multiple Endpoints - Flexible request/response options

Start the Server

cd /workspaces/omoi-v2
python ui_element_api_server.py --port 8001

Server will start at http://127.0.0.1:8001

API Endpoints

1. Health Check

GET /health

Response:

{"status": "ok", "service": "UI Element Detection API"}
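
Because the models take a while to load, a client may want to poll the health endpoint before sending work. The following is a minimal sketch using only the standard library. The function names, retry count, and delay are illustrative assumptions and not part of the shipped client.

```python
import json
import time
import urllib.request
from typing import Callable

HEALTH_URL = "http://127.0.0.1:8001/health"


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 10,   # hypothetical default
    delay: float = 1.0,   # hypothetical default, in seconds
) -> bool:
    """Return True as soon as the server reports status "ok", else False."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except OSError:
            # Connection refused, timeout, or URLError: server not up yet.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Passing `fetch` as a parameter keeps the retry loop testable without a running server.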

2. Analyze Image (Full Response)

POST /analyze
Content-Type: multipart/form-data

file: <PNG image file>

Response:

{
  "status": "success",
  "processing_time_seconds": 15.4,
  "timing": {
    "omniparser_seconds": 9.88,
    "template_matching_seconds": 5.48
  },
  "image_info": {
    "filename": "Screenshot.png",
    "size": {"width": 1365, "height": 767}
  },
  "analysis": {
    "total_elements_detected": 120,
    "elements": [
      {
        "template_id": "crop_0000",
        "template_file": "crop_0000.png",
        "confidence": 1.0,
        "bbox": {
          "x1": 71, "y1": 13, "x2": 161, "y2": 29,
          "width": 90, "height": 16
        },
        "center": {"x": 116, "y": 21},
        "bbox_ratio": {
          "x1": 0.052, "y1": 0.017, "x2": 0.118, "y2": 0.038
        }
      },
      // ... 119 more elements
    ]
  },
  "exports": {
    "csv_data": "Element_ID,Template_File,Confidence,X1,Y1,...\n",
    "visualization_png_base64": "iVBORw0KGgoAAAANSUhEUgAAA..."
  }
}
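
As a sketch of how a caller might consume this response, the helper below pulls a click point out of every element that meets a confidence threshold. The function name and the 0.9 default are assumptions made for illustration; the sample dictionary reuses the first element shown above.

```python
from typing import Any, Dict, List, Tuple


def click_targets(
    response: Dict[str, Any], min_confidence: float = 0.9
) -> List[Tuple[str, int, int]]:
    """Return (template_id, x, y) for each sufficiently confident element."""
    if response.get("status") != "success":
        raise ValueError(f"analysis did not succeed: {response.get('status')!r}")
    targets = []
    for element in response["analysis"]["elements"]:
        if element["confidence"] >= min_confidence:
            center = element["center"]
            targets.append((element["template_id"], center["x"], center["y"]))
    return targets


# Trimmed copy of the /analyze response shown above.
sample_response = {
    "status": "success",
    "analysis": {
        "total_elements_detected": 1,
        "elements": [
            {
                "template_id": "crop_0000",
                "confidence": 1.0,
                "center": {"x": 116, "y": 21},
            }
        ],
    },
}

print(click_targets(sample_response))
```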

3. Analyze Image (Structured Response)

POST /analyze_batch
Content-Type: multipart/form-data

file: <PNG image file>

Response:

{
  "metadata": {
    "filename": "Screenshot.png",
    "image_size": {"width": 1365, "height": 767},
    "total_elements_detected": 120,
    "templates_loaded": 120
  },
  "coordinates_json": {
    "source_image": "Screenshot.png",
    "image_size": {"width": 1365, "height": 767},
    "total_elements": 120,
    "elements": [...]
  },
  "csv_data": "Element_ID,Template_File,...\n",
  "visualization_png_base64": "iVBORw0KGgo..."
}
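
The two endpoints nest the same data differently: `/analyze` puts elements under `analysis` and exports under `exports`, while `/analyze_batch` uses `coordinates_json` and top-level export keys. A client that talks to both could normalize them with something like the sketch below. The helper names are hypothetical, and the code assumes the elided `elements` list in the batch response uses the same element schema.

```python
from typing import Any, Dict, List, Tuple


def extract_elements(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the element list from either response shape."""
    if "analysis" in payload:            # /analyze
        return payload["analysis"]["elements"]
    if "coordinates_json" in payload:    # /analyze_batch
        return payload["coordinates_json"]["elements"]
    raise KeyError("payload has neither 'analysis' nor 'coordinates_json'")


def extract_exports(payload: Dict[str, Any]) -> Tuple[str, str]:
    """Return (csv_data, visualization_png_base64) from either shape."""
    # /analyze nests the exports; /analyze_batch keeps them at the top level.
    exports = payload.get("exports", payload)
    return exports["csv_data"], exports["visualization_png_base64"]
```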

Usage Examples

Python Client

from ui_element_client import UIElementDetectionClient

# Initialize client
client = UIElementDetectionClient(api_url="http://127.0.0.1:8001")

# Check API health
status = client.health_check()
print(status)

# Analyze image and get all elements
result = client.analyze_image("screenshot.png")
print(f"Found {result['analysis']['total_elements_detected']} UI elements")

# Get specific element
element = client.get_element_by_id("screenshot.png", "crop_0031")
print(f"Element at: ({element['center']['x']}, {element['center']['y']})")

# Find elements in a region (top 100 pixels)
elements = client.find_elements_in_region("screenshot.png", 0, 0, 1365, 100)
print(f"Found {len(elements)} elements in top region")

Using curl

Analyze image and save outputs

curl -X POST -F "file=@screenshot.png" http://127.0.0.1:8001/analyze > response.json

# Extract CSV data
python -c "import json; d=json.load(open('response.json')); print(d['exports']['csv_data'])" > coordinates.csv

# Extract visualization (base64 decode)
python -c "
import json, base64
d = json.load(open('response.json'))
with open('visualization.png', 'wb') as f:
    f.write(base64.b64decode(d['exports']['visualization_png_base64']))
"

JavaScript/Node.js

const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

async function analyzeImage(imagePath) {
  const formData = new FormData();
  formData.append('file', fs.createReadStream(imagePath));
  
  const response = await axios.post(
    'http://127.0.0.1:8001/analyze',
    formData,
    { headers: formData.getHeaders() }
  );
  
  const data = response.data;
  console.log(`Found ${data.analysis.total_elements_detected} UI elements`);
  
  // Save CSV
  fs.writeFileSync('coordinates.csv', data.exports.csv_data);
  
  // Save visualization
  const vizBuffer = Buffer.from(data.exports.visualization_png_base64, 'base64');
  fs.writeFileSync('visualization.png', vizBuffer);
  
  return data;
}

analyzeImage('screenshot.png').catch(console.error);

Response Data Structure

Each UI element contains:

{
  "template_id": "crop_0031",           // Element identifier
  "template_file": "crop_0031.png",     // Source template file
  "confidence": 1.0,                     // Matching confidence (0-1)
  "bbox": {
    "x1": 587,                           // Top-left X
    "y1": 393,                           // Top-left Y
    "x2": 763,                           // Bottom-right X
    "y2": 441,                           // Bottom-right Y
    "width": 176,                        // Element width
    "height": 48                         // Element height
  },
  "center": {
    "x": 675,                            // Center X (for clicking)
    "y": 417                             // Center Y (for clicking)
  },
  "bbox_ratio": {
    "x1": 0.430,                         // Normalized X1 (0-1)
    "y1": 0.512,                         // Normalized Y1 (0-1)
    "x2": 0.559,                         // Normalized X2 (0-1)
    "y2": 0.575                          // Normalized Y2 (0-1)
  }
}
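
The fields in an element are related: `width` and `height` follow from the corners, `center` is the box midpoint, and `bbox_ratio` is the pixel value divided by the image size. The sketch below checks those relationships. The function name and tolerances are assumptions, and the example assumes this element comes from the 1365 x 767 screenshot used in the responses above.

```python
from typing import Any, Dict, List


def check_element(
    element: Dict[str, Any],
    image_width: int,
    image_height: int,
    ratio_tol: float = 0.001,  # ratios in the examples are rounded to 3 places
) -> List[str]:
    """Return a list of inconsistencies found in one element (empty if none)."""
    problems = []
    box = element["bbox"]
    if box["width"] != box["x2"] - box["x1"]:
        problems.append("width != x2 - x1")
    if box["height"] != box["y2"] - box["y1"]:
        problems.append("height != y2 - y1")

    center = element["center"]
    # Allow one pixel of slack, since the rounding rule is not documented.
    if abs(center["x"] - (box["x1"] + box["x2"]) / 2) > 1:
        problems.append("center.x is off the box midpoint")
    if abs(center["y"] - (box["y1"] + box["y2"]) / 2) > 1:
        problems.append("center.y is off the box midpoint")

    ratios = element["bbox_ratio"]
    sizes = {"x1": image_width, "x2": image_width,
             "y1": image_height, "y2": image_height}
    for key, size in sizes.items():
        if abs(ratios[key] - box[key] / size) > ratio_tol:
            problems.append(f"bbox_ratio.{key} does not match bbox.{key}")
    return problems


element = {
    "template_id": "crop_0031",
    "bbox": {"x1": 587, "y1": 393, "x2": 763, "y2": 441,
             "width": 176, "height": 48},
    "center": {"x": 675, "y": 417},
    "bbox_ratio": {"x1": 0.430, "y1": 0.512, "x2": 0.559, "y2": 0.575},
}
print(check_element(element, image_width=1365, image_height=767))
```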

Export Formats

JSON

Complete structured data with all coordinates, confidence scores, and metadata.

CSV

Spreadsheet-friendly format with columns:

  • Element_ID
  • Template_File
  • Confidence
  • X1, Y1, X2, Y2 (pixel coordinates)
  • Width, Height
  • Center_X, Center_Y
  • Ratio_X1, Ratio_Y1, Ratio_X2, Ratio_Y2
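
The `csv_data` string can be loaded with the standard `csv` module. This sketch assumes the header uses exactly the column names listed above, in that order. The sample row is built from the first element in the `/analyze` example and is not real server output.

```python
import csv
import io
from typing import Any, Dict, List

INT_COLUMNS = ("X1", "Y1", "X2", "Y2", "Width", "Height", "Center_X", "Center_Y")
FLOAT_COLUMNS = ("Confidence", "Ratio_X1", "Ratio_Y1", "Ratio_X2", "Ratio_Y2")


def parse_csv_data(csv_data: str) -> List[Dict[str, Any]]:
    """Parse the csv_data export into dicts with numeric fields converted."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_data)):
        row: Dict[str, Any] = dict(raw)
        for name in INT_COLUMNS:
            # int(float(...)) also accepts values written as "71.0".
            row[name] = int(float(raw[name]))
        for name in FLOAT_COLUMNS:
            row[name] = float(raw[name])
        rows.append(row)
    return rows


# Illustrative data only, assembled from the documented example element.
sample_csv = (
    "Element_ID,Template_File,Confidence,X1,Y1,X2,Y2,Width,Height,"
    "Center_X,Center_Y,Ratio_X1,Ratio_Y1,Ratio_X2,Ratio_Y2\n"
    "crop_0000,crop_0000.png,1.0,71,13,161,29,90,16,116,21,"
    "0.052,0.017,0.118,0.038\n"
)

rows = parse_csv_data(sample_csv)
print(rows[0]["Element_ID"], rows[0]["Center_X"], rows[0]["Center_Y"])
```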

Visualization PNG

High-resolution image with:

  • Green bounding boxes around each element
  • Red center point marker
  • Element ID and confidence label for each box

Server-Side File Storage

The server maintains a temporary cropped images directory:

/tmp/omoi_cropped_images/
  ├── crop_0000.png
  ├── crop_0001.png
  ├── crop_0002.png
  └── ... (120+ images)

These files are:

  • ✅ Used for template matching
  • ✅ Kept on server for reference
  • ❌ NOT sent to clients
  • ❌ Cleared on server restart

Performance

Typical performance on CPU:

  • OmniParser detection: ~10 seconds
  • Template matching: ~5 seconds
  • Total: ~15 seconds per screenshot

Architecture

Client Request (PNG)
        ↓
[API Server]
  1. Receives PNG
  2. Runs OmniParser
     ├─ Detects UI elements
     └─ Saves cropped images (server-side only)
  3. Template matches crops back to original
  4. Generates coordinates
  5. Creates visualization
  6. Exports to JSON/CSV
        ↓
Client Response (JSON, CSV, PNG)
  - Coordinates metadata
  - CSV data
  - Visualization image
  (NO cropped images to client)
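
Steps 2 and 3 of the pipeline (crop a region, then match the crop back to the original to recover its coordinates) can be illustrated with a toy exact-match search over a small grid of pixel values. This is a teaching sketch only: the server's actual matcher is not shown in this document and presumably uses an image library with a similarity score. The function names are hypothetical.

```python
from typing import Dict, List, Optional

Grid = List[List[int]]


def crop(image: Grid, x1: int, y1: int, x2: int, y2: int) -> Grid:
    """Cut out rows y1..y2-1 and columns x1..x2-1, like a saved crop."""
    return [row[x1:x2] for row in image[y1:y2]]


def locate_template(image: Grid, template: Grid) -> Optional[Dict[str, int]]:
    """Slide the template over the image and return the first exact match."""
    image_h, image_w = len(image), len(image[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    for y in range(image_h - tmpl_h + 1):
        for x in range(image_w - tmpl_w + 1):
            if all(image[y + dy][x:x + tmpl_w] == template[dy]
                   for dy in range(tmpl_h)):
                # Same convention as the API examples: x2 - x1 == width.
                return {"x1": x, "y1": y, "x2": x + tmpl_w, "y2": y + tmpl_h,
                        "width": tmpl_w, "height": tmpl_h}
    return None


# A 5-row, 6-column "screenshot" whose pixel values are all distinct.
screenshot = [[row * 10 + col for col in range(6)] for row in range(5)]
template = crop(screenshot, x1=2, y1=1, x2=5, y2=3)
print(locate_template(screenshot, template))
```

Because the template is cut from the same image it is matched against, an exact match exists, which is consistent with the `confidence: 1.0` values in the example responses.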

Coordinate Systems

Absolute Coordinates

Pixel coordinates in the original image:

  • bbox.x1, bbox.y1: Top-left corner
  • bbox.x2, bbox.y2: Bottom-right corner
  • center.x, center.y: Center point (use for mouse clicks)

Normalized Coordinates

0-1 scale for responsive designs:

  • bbox_ratio.x1, bbox_ratio.y1: Top-left (normalized)
  • bbox_ratio.x2, bbox_ratio.y2: Bottom-right (normalized)
  • Useful for scaling to different screen sizes
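
To map a normalized box onto a screen of a different size, multiply each ratio by the target width or height. The sketch below does that and recomputes the center. The function name and the 1920 x 1080 target are assumptions chosen for illustration.

```python
from typing import Dict


def ratio_to_pixels(
    bbox_ratio: Dict[str, float], screen_width: int, screen_height: int
) -> Dict[str, Dict[str, int]]:
    """Scale a normalized bbox to pixel coordinates on the target screen."""
    x1 = round(bbox_ratio["x1"] * screen_width)
    y1 = round(bbox_ratio["y1"] * screen_height)
    x2 = round(bbox_ratio["x2"] * screen_width)
    y2 = round(bbox_ratio["y2"] * screen_height)
    return {
        "bbox": {"x1": x1, "y1": y1, "x2": x2, "y2": y2,
                 "width": x2 - x1, "height": y2 - y1},
        "center": {"x": (x1 + x2) // 2, "y": (y1 + y2) // 2},
    }


scaled = ratio_to_pixels(
    {"x1": 0.25, "y1": 0.5, "x2": 0.75, "y2": 1.0},
    screen_width=1920,
    screen_height=1080,
)
print(scaled["center"])  # {'x': 960, 'y': 810}
```

Since the ratios in the API examples are rounded to three decimal places, scaled coordinates can be off by a pixel or two on large screens.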

Tips

  1. Clicking Elements: Use center.x and center.y for mouse position
  2. Validation: Elements normally report confidence: 1.0 (perfect match), because each template is cropped from the same screenshot it is matched against
  3. Filtering: Use bbox_ratio for responsive element filtering
  4. Region Queries: Client library supports finding elements in bounding boxes
  5. Batch Processing: Queue multiple images for analysis
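
A region query like the one in tip 4 can also be run locally on an element list that has already been fetched, which avoids a second ~15 second analysis. This sketch treats an element as inside the region when its center falls within it. That rule is an assumption; the client library's `find_elements_in_region` may use a different one, such as full containment.

```python
from typing import Any, Dict, List


def elements_in_region(
    elements: List[Dict[str, Any]], x1: int, y1: int, x2: int, y2: int
) -> List[Dict[str, Any]]:
    """Keep elements whose center lies inside the rectangle (inclusive)."""
    return [
        element
        for element in elements
        if x1 <= element["center"]["x"] <= x2
        and y1 <= element["center"]["y"] <= y2
    ]


# Centers taken from the two example elements in this document.
elements = [
    {"template_id": "crop_0000", "center": {"x": 116, "y": 21}},
    {"template_id": "crop_0031", "center": {"x": 675, "y": 417}},
]
top_strip = elements_in_region(elements, 0, 0, 1365, 100)
print([element["template_id"] for element in top_strip])  # ['crop_0000']
```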

Troubleshooting

"OmniParser not initialized" - Server failed to load models, check logs

"Failed to decode image" - Ensure you're uploading valid PNG/JPG files

"Cropped images directory not found" - OmniParser detection failed, check input image

Timeout - Processing large images takes time, increase request timeout
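
For the timeout case, the request timeout is set on the client side. The sketch below uploads a file to `/analyze` with only the standard library and an explicit timeout. The helper names and the 120 second default are assumptions, chosen to leave headroom over the ~15 second typical processing time.

```python
import json
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple


def build_multipart(
    field: str, filename: str, data: bytes, mime: str = "image/png"
) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, Content-Type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def analyze_with_timeout(
    image_path: str,
    url: str = "http://127.0.0.1:8001/analyze",
    timeout: float = 120.0,  # hypothetical default, in seconds
) -> dict:
    """POST the image and wait up to `timeout` seconds for the JSON result."""
    path = Path(image_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

If the call still times out, `urlopen` raises an exception (a `URLError` or `TimeoutError`, both subclasses of `OSError`), which the caller can catch to retry with a smaller image or a longer timeout.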

Files

  • ui_element_api_server.py - Main API server
  • ui_element_client.py - Python client library
  • ui_element_locator.py - Template matching utility
  • ui_element_analyzer.py - Analysis and export utilities

Status: ✅ Production Ready

Last Updated: April 17, 2026