# UI Element Detection API

A server-based solution that detects and locates UI elements in screenshots using OmniParser and template matching.

## Features

- ✅ **Automatic UI Detection** - Uses OmniParser to detect all UI elements (buttons, text, icons, etc.)
- ✅ **Precise Coordinates** - Returns pixel coordinates (bounding box and center) for each element
- ✅ **Multiple Export Formats** - JSON, CSV, and visualization PNG
- ✅ **Fast Processing** - ~15 seconds per screenshot on CPU
- ✅ **Server-Side Storage** - Cropped images stored on server, not sent to clients
- ✅ **Multiple Endpoints** - Flexible request/response options

## Start the Server

```bash
cd /workspaces/omoi-v2
python ui_element_api_server.py --port 8001
```

The server starts at `http://127.0.0.1:8001`.

## API Endpoints

### 1. Health Check
```http
GET /health
```

**Response:**
```json
{"status": "ok", "service": "UI Element Detection API"}
```
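
Because the server has to load its models before it can analyze anything, a client may want to poll `/health` before sending work. The following is a minimal sketch using only the Python standard library; `fetch_health` and `wait_until_healthy` are hypothetical helper names, and the retry count and delay are arbitrary assumptions rather than documented values.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable

API_URL = "http://127.0.0.1:8001"  # address from "Start the Server"


def fetch_health(base_url: str = API_URL, timeout: float = 5.0) -> dict:
    """GET /health and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 10,
    delay_seconds: float = 2.0,
) -> bool:
    """Poll until the health payload reports status "ok" (hypothetical helper)."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # server not reachable yet, or the body was not valid JSON
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


if __name__ == "__main__":
    print("ready" if wait_until_healthy() else "server did not become healthy")
```

Passing `fetch` as a parameter keeps the polling logic testable without a running server.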

### 2. Analyze Image (Full Response)
```http
POST /analyze
Content-Type: multipart/form-data

file: <PNG image file>
```

**Response:**
```json
{
  "status": "success",
  "processing_time_seconds": 15.4,
  "timing": {
    "omniparser_seconds": 9.88,
    "template_matching_seconds": 5.48
  },
  "image_info": {
    "filename": "Screenshot.png",
    "size": {"width": 1365, "height": 767}
  },
  "analysis": {
    "total_elements_detected": 120,
    "elements": [
      {
        "template_id": "crop_0000",
        "template_file": "crop_0000.png",
        "confidence": 1.0,
        "bbox": {
          "x1": 71, "y1": 13, "x2": 161, "y2": 29,
          "width": 90, "height": 16
        },
        "center": {"x": 116, "y": 21},
        "bbox_ratio": {
          "x1": 0.052, "y1": 0.017, "x2": 0.118, "y2": 0.038
        }
      },
      // ... 119 more elements
    ]
  },
  "exports": {
    "csv_data": "Element_ID,Template_File,Confidence,X1,Y1,...\n",
    "visualization_png_base64": "iVBORw0KGgoAAAANSUhEUgAAA..."
  }
}
```
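
To illustrate how this response might be consumed, here is a small sketch that turns it into a list of click targets. `click_targets` is a hypothetical helper, not part of the client library, and it assumes that any `status` other than `"success"` signals a failed analysis (the error format is not documented here).

```python
from __future__ import annotations

from typing import Any


def click_targets(
    response: dict[str, Any], min_confidence: float = 0.9
) -> list[tuple[str, int, int]]:
    """Return (template_id, x, y) for each element at or above min_confidence."""
    if response.get("status") != "success":
        # Assumption: non-"success" statuses mean the analysis failed.
        raise ValueError(f"analysis failed: status={response.get('status')!r}")
    return [
        (el["template_id"], el["center"]["x"], el["center"]["y"])
        for el in response["analysis"]["elements"]
        if el["confidence"] >= min_confidence
    ]


# Trimmed-down copy of the sample response above (one element only).
sample = {
    "status": "success",
    "analysis": {
        "total_elements_detected": 1,
        "elements": [
            {
                "template_id": "crop_0000",
                "confidence": 1.0,
                "bbox": {"x1": 71, "y1": 13, "x2": 161, "y2": 29},
                "center": {"x": 116, "y": 21},
            }
        ],
    },
}

print(click_targets(sample))  # [('crop_0000', 116, 21)]
```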

### 3. Analyze Image (Structured Response)
```http
POST /analyze_batch
Content-Type: multipart/form-data

file: <PNG image file>
```

**Response:**
```json
{
  "metadata": {
    "filename": "Screenshot.png",
    "image_size": {"width": 1365, "height": 767},
    "total_elements_detected": 120,
    "templates_loaded": 120
  },
  "coordinates_json": {
    "source_image": "Screenshot.png",
    "image_size": {"width": 1365, "height": 767},
    "total_elements": 120,
    "elements": [...]
  },
  "csv_data": "Element_ID,Template_File,...\n",
  "visualization_png_base64": "iVBORw0KGgo..."
}
```
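
Unlike `/analyze`, this response keeps the CSV and the visualization at the top level rather than under `exports`. As a rough sketch, the three outputs could be written to disk as follows; `save_batch_outputs` and the output file names are illustrative assumptions.

```python
from __future__ import annotations

import base64
import json
from pathlib import Path


def save_batch_outputs(response: dict, out_dir: str) -> dict[str, Path]:
    """Write coordinates JSON, CSV, and visualization PNG from an /analyze_batch response."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = {
        "json": out / "coordinates.json",
        "csv": out / "coordinates.csv",
        "png": out / "visualization.png",
    }
    paths["json"].write_text(
        json.dumps(response["coordinates_json"], indent=2), encoding="utf-8"
    )
    # newline="" writes the CSV text exactly as the server sent it.
    with open(paths["csv"], "w", encoding="utf-8", newline="") as f:
        f.write(response["csv_data"])
    paths["png"].write_bytes(base64.b64decode(response["visualization_png_base64"]))
    return paths
```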

## Usage Examples

### Python Client

```python
from ui_element_client import UIElementDetectionClient

# Initialize client
client = UIElementDetectionClient(api_url="http://127.0.0.1:8001")

# Check API health
status = client.health_check()
print(status)

# Analyze image and get all elements
result = client.analyze_image("screenshot.png")
print(f"Found {result['analysis']['total_elements_detected']} UI elements")

# Get specific element
element = client.get_element_by_id("screenshot.png", "crop_0031")
print(f"Element at: ({element['center']['x']}, {element['center']['y']})")

# Find elements in a region (top 100 pixels)
elements = client.find_elements_in_region("screenshot.png", 0, 0, 1365, 100)
print(f"Found {len(elements)} elements in top region")
```

### Using curl

#### Analyze image and save outputs
```bash
curl -X POST -F "file=@screenshot.png" http://127.0.0.1:8001/analyze > response.json

# Extract CSV data
python -c "import json; d=json.load(open('response.json')); print(d['exports']['csv_data'])" > coordinates.csv

# Extract visualization (base64 decode)
python -c "
import json, base64
d = json.load(open('response.json'))
with open('visualization.png', 'wb') as f:
    f.write(base64.b64decode(d['exports']['visualization_png_base64']))
"
```

### JavaScript/Node.js

```javascript
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

async function analyzeImage(imagePath) {
  const formData = new FormData();
  formData.append('file', fs.createReadStream(imagePath));
  
  const response = await axios.post(
    'http://127.0.0.1:8001/analyze',
    formData,
    { headers: formData.getHeaders() }
  );
  
  const data = response.data;
  console.log(`Found ${data.analysis.total_elements_detected} UI elements`);
  
  // Save CSV
  fs.writeFileSync('coordinates.csv', data.exports.csv_data);
  
  // Save visualization
  const vizBuffer = Buffer.from(data.exports.visualization_png_base64, 'base64');
  fs.writeFileSync('visualization.png', vizBuffer);
  
  return data;
}

analyzeImage('screenshot.png').catch(console.error);
```

## Response Data Structure

Each UI element contains:

```json
{
  "template_id": "crop_0031",           // Element identifier
  "template_file": "crop_0031.png",     // Source template file
  "confidence": 1.0,                     // Matching confidence (0-1)
  "bbox": {
    "x1": 587,                           // Top-left X
    "y1": 393,                           // Top-left Y
    "x2": 763,                           // Bottom-right X
    "y2": 441,                           // Bottom-right Y
    "width": 176,                        // Element width
    "height": 48                         // Element height
  },
  "center": {
    "x": 675,                            // Center X (for clicking)
    "y": 417                             // Center Y (for clicking)
  },
  "bbox_ratio": {
    "x1": 0.430,                         // Normalized X1 (0-1)
    "y1": 0.512,                         // Normalized Y1 (0-1)
    "x2": 0.559,                         // Normalized X2 (0-1)
    "y2": 0.575                          // Normalized Y2 (0-1)
  }
}
```
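
The derived fields in this structure follow from the bounding box. The sketch below models an element as a dataclass and recomputes them, which can serve as a sanity check on a response. `UIElement` is a hypothetical type, and the integer midpoint and three-decimal rounding are assumptions that happen to reproduce the sample values; the 1365x767 image size comes from the sample response earlier in this document.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class UIElement:
    template_id: str
    confidence: float
    x1: int
    y1: int
    x2: int
    y2: int

    @classmethod
    def from_json(cls, data: dict) -> "UIElement":
        """Build an element from one entry of the `elements` array."""
        box = data["bbox"]
        return cls(
            data["template_id"], float(data["confidence"]),
            box["x1"], box["y1"], box["x2"], box["y2"],
        )

    @property
    def width(self) -> int:
        return self.x2 - self.x1

    @property
    def height(self) -> int:
        return self.y2 - self.y1

    @property
    def center(self) -> tuple[int, int]:
        # Assumption: the center is the integer midpoint of the box.
        return ((self.x1 + self.x2) // 2, (self.y1 + self.y2) // 2)

    def ratio(self, image_width: int, image_height: int) -> tuple[float, float, float, float]:
        """Normalized (x1, y1, x2, y2), rounded to three decimals."""
        return (
            round(self.x1 / image_width, 3), round(self.y1 / image_height, 3),
            round(self.x2 / image_width, 3), round(self.y2 / image_height, 3),
        )


element = UIElement.from_json({
    "template_id": "crop_0031",
    "confidence": 1.0,
    "bbox": {"x1": 587, "y1": 393, "x2": 763, "y2": 441},
})
print(element.width, element.height, element.center)  # 176 48 (675, 417)
print(element.ratio(1365, 767))  # (0.43, 0.512, 0.559, 0.575)
```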

## Export Formats

### JSON
Complete structured data with all coordinates, confidence scores, and metadata.

### CSV
Spreadsheet-friendly format with columns:
- Element_ID
- Template_File
- Confidence
- X1, Y1, X2, Y2 (pixel coordinates)
- Width, Height
- Center_X, Center_Y
- Ratio_X1, Ratio_Y1, Ratio_X2, Ratio_Y2
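
As an illustration, the CSV export can be read with Python's `csv.DictReader`, which looks columns up by name and therefore does not depend on their order. This sketch assumes the header names match the list above exactly; the sample row is constructed by hand from the `crop_0000` element, not copied from real server output.

```python
import csv
import io


def parse_csv_export(csv_data: str) -> list:
    """Parse the `csv_data` string into a list of plain dicts."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_data)):
        rows.append({
            "element_id": row["Element_ID"],
            "confidence": float(row["Confidence"]),
            # int(float(...)) tolerates values written as "116" or "116.0".
            "center": (int(float(row["Center_X"])), int(float(row["Center_Y"]))),
            "size": (int(float(row["Width"])), int(float(row["Height"]))),
        })
    return rows


# Hand-built sample; header names are assumed from the column list above.
sample_csv = (
    "Element_ID,Template_File,Confidence,X1,Y1,X2,Y2,Width,Height,"
    "Center_X,Center_Y,Ratio_X1,Ratio_Y1,Ratio_X2,Ratio_Y2\n"
    "crop_0000,crop_0000.png,1.0,71,13,161,29,90,16,116,21,0.052,0.017,0.118,0.038\n"
)

print(parse_csv_export(sample_csv))
```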

### Visualization PNG
High-resolution image with:
- Green bounding boxes around each element
- Red center point marker
- Element ID and confidence label for each box

## Server-Side File Storage

The server maintains a temporary directory of cropped images:
```
/tmp/omoi_cropped_images/
  ├── crop_0000.png
  ├── crop_0001.png
  ├── crop_0002.png
  └── ... (120+ images)
```

These files are:
- ✅ Used for template matching
- ✅ Kept on server for reference
- ❌ NOT sent to clients
- ❌ NOT persisted: cleared on server restart

## Performance

Typical performance on CPU:
- OmniParser detection: ~10 seconds
- Template matching: ~5 seconds
- Total: ~15 seconds per screenshot

## Architecture

```
Client Request (PNG)
        ↓
[API Server]
  1. Receives PNG
  2. Runs OmniParser
     ├─ Detects UI elements
     └─ Saves cropped images (server-side only)
  3. Template matches crops back to original
  4. Generates coordinates
  5. Creates visualization
  6. Exports to JSON/CSV
        ↓
Client Response (JSON, CSV, PNG)
  - Coordinates metadata
  - CSV data
  - Visualization image
  (NO cropped images to client)
```
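
Step 3 is the least obvious part of the pipeline: each crop is searched for in the original screenshot to recover its exact position. The toy sketch below shows the idea with an exhaustive sum-of-absolute-differences search over a tiny grayscale image stored as nested lists. It is a deliberately naive stand-in, not the algorithm used by `ui_element_locator.py`, and the confidence formula is an assumption chosen so that an exact match scores 1.0.

```python
from __future__ import annotations

Image = list[list[int]]  # grayscale rows, pixel values 0-255


def crop(image: Image, x1: int, y1: int, x2: int, y2: int) -> Image:
    """Cut the region [x1, x2) x [y1, y2) out of the image."""
    return [row[x1:x2] for row in image[y1:y2]]


def locate(image: Image, template: Image) -> tuple[int, int, int, int, float]:
    """Return (x1, y1, x2, y2, confidence) of the best-matching position."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best = None  # (sum of absolute differences, x, y)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            sad = sum(
                abs(image[y + dy][x + dx] - template[dy][dx])
                for dy in range(th)
                for dx in range(tw)
            )
            if best is None or sad < best[0]:
                best = (sad, x, y)
    if best is None:
        raise ValueError("template is larger than the image")
    sad, x, y = best
    confidence = 1.0 - sad / (255 * tw * th)  # 1.0 means pixel-identical
    return x, y, x + tw, y + th, confidence


# An 8x6 image in which every pixel value is unique.
image = [[x + 10 * y for x in range(8)] for y in range(6)]
template = crop(image, 2, 1, 5, 4)
print(locate(image, template))  # (2, 1, 5, 4, 1.0)
```

Since the crop is cut from the very image it is matched against, an exact match exists, which is consistent with the `confidence: 1.0` values in the sample responses.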

## Coordinate Systems

### Absolute Coordinates
Pixel coordinates in the original image:
- `bbox.x1, bbox.y1`: Top-left corner
- `bbox.x2, bbox.y2`: Bottom-right corner
- `center.x, center.y`: Center point (use for mouse clicks)

### Normalized Coordinates
Coordinates on a 0-1 scale, relative to the image width (for X) and height (for Y):
- `bbox_ratio.x1, bbox_ratio.y1`: Top-left (normalized)
- `bbox_ratio.x2, bbox_ratio.y2`: Bottom-right (normalized)
- Useful for scaling to different screen sizes
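
A short sketch of converting normalized coordinates back to pixels for a given target size is shown below. `ratio_to_pixels` is a hypothetical helper, and it assumes the UI scales proportionally with the screen, which will not hold for every layout.

```python
def ratio_to_pixels(bbox_ratio: dict, width: int, height: int) -> dict:
    """Scale a normalized bbox to pixel coordinates for a width x height target."""
    x1 = round(bbox_ratio["x1"] * width)
    y1 = round(bbox_ratio["y1"] * height)
    x2 = round(bbox_ratio["x2"] * width)
    y2 = round(bbox_ratio["y2"] * height)
    return {
        "x1": x1, "y1": y1, "x2": x2, "y2": y2,
        "center": {"x": (x1 + x2) // 2, "y": (y1 + y2) // 2},
    }


# bbox_ratio of crop_0031 from the sample element in this document.
ratio = {"x1": 0.430, "y1": 0.512, "x2": 0.559, "y2": 0.575}

# Mapping back to the original 1365x767 image recovers the absolute bbox.
print(ratio_to_pixels(ratio, 1365, 767))
# Mapping to a 1920x1080 screen gives the scaled position.
print(ratio_to_pixels(ratio, 1920, 1080))
```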

## Tips

1. **Clicking Elements**: Use `center.x` and `center.y` for mouse position
2. **Validation**: Crops are matched back against the screenshot they were cut from, so `confidence` is expected to be `1.0` (perfect match)
3. **Filtering**: Use `bbox_ratio` to filter elements independently of the screenshot resolution
4. **Region Queries**: Client library supports finding elements in bounding boxes
5. **Batch Processing**: Each request takes a single image, so queue multiple images on the client and submit them one at a time
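
The lookup and region tips can also be applied directly to an `elements` array that has already been fetched, which avoids re-analyzing the image. The helpers below are hypothetical and independent of the client library; treating an element as "in" a region when its center falls inside the rectangle is an assumption, and the client's own rule may differ.

```python
from typing import Optional


def element_by_id(elements: list, template_id: str) -> Optional[dict]:
    """Return the element with the given template_id, or None if absent."""
    return next((el for el in elements if el["template_id"] == template_id), None)


def elements_in_region(elements: list, x1: int, y1: int, x2: int, y2: int) -> list:
    """Return elements whose center lies inside the rectangle (bounds inclusive)."""
    return [
        el for el in elements
        if x1 <= el["center"]["x"] <= x2 and y1 <= el["center"]["y"] <= y2
    ]


# Two elements taken from the samples in this document.
elements = [
    {"template_id": "crop_0000", "confidence": 1.0, "center": {"x": 116, "y": 21}},
    {"template_id": "crop_0031", "confidence": 1.0, "center": {"x": 675, "y": 417}},
]

top_strip = elements_in_region(elements, 0, 0, 1365, 100)
print([el["template_id"] for el in top_strip])  # ['crop_0000']
print(element_by_id(elements, "crop_0031")["center"])  # {'x': 675, 'y': 417}
```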

## Troubleshooting

**"OmniParser not initialized"** - The server failed to load its models. Check the server logs.

**"Failed to decode image"** - The upload could not be read as an image. Make sure you send a valid PNG or JPG file.

**"Cropped images directory not found"** - OmniParser detection failed. Check the input image.

**Timeout** - Large images take longer to process. Increase the client's request timeout.
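
For clients that do not use the bundled Python library, the sketch below uploads an image with only the standard library and an explicit timeout. The 120-second default is an arbitrary assumption that leaves headroom over the ~15 seconds typical on CPU; `build_multipart` and `analyze` are hypothetical helper names.

```python
from __future__ import annotations

import json
import os
import urllib.request
import uuid


def build_multipart(
    field: str, filename: str, payload: bytes, content_type: str = "image/png"
) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def analyze(
    image_path: str, base_url: str = "http://127.0.0.1:8001", timeout: float = 120.0
) -> dict:
    """POST the image to /analyze and wait up to `timeout` seconds for the result."""
    with open(image_path, "rb") as f:
        body, content_type = build_multipart(
            "file", os.path.basename(image_path), f.read()
        )
    request = urllib.request.Request(
        f"{base_url}/analyze",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    result = analyze("screenshot.png")
    print(result["analysis"]["total_elements_detected"])
```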

## Files

- `ui_element_api_server.py` - Main API server
- `ui_element_client.py` - Python client library
- `ui_element_locator.py` - Template matching utility
- `ui_element_analyzer.py` - Analysis and export utilities

---

**Status**: ✅ Production Ready
**Last Updated**: April 17, 2026