	# UI Element Detection API

	Complete server-based solution for detecting and locating all UI elements in screenshots using OmniParser and template matching.

	## Features

	- ✅ Automatic UI Detection - Uses OmniParser to detect all UI elements (buttons, text, icons, etc.)
	- ✅ Precise Coordinates - Returns pixel-perfect coordinates for each element
	- ✅ Multiple Export Formats - JSON, CSV, and visualization PNG
	- ✅ Fast Processing - ~15 seconds per screenshot on CPU
	- ✅ Server-Side Storage - Cropped images stored on server, not sent to clients
	- ✅ Multiple Endpoints - Flexible request/response options

	## Start the Server

	```bash
	cd /workspaces/omoi-v2
	python ui_element_api_server.py --port 8001
	```

	The server starts at `http://127.0.0.1:8001`.

	## API Endpoints

	### 1. Health Check
	```bash
	GET /health
	```

	Response:
	```json
	{"status": "ok", "service": "UI Element Detection API"}
	```
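Model loading can take a while after startup, so a client may want to poll `/health` before sending images. The following is a minimal sketch using only the standard library; `fetch_health` and `wait_until_healthy` are hypothetical helpers, not part of `ui_element_client.py`, and the retry counts are arbitrary.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def wait_until_healthy(fetch: Callable[[], dict], attempts: int = 10,
                       delay: float = 1.0) -> bool:
    """Call fetch() until it reports {"status": "ok"} or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except OSError:
            pass  # connection refused or timed out (URLError is an OSError)
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Demo with a stand-in fetch that fails twice before the server is "up".
_outcomes = iter([
    OSError("connection refused"),
    OSError("connection refused"),
    {"status": "ok", "service": "UI Element Detection API"},
])


def fake_fetch() -> dict:
    outcome = next(_outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


print(wait_until_healthy(fake_fetch, attempts=5, delay=0))  # True
```

Against a running server the call would look like `wait_until_healthy(lambda: fetch_health("http://127.0.0.1:8001"))`.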

	### 2. Analyze Image (Full Response)
	```bash
	POST /analyze
	Content-Type: multipart/form-data

	file: <PNG image file>
	```

	Response:
	```json
	{
	  "status": "success",
	  "processing_time_seconds": 15.4,
	  "timing": {
	    "omniparser_seconds": 9.88,
	    "template_matching_seconds": 5.48
	  },
	  "image_info": {
	    "filename": "Screenshot.png",
	    "size": {"width": 1365, "height": 767}
	  },
	  "analysis": {
	    "total_elements_detected": 120,
	    "elements": [
	      {
	        "template_id": "crop_0000",
	        "template_file": "crop_0000.png",
	        "confidence": 1.0,
	        "bbox": {
	          "x1": 71, "y1": 13, "x2": 161, "y2": 29,
	          "width": 90, "height": 16
	        },
	        "center": {"x": 116, "y": 21},
	        "bbox_ratio": {
	          "x1": 0.052, "y1": 0.017, "x2": 0.118, "y2": 0.038
	        }
	      },
	      // ... 119 more elements
	    ]
	  },
	  "exports": {
	    "csv_data": "Element_ID,Template_File,Confidence,X1,Y1,...\n",
	    "visualization_png_base64": "iVBORw0KGgoAAAANSUhEUgAAA..."
	  }
	}
	```

	### 3. Analyze Image (Structured Response)
	```bash
	POST /analyze_batch
	Content-Type: multipart/form-data

	file: <PNG image file>
	```

	Response:
	```json
	{
	  "metadata": {
	    "filename": "Screenshot.png",
	    "image_size": {"width": 1365, "height": 767},
	    "total_elements_detected": 120,
	    "templates_loaded": 120
	  },
	  "coordinates_json": {
	    "source_image": "Screenshot.png",
	    "image_size": {"width": 1365, "height": 767},
	    "total_elements": 120,
	    "elements": [...]
	  },
	  "csv_data": "Element_ID,Template_File,...\n",
	  "visualization_png_base64": "iVBORw0KGgo..."
	}
	```
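The two endpoints place the export fields differently: `/analyze` nests them under `exports`, while `/analyze_batch` returns `csv_data` and `visualization_png_base64` at the top level. Below is a minimal sketch of a helper that accepts either shape and checks that the decoded image really is a PNG. `extract_exports` is a hypothetical name, not part of the shipped client, and the sample payloads are made up for illustration.

```python
import base64
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def extract_exports(response: dict) -> Tuple[str, bytes]:
    """Return (csv_text, png_bytes) from an /analyze or /analyze_batch response.

    Hypothetical helper: /analyze nests the fields under "exports",
    /analyze_batch keeps them at the top level.
    """
    source = response.get("exports", response)
    csv_text = source["csv_data"]
    png_bytes = base64.b64decode(source["visualization_png_base64"])
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("visualization payload is not a PNG")
    return csv_text, png_bytes


# Illustrative payloads; the "image" here is only the 8-byte PNG signature.
analyze_response = {
    "status": "success",
    "exports": {
        "csv_data": "Element_ID,Template_File\n",
        "visualization_png_base64": "iVBORw0KGgo=",
    },
}
batch_response = {
    "csv_data": "Element_ID,Template_File\n",
    "visualization_png_base64": "iVBORw0KGgo=",
}

for payload in (analyze_response, batch_response):
    csv_text, png_bytes = extract_exports(payload)
    print(repr(csv_text), len(png_bytes))
```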

	## Usage Examples

	### Python Client

	```python
	from ui_element_client import UIElementDetectionClient

	# Initialize client
	client = UIElementDetectionClient(api_url="http://127.0.0.1:8001")

	# Check API health
	status = client.health_check()
	print(status)

	# Analyze image and get all elements
	result = client.analyze_image("screenshot.png")
	print(f"Found {result['analysis']['total_elements_detected']} UI elements")

	# Get specific element
	element = client.get_element_by_id("screenshot.png", "crop_0031")
	print(f"Element at: ({element['center']['x']}, {element['center']['y']})")

	# Find elements in a region (top 100 pixels)
	elements = client.find_elements_in_region("screenshot.png", 0, 0, 1365, 100)
	print(f"Found {len(elements)} elements in top region")
	```

	### Using curl

	#### Analyze image and save outputs
	```bash
	curl -X POST -F "file=@screenshot.png" http://127.0.0.1:8001/analyze > response.json

	# Extract CSV data
	python -c "import json; d=json.load(open('response.json')); print(d['exports']['csv_data'])" > coordinates.csv

	# Extract visualization (base64 decode)
	python -c "
	import json, base64
	d = json.load(open('response.json'))
	with open('visualization.png', 'wb') as f:
	    f.write(base64.b64decode(d['exports']['visualization_png_base64']))
	"
	```

	### JavaScript/Node.js

	```javascript
	const FormData = require('form-data');
	const fs = require('fs');
	const axios = require('axios');

	async function analyzeImage(imagePath) {
	  const formData = new FormData();
	  formData.append('file', fs.createReadStream(imagePath));

	  const response = await axios.post(
	    'http://127.0.0.1:8001/analyze',
	    formData,
	    { headers: formData.getHeaders() }
	  );

	  const data = response.data;
	  console.log(`Found ${data.analysis.total_elements_detected} UI elements`);

	  // Save CSV
	  fs.writeFileSync('coordinates.csv', data.exports.csv_data);

	  // Save visualization
	  const vizBuffer = Buffer.from(data.exports.visualization_png_base64, 'base64');
	  fs.writeFileSync('visualization.png', vizBuffer);

	  return data;
	}

	analyzeImage('screenshot.png').catch(console.error);
	```

	## Response Data Structure

	Each UI element contains:

	```json
	{
	  "template_id": "crop_0031",        // Element identifier
	  "template_file": "crop_0031.png",  // Source template file
	  "confidence": 1.0,                 // Matching confidence (0-1)
	  "bbox": {
	    "x1": 587,     // Top-left X
	    "y1": 393,     // Top-left Y
	    "x2": 763,     // Bottom-right X
	    "y2": 441,     // Bottom-right Y
	    "width": 176,  // Element width
	    "height": 48   // Element height
	  },
	  "center": {
	    "x": 675,  // Center X (for clicking)
	    "y": 417   // Center Y (for clicking)
	  },
	  "bbox_ratio": {
	    "x1": 0.430,  // Normalized X1 (0-1)
	    "y1": 0.512,  // Normalized Y1 (0-1)
	    "x2": 0.559,  // Normalized X2 (0-1)
	    "y2": 0.575   // Normalized Y2 (0-1)
	  }
	}
	```
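The fields of an element are redundant: `width` and `height` follow from the corners, `center` is the midpoint of the box, and `bbox_ratio` is the pixel box divided by the image size. A client can use that redundancy as a sanity check. The sketch below is an assumption about how the fields relate, inferred from the sample values in this document; `check_element` is a hypothetical helper and the tolerances are guesses that allow for rounding.

```python
from typing import Dict, List


def check_element(element: Dict, image_width: int, image_height: int,
                  tolerance: float = 0.001) -> List[str]:
    """Return a list of inconsistencies found in one element (empty if none)."""
    box = element["bbox"]
    center = element["center"]
    ratio = element["bbox_ratio"]
    problems = []
    if box["width"] != box["x2"] - box["x1"]:
        problems.append("width != x2 - x1")
    if box["height"] != box["y2"] - box["y1"]:
        problems.append("height != y2 - y1")
    # Allow half a pixel either way, since the rounding rule is not documented.
    if abs(2 * center["x"] - (box["x1"] + box["x2"])) > 1:
        problems.append("center.x is not the horizontal midpoint")
    if abs(2 * center["y"] - (box["y1"] + box["y2"])) > 1:
        problems.append("center.y is not the vertical midpoint")
    for key, size in (("x1", image_width), ("y1", image_height),
                      ("x2", image_width), ("y2", image_height)):
        if abs(ratio[key] - box[key] / size) > tolerance:
            problems.append(f"bbox_ratio.{key} does not match bbox.{key}")
    return problems


# The crop_0031 element from this document, in a 1365x767 screenshot.
crop_0031 = {
    "template_id": "crop_0031",
    "bbox": {"x1": 587, "y1": 393, "x2": 763, "y2": 441, "width": 176, "height": 48},
    "center": {"x": 675, "y": 417},
    "bbox_ratio": {"x1": 0.430, "y1": 0.512, "x2": 0.559, "y2": 0.575},
}
print(check_element(crop_0031, 1365, 767))  # []
```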

	## Export Formats

	### JSON
	Complete structured data with all coordinates, confidence scores, and metadata.

	### CSV
	Spreadsheet-friendly format with columns:
	- Element_ID
	- Template_File
	- Confidence
	- X1, Y1, X2, Y2 (pixel coordinates)
	- Width, Height
	- Center_X, Center_Y
	- Ratio_X1, Ratio_Y1, Ratio_X2, Ratio_Y2
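A short sketch of reading `csv_data` back into typed records with the standard `csv` module. It assumes the header uses exactly the column names listed above, in that order, and that pixel columns are written as plain integers; `parse_csv_export` is a hypothetical helper and the sample row is rebuilt from the `crop_0031` element in this document.

```python
import csv
import io
from typing import Dict, List

INT_COLUMNS = ("X1", "Y1", "X2", "Y2", "Width", "Height", "Center_X", "Center_Y")
FLOAT_COLUMNS = ("Confidence", "Ratio_X1", "Ratio_Y1", "Ratio_X2", "Ratio_Y2")


def parse_csv_export(csv_text: str) -> List[Dict]:
    """Parse csv_data into dicts, converting numeric columns from strings."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name in INT_COLUMNS:
            row[name] = int(row[name])
        for name in FLOAT_COLUMNS:
            row[name] = float(row[name])
        rows.append(row)
    return rows


# Assumed header layout; the data row mirrors the crop_0031 sample element.
sample = (
    "Element_ID,Template_File,Confidence,X1,Y1,X2,Y2,Width,Height,"
    "Center_X,Center_Y,Ratio_X1,Ratio_Y1,Ratio_X2,Ratio_Y2\n"
    "crop_0031,crop_0031.png,1.0,587,393,763,441,176,48,675,417,"
    "0.430,0.512,0.559,0.575\n"
)
rows = parse_csv_export(sample)
print(rows[0]["Element_ID"], rows[0]["Center_X"], rows[0]["Center_Y"])
```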

	### Visualization PNG
	High-resolution image with:
	- Green bounding boxes around each element
	- Red center point marker
	- Element ID and confidence label for each box

	## Server-Side File Storage

	The server maintains a temporary cropped images directory:
	```
	/tmp/omoi_cropped_images/
	├── crop_0000.png
	├── crop_0001.png
	├── crop_0002.png
	└── ... (120+ images)
	```

	These files are:
	- ✅ Used for template matching
	- ✅ Kept on server for reference
	- ❌ NOT sent to clients
	- ❌ NOT persisted across restarts (the directory is cleared when the server restarts)

	## Performance

	Typical performance on CPU:
	- OmniParser detection: ~10 seconds
	- Template matching: ~5 seconds
	- Total: ~15 seconds per screenshot

	## Architecture

	```
	Client Request (PNG)
	↓
	[API Server]
	1. Receives PNG
	2. Runs OmniParser
	├─ Detects UI elements
	└─ Saves cropped images (server-side only)
	3. Template matches crops back to original
	4. Generates coordinates
	5. Creates visualization
	6. Exports to JSON/CSV
	↓
	Client Response (JSON, CSV, PNG)
	- Coordinates metadata
	- CSV data
	- Visualization image
	(NO cropped images to client)
	```
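Step 3 matches each crop back to the screenshot it was cut from, which is why a match can be exact. The sketch below illustrates that idea with a naive exact-match search over nested lists of grayscale values. It is not the code in `ui_element_locator.py`, whose algorithm this document does not describe; the function names and the synthetic image are made up for illustration.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels, row-major


def crop(image: Image, x1: int, y1: int, x2: int, y2: int) -> Image:
    """Cut out the rectangle covering columns x1..x2-1 and rows y1..y2-1."""
    return [row[x1:x2] for row in image[y1:y2]]


def locate(image: Image, template: Image) -> Optional[Tuple[int, int]]:
    """Return the top-left (x, y) of the first exact match, or None."""
    template_h, template_w = len(template), len(template[0])
    image_h, image_w = len(image), len(image[0])
    for y in range(image_h - template_h + 1):
        for x in range(image_w - template_w + 1):
            if all(image[y + dy][x:x + template_w] == template[dy]
                   for dy in range(template_h)):
                return (x, y)
    return None


# Synthetic 8x6 image with distinct pixel values, so the match is unique.
image = [[y * 8 + x for x in range(8)] for y in range(6)]
template = crop(image, 2, 1, 5, 4)
print(locate(image, template))  # (2, 1)
```

A brute-force scan like this is slow on real screenshots; it only shows why a crop taken from the same image can be found again at its original position.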

	## Coordinate Systems

	### Absolute Coordinates
	Pixel coordinates in the original image:
	- `bbox.x1, bbox.y1`: Top-left corner
	- `bbox.x2, bbox.y2`: Bottom-right corner
	- `center.x, center.y`: Center point (use for mouse clicks)

	### Normalized Coordinates
	0-1 scale for responsive designs:
	- `bbox_ratio.x1, bbox_ratio.y1`: Top-left (normalized)
	- `bbox_ratio.x2, bbox_ratio.y2`: Bottom-right (normalized)
	- Useful for scaling to different screen sizes
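As a sketch of the scaling use case, the function below maps `bbox_ratio` back to pixel coordinates for a given target size. `ratio_to_pixels` is a hypothetical helper, and rounding to the nearest pixel is an assumption. Ratios rounded to three decimals, as in the examples here, limit accuracy to roughly one pixel at larger sizes.

```python
from typing import Dict


def ratio_to_pixels(bbox_ratio: Dict[str, float], width: int, height: int) -> Dict[str, int]:
    """Scale a normalized box to pixel coordinates for a width x height target."""
    x1 = round(bbox_ratio["x1"] * width)
    y1 = round(bbox_ratio["y1"] * height)
    x2 = round(bbox_ratio["x2"] * width)
    y2 = round(bbox_ratio["y2"] * height)
    return {
        "x1": x1, "y1": y1, "x2": x2, "y2": y2,
        "center_x": (x1 + x2) // 2,
        "center_y": (y1 + y2) // 2,
    }


# bbox_ratio of the crop_0031 sample element.
ratio = {"x1": 0.430, "y1": 0.512, "x2": 0.559, "y2": 0.575}

# Original 1365x767 screenshot: reproduces the documented pixel box.
print(ratio_to_pixels(ratio, 1365, 767))

# The same element on a display twice as large in each dimension.
print(ratio_to_pixels(ratio, 2730, 1534))
```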

	## Tips

	1. Clicking Elements: Use `center.x` and `center.y` for mouse position
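	2. Validation: Crops are matched back to the image they were cut from, so elements are expected to report `confidence: 1.0` (perfect match); treat a lower value as a sign of a bad match
	3. Filtering: Use `bbox_ratio` to filter elements by relative position, independent of screenshot resolution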
	4. Region Queries: Client library supports finding elements in bounding boxes
	5. Batch Processing: Queue multiple images for analysis
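For tips 1 and 4, a region query can also be run locally on an `elements` list that has already been fetched. The sketch below keeps elements whose center falls inside a rectangle. Selecting by center rather than by full containment is an assumption, and `elements_in_region` is a hypothetical function, not the client's `find_elements_in_region`.

```python
from typing import Dict, List


def elements_in_region(elements: List[Dict], x1: int, y1: int,
                       x2: int, y2: int) -> List[Dict]:
    """Keep elements whose center lies inside the rectangle (edges inclusive)."""
    return [
        element for element in elements
        if x1 <= element["center"]["x"] <= x2
        and y1 <= element["center"]["y"] <= y2
    ]


# Centers taken from the two sample elements in this document.
elements = [
    {"template_id": "crop_0000", "center": {"x": 116, "y": 21}},
    {"template_id": "crop_0031", "center": {"x": 675, "y": 417}},
]

# Top 100 pixels of a 1365-pixel-wide screenshot.
top_strip = elements_in_region(elements, 0, 0, 1365, 100)
print([element["template_id"] for element in top_strip])  # ['crop_0000']
```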

	## Troubleshooting

	- "OmniParser not initialized" - Server failed to load models, check logs
	- "Failed to decode image" - Ensure you're uploading valid PNG/JPG files
	- "Cropped images directory not found" - OmniParser detection failed, check input image
	- Timeout - Processing large images takes time, increase request timeout

	## Files

	- `ui_element_api_server.py` - Main API server
	- `ui_element_client.py` - Python client library
	- `ui_element_locator.py` - Template matching utility
	- `ui_element_analyzer.py` - Analysis and export utilities

	---

	Status: ✅ Production Ready
	Last Updated: April 17, 2026