---
title: CU1-Xtended
emoji: 🧠
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 4.36.0
app_file: app.py
pinned: false
---
# CU-1 UI Element Detector
Detect and classify UI elements in screenshots using a multi-model AI pipeline.
## 🏗️ Architecture
CU-1 uses a service-oriented architecture with clear separation of concerns:
```
┌───────────────────────────────────────────────────────────────┐
│                       APPLICATION LAYER                       │
├───────────────────────────────────────────────────────────────┤
│   app_api.py                   │   app_ui.py                  │
│   API Server Entry             │   Gradio UI Entry            │
└──────────────┬─────────────────┴─────────┬────────────────────┘
               │                           │
               │                           │  HTTP/REST
               │                           │  (requests library)
               ▼                           ▼
┌──────────────────────┐        ┌──────────────────────────────┐
│      API LAYER       │        │           UI LAYER           │
├──────────────────────┤        ├──────────────────────────────┤
│  api/endpoints.py    │        │  ui/gradio_interface.py      │
│  - Thin HTTP layer   │        │  - Gradio web interface      │
│  - Request validation│        │  - Calls API via HTTP        │
│  - No business logic │        │  - Displays results          │
└──────────────┬───────┘        └──────────────────────────────┘
               │
               │  Direct import
               ▼
┌───────────────────────────────────────────────────────────────┐
│                        DETECTION LAYER                        │
│                        (Business Logic)                       │
├───────────────────────────────────────────────────────────────┤
│  detection/service.py           │  Main detection service     │
│  detection/ocr_handler.py       │  OCR-only processing        │
│  detection/response_builder.py  │  Response formatting        │
└───────────────────────────────────────────────────────────────┘
```
### Multi-Model Pipeline

CU-1 combines four AI models in a sophisticated pipeline (a sketch of the control flow follows this list):

1. **RF-DETR** (Detection Transformer)
   - Detects generic "UI elements" as a SINGLE CLASS
   - Provides bounding boxes and confidence scores
   - Does NOT distinguish between button, input, text, etc.

2. **CLIP** (OpenAI)
   - OPTIONAL multi-class classification
   - Takes RF-DETR detections and classifies them into 6 types:
     - `button` - Buttons, FABs, chips, switches
     - `input` - Text fields, search bars
     - `text` - Labels, titles, paragraphs
     - `image` - Images, icons, avatars
     - `list_item` - List items, cards, tiles
     - `navigation` - Navigation bars, tabs, menus

3. **EasyOCR**
   - Extracts text content from detected regions
   - Runs a global OCR merge to catch text outside detection boxes

4. **BLIP** (Salesforce)
   - OPTIONAL visual description generation
   - Describes icons and images when text is not present
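For orientation, here is a minimal sketch of how the four stages compose. The stage callables are injected parameters, so nothing below is the actual `detection/service.py` API; it only illustrates the ordering described above.

```python
from typing import Any, Callable, Dict, List

def run_pipeline(
    image: Any,
    detect: Callable[[Any, float], List[Dict]],  # RF-DETR: boxes + confidences
    crop: Callable[[Any, Dict], Any],            # cut a region out of the image
    classify: Callable[[Any], str],              # CLIP: one of the 6 element types
    read_text: Callable[[Any], str],             # EasyOCR: text inside the region
    describe: Callable[[Any], str],              # BLIP: visual description
    confidence_threshold: float = 0.35,
    use_clip: bool = True,
    use_ocr: bool = True,
    use_blip: bool = False,
) -> List[Dict]:
    # Stage 1: RF-DETR proposes generic "UI element" boxes (single class).
    detections = detect(image, confidence_threshold)
    for det in detections:
        region = crop(image, det["box"])
        if use_clip:
            # Stage 2 (optional): the multi-class label comes from CLIP.
            det["class_name"] = classify(region)
        if use_ocr:
            # Stage 3: read text content inside the box.
            det["text"] = read_text(region)
        if use_blip and not det.get("text"):
            # Stage 4 (optional): describe icons/images that carry no text.
            det["description"] = describe(region)
    return detections
```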
## 🚀 Quick Start

### Installation

```bash
# Clone the repository
git clone <repository-url>
cd CU1X

# Install dependencies
pip install -r requirements.txt
```
### Running the Application

**NEW:** Architecture unified! All modes now use the API layer for consistency. See START.md for a detailed guide.

#### Option 1: One-Command Launch (Recommended for Testing)

Automatically starts both the API server and the Gradio UI:

```bash
python app.py
```
What happens (see the sketch below):
- ✅ Starts API server in background (port 8000)
- ✅ Waits for API to be ready
- ✅ Starts Gradio UI (port 7860)
- ✅ Handles clean shutdown with Ctrl+C
Access:
- Gradio UI: http://localhost:7860
- API Docs: http://localhost:8000/docs
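Under the hood, `app.py`'s orchestration can be approximated as follows. This is a hedged sketch of the pattern (subprocess plus readiness polling), not the actual implementation:

```python
import subprocess
import sys
import time

import requests

# Sketch of the one-command launcher: start the API server, poll until it
# answers, then run the UI in the foreground. Hypothetical; app.py is the
# real entry point.
api = subprocess.Popen([sys.executable, "app_api.py"])
try:
    for _ in range(60):  # wait up to ~60s for the API to come up
        try:
            requests.get("http://localhost:8000/docs", timeout=1)
            break
        except requests.ConnectionError:
            time.sleep(1)
    subprocess.run([sys.executable, "app_ui.py"])  # blocks until Ctrl+C
finally:
    api.terminate()  # clean shutdown of the background API process
    api.wait()
```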
#### Option 2: Manual Launch (2 Terminals)

For more control and debugging:

```bash
# Terminal 1: Start API server
python app_api.py

# Terminal 2: Start Gradio UI
python app_ui.py
```
Access:
- API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Gradio UI: http://localhost:7860
#### Option 3: API Only

For API-only usage (scripts, integrations):

```bash
python app_api.py
```
Then use the REST API programmatically (see examples below).
## 📡 API Usage

### Python Example

```python
import requests

# Detect UI elements
with open("screenshot.png", "rb") as f:
    response = requests.post(
        "http://localhost:8000/detect",
        files={"image": f},
        data={
            "confidence_threshold": 0.35,
            "enable_clip": True,
            "enable_ocr": True,
            "enable_blip": False
        }
    )

results = response.json()
print(f"Found {results['total_detections']} elements")

for detection in results['detections']:
    print(f"- {detection['class_name']}: {detection.get('text', 'N/A')}")
```
### cURL Example

```bash
curl -X POST "http://localhost:8000/detect" \
  -F "image=@screenshot.png" \
  -F "confidence_threshold=0.35" \
  -F "enable_clip=true" \
  -F "enable_ocr=true"
```
### Response Format

```json
{
  "success": true,
  "detections": [
    {
      "box": {"x1": 50, "y1": 100, "x2": 200, "y2": 150},
      "confidence": 0.79,
      "class_id": 0,
      "class_name": "button",
      "text": "Submit",
      "description": ""
    }
  ],
  "total_detections": 1,
  "image_size": {"width": 1080, "height": 1920},
  "parameters": {
    "confidence_threshold": 0.35,
    "enable_clip": true,
    "enable_ocr": true,
    "enable_blip": false
  },
  "type_distribution": {"button": 5, "text": 12},
  "annotated_image": {
    "mime": "image/png",
    "base64": "iVBORw0KGgoAAAANSU..."
  }
}
```
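The `annotated_image` payload is plain base64-encoded PNG data, so saving it to disk is straightforward (assuming `results` is the parsed JSON from the Python example above):

```python
import base64

# Decode and save the annotated screenshot returned by /detect
png_bytes = base64.b64decode(results["annotated_image"]["base64"])
with open("annotated.png", "wb") as f:
    f.write(png_bytes)
```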
## 📚 Python Library Usage

You can also use CU-1 as a Python library:

```python
from detection.service import DetectionService

# Initialize detector
detector = DetectionService(
    enable_clip=True,
    enable_ocr=True,
    enable_blip=False
)

# Analyze image
results = detector.analyze(
    "screenshot.png",
    confidence_threshold=0.35,
    use_clip=True,
    use_blip=False
)

# Access detections
for detection in results['detections']:
    box = detection['box']
    print(f"{detection['class_name']}: {detection['text']}")
    print(f"  Location: ({box['x1']}, {box['y1']}) to ({box['x2']}, {box['y2']})")
```
## 🎯 Detection Modes

### 1. Full Detection Mode (Default)

Uses RF-DETR to detect elements, optionally classifies with CLIP, extracts text with OCR.

```python
data = {
    "confidence_threshold": 0.35,
    "enable_clip": True,   # Classify element types
    "enable_ocr": True,    # Extract text
    "enable_blip": False
}
```
### 2. OCR-Only Mode

Bypasses RF-DETR and runs OCR directly across the entire image.

```python
data = {
    "ocr_only": True,
    "enable_clip": False,  # Must be false
    "enable_blip": False   # Must be false
}
```
### 3. Visual Description Mode

Generates descriptions for icons using BLIP.

```python
data = {
    "enable_clip": True,
    "enable_ocr": True,
    "enable_blip": True,
    "blip_scope": "icons"  # or "all"
}
```
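If you switch between these modes often, a thin client helper keeps the parameter handling in one place. A sketch (the `/detect` endpoint and its parameters are as documented above; the helper name is ours):

```python
import requests

API_URL = "http://localhost:8000"  # adjust for your deployment

def detect(image_path: str, **params) -> dict:
    """POST an image to /detect with any of the documented parameters."""
    with open(image_path, "rb") as f:
        response = requests.post(f"{API_URL}/detect", files={"image": f}, data=params)
    response.raise_for_status()
    return response.json()

# The three modes, one helper:
full = detect("screenshot.png", confidence_threshold=0.35, enable_clip=True, enable_ocr=True)
ocr = detect("screenshot.png", ocr_only=True, enable_clip=False, enable_blip=False)
described = detect("screenshot.png", enable_clip=True, enable_ocr=True,
                   enable_blip=True, blip_scope="icons")
```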
## 📁 Project Structure

```
CU1X/
├── app_api.py               # API server entry point
├── app_ui.py                # Gradio UI entry point
├── detection/               # Business logic layer
│   ├── __init__.py
│   ├── service.py           # Main DetectionService
│   ├── ocr_handler.py       # OCR-only processing
│   └── response_builder.py  # Response formatting
├── api/                     # HTTP layer (thin)
│   ├── __init__.py
│   └── endpoints.py         # FastAPI endpoints
├── ui/                      # UI layer
│   ├── __init__.py
│   └── gradio_interface.py  # Gradio interface (API client)
├── rfdetr/                  # RF-DETR implementation
├── model.pth                # Trained model weights
├── requirements.txt         # Python dependencies
└── README.md
```
## ⚙️ Configuration

### Environment Variables

API Server:
- No configuration needed (runs on port 8000)

Gradio UI:
- `CU1_API_URL`: API endpoint (default: `http://localhost:8000`)
- `GRADIO_SERVER_NAME`: Server host (default: `0.0.0.0`)
- `GRADIO_SERVER_PORT`: Server port (default: `7860`)
- `GRADIO_SHARE`: Enable Gradio sharing (default: `false`)

Example:

```bash
export CU1_API_URL=http://your-api-server:8000
python app_ui.py
```
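Internally, the UI side can pick these up with plain environment lookups; a sketch of the pattern using the defaults listed above:

```python
import os

# Read UI configuration from the environment, falling back to the
# documented defaults.
API_URL = os.environ.get("CU1_API_URL", "http://localhost:8000")
SERVER_NAME = os.environ.get("GRADIO_SERVER_NAME", "0.0.0.0")
SERVER_PORT = int(os.environ.get("GRADIO_SERVER_PORT", "7860"))
SHARE = os.environ.get("GRADIO_SHARE", "false").lower() == "true"
```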
## 📊 Detection Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `confidence_threshold` | float | 0.35 | Detection confidence (0.1-0.9) |
| `enable_clip` | bool | false | Classify element types |
| `enable_ocr` | bool | true | Extract text content |
| `enable_blip` | bool | false | Generate visual descriptions |
| `blip_scope` | str | "icons" | "icons" or "all" |
| `ocr_only` | bool | false | Skip detection, OCR only |
## 🐛 Bug Fixes in This Version

### 1. Fixed RF-DETR Single-Class Confusion

Issue: The code suggested RF-DETR did multi-class detection, but it only detects generic "UI elements" (a single class).

Fix:
- Removed the unused `base_class_ids` variable
- Added clear documentation explaining that RF-DETR is single-class
- CLIP provides the multi-class classification (6 types)
### 2. Fixed OCR-Only Validation Logic

Issue: The API incorrectly rejected `enable_ocr=true` when `ocr_only=true`.

Fix:

```python
# OLD (WRONG):
if ocr_only and (enable_clip or enable_blip or enable_ocr):
    raise HTTPException(...)

# NEW (CORRECT):
if ocr_only and (enable_clip or enable_blip):
    raise HTTPException(...)
```
## 🔑 Key Architecture Principles

- Separation of Concerns: Detection logic, API layer, and UI layer are completely isolated
- No Business Logic in API: `api/endpoints.py` only handles HTTP and delegates to the `detection/` module
- Service-Oriented: The Gradio UI is a client of the API (HTTP calls), not a direct import
- Single Source of Truth: All detection logic lives in the `detection/` module
- Testability: Each layer can be tested independently (see the sketch below)
## 🚦 Performance
Detection performance depends on enabled features:
| Mode | Time | Use Case |
|---|---|---|
| RF-DETR only | ~25-35s | Just bounding boxes |
| RF-DETR + OCR | ~30-40s | Text extraction |
| RF-DETR + CLIP + OCR | ~50-60s | Full classification + text |
| RF-DETR + CLIP + OCR + BLIP | ~70-90s | Complete analysis |
Times are approximate and depend on image size and hardware (CPU vs GPU).
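To reproduce these numbers on your own hardware, a simple timing loop against a local API server is enough (a sketch; flags as documented in the parameters table):

```python
import time

import requests

MODES = {
    "RF-DETR only": {"enable_clip": False, "enable_ocr": False, "enable_blip": False},
    "RF-DETR + OCR": {"enable_clip": False, "enable_ocr": True, "enable_blip": False},
    "RF-DETR + CLIP + OCR": {"enable_clip": True, "enable_ocr": True, "enable_blip": False},
    "RF-DETR + CLIP + OCR + BLIP": {"enable_clip": True, "enable_ocr": True, "enable_blip": True},
}

# Time one /detect call per mode.
for name, flags in MODES.items():
    with open("screenshot.png", "rb") as f:
        start = time.perf_counter()
        requests.post("http://localhost:8000/detect", files={"image": f}, data=flags)
        print(f"{name}: {time.perf_counter() - start:.1f}s")
```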
## 🤗 Deploying to Hugging Face Spaces

### 🚀 Quick Deploy (2 Commands)

#### Option 1: Automated Scripts (Recommended)

```bash
# 1. Check that everything is ready
./check_hf_space.sh

# 2. Deploy automatically
./deploy_hf_space.sh
```
#### Option 2: Manual

1. Create a new Space on Hugging Face
   - Choose "Gradio" as the SDK
   - Select hardware (CPU or GPU)

2. Clone and push:

   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/CU1-X
   cd CU1-X
   # Copy files from your project
   git lfs install
   git lfs track "*.pth"
   git add .
   git commit -m "Initial deployment"
   git push origin main
   ```

3. The Space will auto-deploy - the first run takes 5-10 minutes (model download)
### 📚 Documentation

- QUICK_DEPLOY.md - Ultra-fast guide (2 commands)
- DEPLOYMENT.md - Complete, detailed guide
- API_USAGE.md - How to use the API from outside
### Unified Architecture

NEW: app.py now uses the same unified API architecture everywhere:
- ✅ Starts API server in subprocess
- ✅ Starts Gradio UI that connects to API
- ✅ Same code path as local development
- ✅ Consistent behavior across all environments
Benefits:
- Single code path to maintain (no special HF Spaces mode)
- Same API layer everywhere (easier debugging)
- Can scale to separate API/UI servers if needed
## 🌐 Accessing HF Space via API

Once deployed, your HF Space automatically exposes an API:

```bash
# Install Gradio client
pip install gradio_client
```

```python
# Use your Space
from gradio_client import Client

client = Client("YOUR_USERNAME/cu1-detector")
result = client.predict("screenshot.png", 0.35, 2, True, True, False, False, "Only image & button")

annotated_image, summary, detections = result
print(f"Found {detections['total_detections']} elements!")
```
See:
- `examples/simple_hf_api_example.py` - Quick start
- `examples/huggingface_api_usage.py` - Full examples (batch, async, etc.)
- DEPLOYMENT.md - Complete deployment guide (Docker, AWS, GCP, Azure, etc.)
## 📄 License
See LICENSE file for details.
## 🙏 Acknowledgments
- RF-DETR: Roboflow
- CLIP: OpenAI
- BLIP: Salesforce
- EasyOCR: JaidedAI
Questions or issues? Please open an issue on GitHub.