---
title: CU1-Xtended
emoji: 🧠
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 4.36.0
app_file: app.py
pinned: false
---

CU-1 UI Element Detector

Detect and classify UI elements in screenshots using a multi-model AI pipeline.

πŸ—οΈ Architecture

CU-1 uses a service-oriented architecture with clear separation of concerns:

┌─────────────────────────────────────────────────────────────┐
│                    APPLICATION LAYER                        │
├─────────────────────────────────────────────────────────────┤
│  app_api.py          │  app_ui.py                           │
│  API Server Entry    │  Gradio UI Entry                     │
└─────────────┬────────┴───────────┬──────────────────────────┘
              │                   │
              │                   │ HTTP/REST
              │                   │ (requests library)
              │                   │
┌─────────────▼───────┐  ┌────────▼──────────────────────────┐
│   API LAYER         │  │   UI LAYER                        │
├─────────────────────┤  ├───────────────────────────────────┤
│  api/endpoints.py   │  │  ui/gradio_interface.py           │
│  - Thin HTTP layer  │  │  - Gradio web interface           │
│ - Request validation│  │  - Calls API via HTTP             │
│  - No business logic│  │  - Displays results               │
└─────────────┬───────┘  └───────────────────────────────────┘
              │
              │ Direct import
              │
┌─────────────▼──────────────────────────────────────────────┐
│                  DETECTION LAYER                            │
│                  (Business Logic)                           │
├─────────────────────────────────────────────────────────────┤
│  detection/service.py          │  Main detection service    │
│  detection/ocr_handler.py      │  OCR-only processing       │
│  detection/response_builder.py │  Response formatting       │
└─────────────────────────────────────────────────────────────┘

Multi-Model Pipeline

CU-1 chains four AI models in a single pipeline:

  1. RF-DETR (Detection Transformer)

    • Detects generic "UI elements" as a SINGLE CLASS
    • Provides bounding boxes and confidence scores
    • Does NOT distinguish between button, input, text, etc.
  2. CLIP (OpenAI)

    • OPTIONAL multi-class classification
    • Takes RF-DETR detections and classifies them into 6 types:
      • button - Buttons, FABs, chips, switches
      • input - Text fields, search bars
      • text - Labels, titles, paragraphs
      • image - Images, icons, avatars
      • list_item - List items, cards, tiles
      • navigation - Navigation bars, tabs, menus
  3. EasyOCR

    • Extracts text content from detected regions
    • Runs global OCR merge to catch text outside detection boxes
  4. BLIP (Salesforce)

    • OPTIONAL visual description generation
    • Describes icons and images when text is not present

🚀 Quick Start

Installation

# Clone the repository
git clone <repository-url>
cd CU1X

# Install dependencies
pip install -r requirements.txt

Running the Application

📖 NEW: Architecture unified! All modes now use the API layer for consistency. See START.md for a detailed guide.

Option 1: One-Command Launch (Recommended for Testing)

Automatically starts both API server and Gradio UI:

python app.py

What happens:

  1. ✅ Starts API server in background (port 8000)
  2. ✅ Waits for API to be ready
  3. ✅ Starts Gradio UI (port 7860)
  4. ✅ Handles clean shutdown with Ctrl+C
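A rough sketch of what this launcher does. This is illustrative only; app.py's actual implementation may differ:

```python
import subprocess
import sys
import time
import urllib.request

def wait_for(url: str, timeout: float = 60.0) -> bool:
    """Poll a URL until it responds, or give up after `timeout` seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            urllib.request.urlopen(url, timeout=2)
            return True
        except OSError:
            time.sleep(0.5)
    return False

def main() -> None:
    # 1. Start the API server in the background (port 8000)
    api = subprocess.Popen([sys.executable, "app_api.py"])
    try:
        # 2. Wait until the API answers before starting the UI
        if not wait_for("http://localhost:8000"):
            raise RuntimeError("API server did not come up in time")
        # 3. Run the Gradio UI in the foreground (port 7860)
        subprocess.run([sys.executable, "app_ui.py"])
    finally:
        # 4. Ctrl+C lands here: shut the API server down cleanly
        api.terminate()
        api.wait()

if __name__ == "__main__":
    main()
```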

Access:

  • Gradio UI: http://localhost:7860
  • API server: http://localhost:8000

Option 2: Manual Launch (2 Terminals)

For more control and debugging:

# Terminal 1: Start API server
python app_api.py

# Terminal 2: Start Gradio UI
python app_ui.py

Access:

  • Gradio UI: http://localhost:7860
  • API server: http://localhost:8000

Option 3: API Only

For API-only usage (scripts, integrations):

python app_api.py

Then use the REST API programmatically (see examples below).

📡 API Usage

Python Example

import requests

# Detect UI elements
with open("screenshot.png", "rb") as f:
    response = requests.post(
        "http://localhost:8000/detect",
        files={"image": f},
        data={
            "confidence_threshold": 0.35,
            "enable_clip": True,
            "enable_ocr": True,
            "enable_blip": False
        }
    )

results = response.json()
print(f"Found {results['total_detections']} elements")

for detection in results['detections']:
    print(f"- {detection['class_name']}: {detection.get('text', 'N/A')}")

cURL Example

curl -X POST "http://localhost:8000/detect" \
  -F "image=@screenshot.png" \
  -F "confidence_threshold=0.35" \
  -F "enable_clip=true" \
  -F "enable_ocr=true"

Response Format

{
  "success": true,
  "detections": [
    {
      "box": {"x1": 50, "y1": 100, "x2": 200, "y2": 150},
      "confidence": 0.79,
      "class_id": 0,
      "class_name": "button",
      "text": "Submit",
      "description": ""
    }
  ],
  "total_detections": 1,
  "image_size": {"width": 1080, "height": 1920},
  "parameters": {
    "confidence_threshold": 0.35,
    "enable_clip": true,
    "enable_ocr": true,
    "enable_blip": false
  },
  "type_distribution": {"button": 5, "text": 12},
  "annotated_image": {
    "mime": "image/png",
    "base64": "iVBORw0KGgoAAAANSU..."
  }
}
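The annotated_image field is base64-encoded. A small helper to write it back out as a PNG, assuming the response shape documented above:

```python
import base64

def save_annotated_image(results: dict, path: str) -> int:
    """Decode the base64 annotated image from a /detect response and
    write it to `path`. Returns the number of bytes written."""
    img = results.get("annotated_image") or {}
    raw = base64.b64decode(img.get("base64", ""))
    with open(path, "wb") as f:
        return f.write(raw)

# Usage (after results = response.json()):
# save_annotated_image(results, "annotated.png")
```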

🐍 Python Library Usage

You can also use CU-1 as a Python library:

from detection.service import DetectionService

# Initialize detector
detector = DetectionService(
    enable_clip=True,
    enable_ocr=True,
    enable_blip=False
)

# Analyze image
results = detector.analyze(
    "screenshot.png",
    confidence_threshold=0.35,
    use_clip=True,
    use_blip=False
)

# Access detections
for detection in results['detections']:
    box = detection['box']
    print(f"{detection['class_name']}: {detection['text']}")
    print(f"  Location: ({box['x1']}, {box['y1']}) to ({box['x2']}, {box['y2']})")

🎯 Detection Modes

1. Full Detection Mode (Default)

Uses RF-DETR to detect elements, optionally classifies with CLIP, extracts text with OCR.

data = {
    "confidence_threshold": 0.35,
    "enable_clip": True,   # Classify element types
    "enable_ocr": True,    # Extract text
    "enable_blip": False
}

2. OCR-Only Mode

Bypasses RF-DETR and runs OCR directly across the entire image.

data = {
    "ocr_only": True,
    "enable_clip": False,  # Must be false
    "enable_blip": False   # Must be false
}

3. Visual Description Mode

Generates descriptions for icons using BLIP.

data = {
    "enable_clip": True,
    "enable_ocr": True,
    "enable_blip": True,
    "blip_scope": "icons"  # or "all"
}

πŸ“ Project Structure

CU1X/
├── app_api.py              # API server entry point
├── app_ui.py               # Gradio UI entry point
├── detection/              # Business logic layer
│   ├── __init__.py
│   ├── service.py          # Main DetectionService
│   ├── ocr_handler.py      # OCR-only processing
│   └── response_builder.py # Response formatting
├── api/                    # HTTP layer (thin)
│   ├── __init__.py
│   └── endpoints.py        # FastAPI endpoints
├── ui/                     # UI layer
│   ├── __init__.py
│   └── gradio_interface.py # Gradio interface (API client)
├── rfdetr/                 # RF-DETR implementation
├── model.pth               # Trained model weights
├── requirements.txt        # Python dependencies
└── README.md

βš™οΈ Configuration

Environment Variables

API Server:

  • No configuration needed (runs on port 8000)

Gradio UI:

  • CU1-X_API_URL: API endpoint (default: http://localhost:8000)
  • GRADIO_SERVER_NAME: Server host (default: 0.0.0.0)
  • GRADIO_SERVER_PORT: Server port (default: 7860)
  • GRADIO_SHARE: Enable Gradio sharing (default: false)

Example:

export CU1_API_URL=http://your-api-server:8000
python app_ui.py

πŸ” Detection Parameters

| Parameter            | Type  | Default | Description                    |
|----------------------|-------|---------|--------------------------------|
| confidence_threshold | float | 0.35    | Detection confidence (0.1-0.9) |
| enable_clip          | bool  | false   | Classify element types         |
| enable_ocr           | bool  | true    | Extract text content           |
| enable_blip          | bool  | false   | Generate visual descriptions   |
| blip_scope           | str   | "icons" | "icons" or "all"               |
| ocr_only             | bool  | false   | Skip detection, OCR only       |

πŸ› Bug Fixes in This Version

1. Fixed RF-DETR Single-Class Confusion

Issue: Code suggested RF-DETR did multi-class detection, but it only detects generic "UI elements" (single class).

Fix:

  • Removed unused base_class_ids variable
  • Added clear documentation explaining RF-DETR is single-class
  • CLIP provides the multi-class classification (6 types)

2. Fixed OCR-Only Validation Logic

Issue: API incorrectly rejected enable_ocr=true when ocr_only=true.

Fix:

# OLD (WRONG):
if ocr_only and (enable_clip or enable_blip or enable_ocr):
    raise HTTPException(...)

# NEW (CORRECT):
if ocr_only and (enable_clip or enable_blip):
    raise HTTPException(...)

πŸ† Key Architecture Principles

  1. Separation of Concerns: Detection logic, API layer, and UI layer are completely isolated
  2. No Business Logic in API: api/endpoints.py only handles HTTP, delegates to detection/ module
  3. Service-Oriented: Gradio UI is a client of the API (HTTP calls), not direct imports
  4. Single Source of Truth: All detection logic in detection/ module
  5. Testability: Each layer can be tested independently
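In code, the second principle looks roughly like this. This is a plain-Python sketch with stand-in names, not the actual api/endpoints.py:

```python
# Sketch: the HTTP layer only validates and delegates; all detection
# logic lives in the service. Names here are illustrative stand-ins.

class DetectionService:
    """Stand-in for detection/service.py: owns the business logic."""

    def analyze(self, image_path: str, confidence_threshold: float = 0.35) -> dict:
        # The real pipeline (RF-DETR, CLIP, OCR, BLIP) would run here.
        return {"detections": [], "total_detections": 0}

def detect_endpoint(image_path: str, confidence_threshold: float = 0.35) -> dict:
    """Stand-in for the /detect handler: validate input, delegate, return."""
    if not 0.1 <= confidence_threshold <= 0.9:
        raise ValueError("confidence_threshold must be between 0.1 and 0.9")
    return DetectionService().analyze(image_path, confidence_threshold)
```

Because the handler holds no logic of its own, the service can be unit-tested without an HTTP server, and the handler with a mocked service.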

🚦 Performance

Detection performance depends on enabled features:

| Mode                        | Time    | Use Case                   |
|-----------------------------|---------|----------------------------|
| RF-DETR only                | ~25-35s | Just bounding boxes        |
| RF-DETR + OCR               | ~30-40s | Text extraction            |
| RF-DETR + CLIP + OCR        | ~50-60s | Full classification + text |
| RF-DETR + CLIP + OCR + BLIP | ~70-90s | Complete analysis          |

Times are approximate and depend on image size and hardware (CPU vs GPU).

🤗 Deploying to Hugging Face Spaces

🚀 Quick Deploy (2 Commands)

Option 1: Automated Scripts (Recommended)

# 1. Check that everything is ready
./check_hf_space.sh

# 2. Deploy automatically
./deploy_hf_space.sh

Option 2: Manual

  1. Create a new Space on Hugging Face

    • Choose "Gradio" as SDK
    • Select hardware (CPU or GPU)
  2. Clone and push:

    git clone https://huggingface.co/spaces/YOUR_USERNAME/CU1-X
    cd CU1-X
    # Copy files from your project
    git lfs install
    git lfs track "*.pth"
    git add .
    git commit -m "Initial deployment"
    git push origin main
    
  3. Space will auto-deploy - First run takes 5-10 minutes (model download)

📚 Documentation

Unified Architecture

NEW: app.py now uses the same unified API architecture everywhere:

  1. ✅ Starts API server in subprocess
  2. ✅ Starts Gradio UI that connects to API
  3. ✅ Same code path as local development
  4. ✅ Consistent behavior across all environments

Benefits:

  • Single code path to maintain (no special HF Spaces mode)
  • Same API layer everywhere (easier debugging)
  • Can scale to separate API/UI servers if needed

🔌 Accessing HF Space via API

Once deployed, your HF Space automatically exposes an API:

# Install Gradio client
pip install gradio_client

# Use your Space
from gradio_client import Client

client = Client("YOUR_USERNAME/cu1-detector")
result = client.predict("screenshot.png", 0.35, 2, True, True, False, False, "Only image & button")

annotated_image, summary, detections = result
print(f"Found {detections['total_detections']} elements!")

See:

  • examples/simple_hf_api_example.py - Quick start
  • examples/huggingface_api_usage.py - Full examples (batch, async, etc.)
  • DEPLOYMENT.md - Complete deployment guide (Docker, AWS, GCP, Azure, etc.)

πŸ“ License

See LICENSE file for details.

πŸ™ Acknowledgments

  • RF-DETR: Roboflow
  • CLIP: OpenAI
  • BLIP: Salesforce
  • EasyOCR: JaidedAI

Questions or issues? Please open an issue on GitHub.