rf-detr-mobile-gui-detection

Mobile GUI grounding model built on top of roboflow/rf-detr-medium

Object Detection DETR Mobile GUI Grounding

rf-detr-mobile-gui-detection is a mobile GUI grounding model built on top of roboflow/rf-detr-medium using the RfDetrForObjectDetection architecture. RF-DETR is an end-to-end object detection model that combines ideas from LW-DETR and Deformable DETR: a DINOv2-with-registers-style ViT backbone, an RF-DETR windowing pattern for efficient attention, a multi-scale projector between the encoder and decoder, and a multi-scale deformable DETR decoder for fast convergence and strong accuracy-latency tradeoffs.

Note

RF-DETR: Neural Architecture Search for Real-Time Detection Transformers: https://huggingface.co/papers/2511.09554

[Figures: Metrics, Loss, Map, Per Class Metrics]

Quick Start with Transformers

pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128
pip install torchvision==0.23.0 transformers==5.9.0 accelerate gradio==6.19.0

import gradio as gr
import torch
from PIL import Image, ImageDraw

from transformers import AutoImageProcessor, RfDetrForObjectDetection

# Load model and processor
model_name = "prithivMLmods/rf-detr-mobile-gui-detection"

processor = AutoImageProcessor.from_pretrained(model_name)
model = RfDetrForObjectDetection.from_pretrained(model_name)

# Detection threshold
THRESHOLD = 0.35


def detect_gui(image):
    # Gradio passes None when no image has been uploaded
    if image is None:
        return None, []

    image = Image.fromarray(image).convert("RGB")

    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    target_sizes = torch.tensor([image.size[::-1]])
    results = processor.post_process_object_detection(
        outputs,
        target_sizes=target_sizes,
        threshold=THRESHOLD,
    )[0]

    draw = ImageDraw.Draw(image)

    detections = []

    for score, label, box in zip(
        results["scores"],
        results["labels"],
        results["boxes"],
    ):
        box = [round(x, 2) for x in box.tolist()]
        label_name = model.config.id2label[label.item()]
        confidence = round(score.item(), 3)

        # Draw bounding box
        draw.rectangle(box, outline="red", width=3)

        # Draw label
        draw.text(
            (box[0] + 4, max(0, box[1] - 16)),
            f"{label_name} {confidence:.2f}",
            fill="red",
        )

        detections.append(
            {
                "Label": label_name,
                "Confidence": confidence,
                "Bounding Box": box,
            }
        )

    return image, detections


demo = gr.Interface(
    fn=detect_gui,
    inputs=gr.Image(type="numpy", label="Upload Mobile UI Screenshot"),
    outputs=[
        gr.Image(type="pil", label="Detected GUI Elements"),
        gr.JSON(label="Detections"),
    ],
    title="RF-DETR Mobile GUI Detection",
    description="Upload a mobile UI screenshot to detect GUI elements using RF-DETR.",
)

if __name__ == "__main__":
    demo.launch()

e.g., demo screenshot
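
The demo returns its detections as a list of dictionaries with "Label", "Confidence", and "Bounding Box" keys. The sketch below shows one way that output might be consumed for grounding, for example to pick a tap point for the most confident element of a given class. It assumes boxes are in [x_min, y_min, x_max, y_max] pixel order, which is how the drawing code above treats them. The helper names `box_center` and `select_detections` are hypothetical, and the sample detections are made up for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Detection = Dict[str, object]


def box_center(box: Sequence[float]) -> Tuple[float, float]:
    """Return the (x, y) center of an [x_min, y_min, x_max, y_max] box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def select_detections(
    detections: List[Detection],
    label: Optional[str] = None,
    min_confidence: float = 0.0,
) -> List[Detection]:
    """Keep detections matching the label and threshold, most confident first."""
    kept = [
        det
        for det in detections
        if det["Confidence"] >= min_confidence
        and (label is None or det["Label"] == label)
    ]
    return sorted(kept, key=lambda det: det["Confidence"], reverse=True)


# Made-up detections in the same shape the demo returns (illustrative only)
sample = [
    {"Label": "text", "Confidence": 0.91, "Bounding Box": [10.0, 20.0, 110.0, 60.0]},
    {"Label": "image", "Confidence": 0.48, "Bounding Box": [0.0, 100.0, 200.0, 300.0]},
    {"Label": "text", "Confidence": 0.40, "Bounding Box": [10.0, 320.0, 90.0, 340.0]},
]

best_text = select_detections(sample, label="text", min_confidence=0.5)
tap_point = box_center(best_text[0]["Bounding Box"])
print(tap_point)  # (60.0, 40.0)
```

Filtering after inference, rather than raising THRESHOLD, keeps the lower-confidence boxes available for other uses such as debugging overlays.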

Acknowledgements

roboflow/rf-detr-medium: RF-DETR is an end-to-end object detection model that combines ideas from LW-DETR and Deformable DETR: a DINOv2-with-registers-style ViT backbone (with an RF-DETR windowing pattern for efficient attention), a multi-scale projector between the encoder and decoder, and a multi-scale deformable DETR decoder for fast convergence and strong accuracy-latency tradeoffs.
Mobile UI Design Detection [dataset] by mrtoy: This dataset is designed for object detection tasks focused on detecting elements in mobile UI designs. The target objects include text, images, and groups. The dataset contains mobile UI images with object detection bounding boxes, class labels, and localization information.
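
Since the dataset targets a small set of classes (text, images, and groups), a per-class summary of a screenshot's detections can be a quick sanity check. The following is a minimal sketch that assumes detections in the dictionary format returned by the demo's `detect_gui` function. The function name `summarize_by_label` is hypothetical, and the label strings in the sample are placeholders, since the exact names in `model.config.id2label` are not listed here.

```python
from collections import defaultdict
from typing import Dict, List


def summarize_by_label(detections: List[dict]) -> Dict[str, dict]:
    """Count detections per label and average their confidence scores."""
    scores = defaultdict(list)
    for det in detections:
        scores[det["Label"]].append(det["Confidence"])
    return {
        label: {
            "count": len(values),
            "mean_confidence": round(sum(values) / len(values), 3),
        }
        for label, values in scores.items()
    }


# Placeholder detections; label strings are assumptions, not the model's actual labels
example = [
    {"Label": "text", "Confidence": 0.9, "Bounding Box": [0.0, 0.0, 50.0, 20.0]},
    {"Label": "text", "Confidence": 0.7, "Bounding Box": [0.0, 30.0, 50.0, 50.0]},
    {"Label": "image", "Confidence": 0.5, "Bounding Box": [0.0, 60.0, 100.0, 160.0]},
]

summary = summarize_by_label(example)
print(summary["text"]["count"])  # 2
```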


Safetensors: model size 33.4M params, tensor type F32

Model tree for prithivMLmods/rf-detr-mobile-gui-detection: base model Roboflow/rf-detr-medium, fine-tuned into this model

Dataset used to train prithivMLmods/rf-detr-mobile-gui-detection: Mobile UI Design Detection by mrtoy

Paper for prithivMLmods/rf-detr-mobile-gui-detection: RF-DETR: Neural Architecture Search for Real-Time Detection Transformers (2511.09554, published Nov 12, 2025)