Waldo Detection - Tiled YOLOv8 Models

Pre-trained YOLOv8s models for detecting Waldo in "Where's Waldo?" puzzle images using a tiled inference approach.

Model Description

These models achieve 93.1% detection rate on Where's Waldo puzzles by using a tiled training and inference approach. The key insight is that Waldo occupies only ~2% of typical puzzle images, which is too small for standard detectors. By decomposing images into overlapping tiles, we make Waldo 7-16% of each tile, enabling effective detection.

Key Results:

  • 4.5x improvement over full-image fine-tuning (20.7% → 93.1%)
  • 0.43 seconds per image on Apple M4 Max
  • 0.756 average IoU localization accuracy

Available Models

| Model | Tile Size | Detection Rate | Avg IoU | Recommended Use |
|---|---|---|---|---|
| waldo_yolov8s_tile256.pt | 256px | 91.4% | 0.770 | Maximum accuracy |
| waldo_yolov8s_tile384.pt | 384px | 79.3% | 0.601 | - |
| waldo_yolov8s_tile512.pt | 512px | 87.9% | 0.742 | Best balance (recommended) |
| waldo_yolov8s_tile640.pt | 640px | 77.6% | 0.610 | - |
| waldo_yolov8s_tile768.pt | 768px | 79.3% | 0.611 | Maximum speed |

Quick Start

Installation

pip install ultralytics huggingface_hub pillow

Download and Use

from huggingface_hub import hf_hub_download
from ultralytics import YOLO

# Download the recommended model
model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/waldo-yolov8-tiled",
    filename="waldo_yolov8s_tile512.pt"
)

# Load model
model = YOLO(model_path)

Tiled Inference (Recommended)

For best results on full Where's Waldo images, use tiled inference:

from ultralytics import YOLO
from PIL import Image
from huggingface_hub import hf_hub_download

def find_waldo(image_path, model, tile_size=512, overlap=0.25, conf=0.1):
    """
    Find Waldo using tiled inference.

    Args:
        image_path: Path to Where's Waldo image
        model: Loaded YOLO model
        tile_size: Size of tiles (must match model training size)
        overlap: Overlap between tiles (0.0-0.5)
        conf: Confidence threshold

    Returns:
        List of detections with coordinates and confidence
    """
    img = Image.open(image_path).convert('RGB')
    img_w, img_h = img.size

    stride = int(tile_size * (1 - overlap))
    all_detections = []

    # Scan the image with overlapping tiles; place a final tile flush
    # with each edge so the borders are covered even when the image
    # size is not a multiple of the stride
    y_starts = sorted({*range(0, max(img_h - tile_size, 0) + 1, stride),
                       max(img_h - tile_size, 0)})
    x_starts = sorted({*range(0, max(img_w - tile_size, 0) + 1, stride),
                       max(img_w - tile_size, 0)})
    for y in y_starts:
        for x in x_starts:
            tile = img.crop((x, y, x + tile_size, y + tile_size))
            results = model.predict(tile, conf=conf, verbose=False, imgsz=tile_size)

            for result in results:
                for box in result.boxes:
                    bx1, by1, bx2, by2 = box.xyxy[0].cpu().numpy()
                    all_detections.append({
                        'x1': float(x + bx1),
                        'y1': float(y + by1),
                        'x2': float(x + bx2),
                        'y2': float(y + by2),
                        'confidence': float(box.conf[0])
                    })

    # Simple NMS to merge overlapping detections
    if not all_detections:
        return []

    all_detections.sort(key=lambda d: d['confidence'], reverse=True)
    keep = []

    while all_detections:
        best = all_detections.pop(0)
        keep.append(best)
        all_detections = [d for d in all_detections if iou(best, d) < 0.5]

    return keep

def iou(box1, box2):
    """Compute IoU between two boxes."""
    ix1 = max(box1['x1'], box2['x1'])
    iy1 = max(box1['y1'], box2['y1'])
    ix2 = min(box1['x2'], box2['x2'])
    iy2 = min(box1['y2'], box2['y2'])

    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area1 = (box1['x2'] - box1['x1']) * (box1['y2'] - box1['y1'])
    area2 = (box2['x2'] - box2['x1']) * (box2['y2'] - box2['y1'])

    return inter / (area1 + area2 - inter) if (area1 + area2 - inter) > 0 else 0

# Example usage
model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/waldo-yolov8-tiled",
    filename="waldo_yolov8s_tile512.pt"
)
model = YOLO(model_path)

detections = find_waldo("waldo_puzzle.jpg", model)
if detections:
    best = detections[0]
    print(f"Found Waldo at ({best['x1']:.0f}, {best['y1']:.0f}) with {best['confidence']:.1%} confidence")
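
The NMS step hinges on the IoU helper, which can be sanity-checked without a model or weights (the boxes below are made up for illustration):

```python
# Standalone check of the IoU formula used by the NMS step above
def iou(b1, b2):
    ix1, iy1 = max(b1['x1'], b2['x1']), max(b1['y1'], b2['y1'])
    ix2, iy2 = min(b1['x2'], b2['x2']), min(b1['y2'], b2['y2'])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    a1 = (b1['x2'] - b1['x1']) * (b1['y2'] - b1['y1'])
    a2 = (b2['x2'] - b2['x1']) * (b2['y2'] - b2['y1'])
    union = a1 + a2 - inter
    return inter / union if union > 0 else 0

a = {'x1': 0, 'y1': 0, 'x2': 10, 'y2': 10}
b = {'x1': 5, 'y1': 0, 'x2': 15, 'y2': 10}
print(iou(a, b))  # intersection 50, union 150 -> 1/3
```

Detections from adjacent overlapping tiles typically score well above the 0.5 IoU threshold used above, so duplicates of the same Waldo are merged into a single box.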

Full Code Repository

For complete training, evaluation, and ablation study code, see our GitHub repository:

GitHub: https://github.com/YOUR_USERNAME/waldo-detection

Training Details

  • Architecture: YOLOv8s (11M parameters)
  • Training Data: Where's Waldo dataset from Roboflow (58 images → ~2000 tiles)
  • Tile Size: Varies by model (256-768px)
  • Overlap: 25% during training
  • Epochs: 100 with early stopping (patience=20)
  • Augmentation: Reduced mosaic (0.5), scale (±30%), no mixup (optimized for small objects)
  • Hardware: Apple M4 Max with MPS acceleration
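
The bullets above roughly correspond to an Ultralytics training call like the following sketch. The dataset YAML name is a placeholder, and the exact argument values are assumptions reconstructed from the settings listed:

```python
from ultralytics import YOLO

# Hypothetical training invocation; "waldo_tiles.yaml" is an assumed
# dataset config pointing at the ~2000 tiles cut from the 58 images
model = YOLO("yolov8s.pt")
model.train(
    data="waldo_tiles.yaml",
    imgsz=512,       # matches the tile size of the model being trained
    epochs=100,
    patience=20,     # early stopping
    mosaic=0.5,      # reduced mosaic, better for small objects
    mixup=0.0,       # no mixup
    scale=0.3,       # ±30% scale jitter
    device="mps",    # Apple Silicon acceleration
)
```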

Why Tiling Works

| Setting | Waldo Size | Detection Rate |
|---|---|---|
| Full image (1800px) | ~2% of image | 20.7% |
| 512px tile | 7-16% of tile | 87.9% |
| 256px tile | 15-31% of tile | 91.4% |

Tiling increases Waldo's relative size, providing more features for the detector to learn from.
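
As a back-of-the-envelope check, assuming Waldo spans roughly 36-80 pixels in the original artwork (an assumed range, chosen to be consistent with the fractions in the table):

```python
# Waldo's approximate linear size relative to each frame, assuming
# he spans ~36-80 px (an assumption, not a measured value)
for context, size in [("full image", 1800), ("512px tile", 512), ("256px tile", 256)]:
    lo, hi = 36 / size, 80 / size
    print(f"{context}: {lo:.0%}-{hi:.0%} of the frame")
```

Smaller tiles make the same pixels a larger fraction of the detector's input, which is why detection rate climbs as tile size shrinks.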

Ablation Studies

Tile Size (with 25% overlap)

| Tile Size | Detection Rate | Avg IoU | Time/Image |
|---|---|---|---|
| 256px | 91.4% | 0.770 | 1.15s |
| 512px | 87.9% | 0.742 | 0.67s |
| 768px | 79.3% | 0.611 | 0.42s |

Overlap (with 512px tiles)

| Overlap | Detection Rate | Avg IoU | Time/Image |
|---|---|---|---|
| 0% | 93.1% | 0.718 | 0.43s |
| 12.5% | 93.1% | 0.755 | 0.43s |
| 25% | 93.1% | 0.756 | 0.67s |

Recommendation: Use 512px tiles with 12.5% overlap for the best speed-accuracy tradeoff.
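
The timing differences come down to tile counts. A rough sketch, assuming an ~1800×1800px puzzle and a scan that also places a final tile flush with each edge:

```python
def tiles_per_image(img=1800, tile=512, overlap=0.25):
    """Count tiles needed to cover a square image of side `img`,
    including a final tile flush with each edge."""
    stride = int(tile * (1 - overlap))
    starts = set(range(0, max(img - tile, 0) + 1, stride)) | {max(img - tile, 0)}
    return len(starts) ** 2

for o in (0.0, 0.125, 0.25):
    print(f"overlap {o:.1%}: {tiles_per_image(overlap=o)} tiles")
```

Under these assumptions, going from 12.5% to 25% overlap raises the tile count from 16 to 25 (about 1.6x), which roughly matches the jump from 0.43s to 0.67s per image.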

Citation

@article{ramakrishnan2026finding,
  title={Finding Waldo in 0.43 Seconds: Tiled YOLOv8 for Small Object Detection in Dense Scenes},
  author={Ramakrishnan, Siddharth},
  journal={arXiv preprint},
  year={2026}
}

License

MIT License - see LICENSE file.

Acknowledgments

Training data comes from the Where's Waldo dataset hosted on Roboflow.