Waldo Detection - Tiled YOLOv8 Models

Pre-trained YOLOv8s models for detecting Waldo in "Where's Waldo?" puzzle images using a tiled inference approach.

Model Description

These models achieve 93.1% detection rate on Where's Waldo puzzles by using a tiled training and inference approach. The key insight is that Waldo occupies only ~2% of typical puzzle images, which is too small for standard detectors. By decomposing images into overlapping tiles, we make Waldo 7-16% of each tile, enabling effective detection.

Key Results:

  • 4.5x improvement over full-image fine-tuning (20.7% → 93.1%)
  • 0.43 seconds per image on Apple M4 Max
  • 0.756 average IoU localization accuracy

Available Models

| Model | Tile Size | Detection Rate | Avg IoU | Recommended Use |
|---|---|---|---|---|
| waldo_yolov8s_tile256.pt | 256px | 91.4% | 0.770 | Maximum accuracy |
| waldo_yolov8s_tile384.pt | 384px | 79.3% | 0.601 | - |
| waldo_yolov8s_tile512.pt | 512px | 87.9% | 0.742 | Best balance (recommended) |
| waldo_yolov8s_tile640.pt | 640px | 77.6% | 0.610 | - |
| waldo_yolov8s_tile768.pt | 768px | 79.3% | 0.611 | Maximum speed |

Quick Start

Installation

pip install ultralytics huggingface_hub pillow

Download and Use

from huggingface_hub import hf_hub_download
from ultralytics import YOLO

# Download the recommended model
model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/waldo-yolov8-tiled",
    filename="waldo_yolov8s_tile512.pt"
)

# Load model
model = YOLO(model_path)

Tiled Inference (Recommended)

For best results on full Where's Waldo images, use tiled inference:

from ultralytics import YOLO
from PIL import Image
from huggingface_hub import hf_hub_download

def find_waldo(image_path, model, tile_size=512, overlap=0.25, conf=0.1):
    """
    Find Waldo using tiled inference.

    Args:
        image_path: Path to Where's Waldo image
        model: Loaded YOLO model
        tile_size: Size of tiles (must match model training size)
        overlap: Overlap between tiles (0.0-0.5)
        conf: Confidence threshold

    Returns:
        List of detections with coordinates and confidence
    """
    img = Image.open(image_path).convert('RGB')
    img_w, img_h = img.size

    stride = int(tile_size * (1 - overlap))
    all_detections = []

    # Scan the image with overlapping tiles; place a final tile flush
    # with each edge so the borders are covered even when the image
    # size is not a multiple of the stride
    y_starts = sorted({*range(0, max(img_h - tile_size, 0) + 1, stride),
                       max(img_h - tile_size, 0)})
    x_starts = sorted({*range(0, max(img_w - tile_size, 0) + 1, stride),
                       max(img_w - tile_size, 0)})
    for y in y_starts:
        for x in x_starts:
            tile = img.crop((x, y, x + tile_size, y + tile_size))
            results = model.predict(tile, conf=conf, verbose=False, imgsz=tile_size)

            for result in results:
                for box in result.boxes:
                    bx1, by1, bx2, by2 = box.xyxy[0].cpu().numpy()
                    all_detections.append({
                        'x1': float(x + bx1),
                        'y1': float(y + by1),
                        'x2': float(x + bx2),
                        'y2': float(y + by2),
                        'confidence': float(box.conf[0])
                    })

    # Simple NMS to merge overlapping detections
    if not all_detections:
        return []

    all_detections.sort(key=lambda d: d['confidence'], reverse=True)
    keep = []

    while all_detections:
        best = all_detections.pop(0)
        keep.append(best)
        all_detections = [d for d in all_detections if iou(best, d) < 0.5]

    return keep

def iou(box1, box2):
    """Compute IoU between two boxes."""
    ix1 = max(box1['x1'], box2['x1'])
    iy1 = max(box1['y1'], box2['y1'])
    ix2 = min(box1['x2'], box2['x2'])
    iy2 = min(box1['y2'], box2['y2'])

    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area1 = (box1['x2'] - box1['x1']) * (box1['y2'] - box1['y1'])
    area2 = (box2['x2'] - box2['x1']) * (box2['y2'] - box2['y1'])

    return inter / (area1 + area2 - inter) if (area1 + area2 - inter) > 0 else 0

# Example usage
model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/waldo-yolov8-tiled",
    filename="waldo_yolov8s_tile512.pt"
)
model = YOLO(model_path)

detections = find_waldo("waldo_puzzle.jpg", model)
if detections:
    best = detections[0]
    print(f"Found Waldo at ({best['x1']:.0f}, {best['y1']:.0f}) with {best['confidence']:.1%} confidence")
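
The NMS step hinges on the IoU helper, which can be sanity-checked without a model or weights (the boxes below are made up for illustration):

```python
# Standalone check of the IoU formula used by the NMS step above
def iou(b1, b2):
    ix1, iy1 = max(b1['x1'], b2['x1']), max(b1['y1'], b2['y1'])
    ix2, iy2 = min(b1['x2'], b2['x2']), min(b1['y2'], b2['y2'])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    a1 = (b1['x2'] - b1['x1']) * (b1['y2'] - b1['y1'])
    a2 = (b2['x2'] - b2['x1']) * (b2['y2'] - b2['y1'])
    union = a1 + a2 - inter
    return inter / union if union > 0 else 0

a = {'x1': 0, 'y1': 0, 'x2': 10, 'y2': 10}
b = {'x1': 5, 'y1': 0, 'x2': 15, 'y2': 10}
print(iou(a, b))  # intersection 50, union 150 -> 1/3
```

Detections from adjacent overlapping tiles typically score well above the 0.5 IoU threshold used above, so duplicates of the same Waldo are merged into a single box.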

Full Code Repository

For complete training, evaluation, and ablation study code, see our GitHub repository:

GitHub: https://github.com/YOUR_USERNAME/waldo-detection

Training Details

  • Architecture: YOLOv8s (11M parameters)
  • Training Data: Where's Waldo dataset from Roboflow (58 images → ~2000 tiles)
  • Tile Size: Varies by model (256-768px)
  • Overlap: 25% during training
  • Epochs: 100 with early stopping (patience=20)
  • Augmentation: Reduced mosaic (0.5), scale (±30%), no mixup (optimized for small objects)
  • Hardware: Apple M4 Max with MPS acceleration
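
The bullets above roughly correspond to an Ultralytics training call like the following sketch. The dataset YAML name is a placeholder, and the exact argument values are assumptions reconstructed from the settings listed:

```python
from ultralytics import YOLO

# Hypothetical training invocation; "waldo_tiles.yaml" is an assumed
# dataset config pointing at the ~2000 tiles cut from the 58 images
model = YOLO("yolov8s.pt")
model.train(
    data="waldo_tiles.yaml",
    imgsz=512,       # matches the tile size of the model being trained
    epochs=100,
    patience=20,     # early stopping
    mosaic=0.5,      # reduced mosaic, better for small objects
    mixup=0.0,       # no mixup
    scale=0.3,       # ±30% scale jitter
    device="mps",    # Apple Silicon acceleration
)
```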

Why Tiling Works

| Setting | Waldo Size | Detection Rate |
|---|---|---|
| Full image (1800px) | ~2% of image | 20.7% |
| 512px tile | 7-16% of tile | 87.9% |
| 256px tile | 15-31% of tile | 91.4% |

Tiling increases Waldo's relative size, providing more features for the detector to learn from.
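
As a back-of-the-envelope check, assuming Waldo spans roughly 36-80 pixels in the original artwork (an assumed range, chosen to be consistent with the fractions in the table):

```python
# Waldo's approximate linear size relative to each frame, assuming
# he spans ~36-80 px (an assumption, not a measured value)
for context, size in [("full image", 1800), ("512px tile", 512), ("256px tile", 256)]:
    lo, hi = 36 / size, 80 / size
    print(f"{context}: {lo:.0%}-{hi:.0%} of the frame")
```

Smaller tiles make the same pixels a larger fraction of the detector's input, which is why detection rate climbs as tile size shrinks.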

Ablation Studies

Tile Size (with 25% overlap)

| Tile Size | Detection Rate | Avg IoU | Time/Image |
|---|---|---|---|
| 256px | 91.4% | 0.770 | 1.15s |
| 512px | 87.9% | 0.742 | 0.67s |
| 768px | 79.3% | 0.611 | 0.42s |

Overlap (with 512px tiles)

| Overlap | Detection Rate | Avg IoU | Time/Image |
|---|---|---|---|
| 0% | 93.1% | 0.718 | 0.43s |
| 12.5% | 93.1% | 0.755 | 0.43s |
| 25% | 93.1% | 0.756 | 0.67s |

Recommendation: Use 512px tiles with 12.5% overlap for the best speed-accuracy tradeoff.
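
The timing differences come down to tile counts. A rough sketch, assuming an ~1800×1800px puzzle and a scan that also places a final tile flush with each edge:

```python
def tiles_per_image(img=1800, tile=512, overlap=0.25):
    """Count tiles needed to cover a square image of side `img`,
    including a final tile flush with each edge."""
    stride = int(tile * (1 - overlap))
    starts = set(range(0, max(img - tile, 0) + 1, stride)) | {max(img - tile, 0)}
    return len(starts) ** 2

for o in (0.0, 0.125, 0.25):
    print(f"overlap {o:.1%}: {tiles_per_image(overlap=o)} tiles")
```

Under these assumptions, going from 12.5% to 25% overlap raises the tile count from 16 to 25 (about 1.6x), which roughly matches the jump from 0.43s to 0.67s per image.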

Citation

@article{ramakrishnan2026finding,
  title={Finding Waldo in 0.43 Seconds: Tiled YOLOv8 for Small Object Detection in Dense Scenes},
  author={Ramakrishnan, Siddharth},
  journal={arXiv preprint},
  year={2026}
}

License

MIT License - see LICENSE file.

Acknowledgments

Training data comes from the Where's Waldo dataset hosted on Roboflow.