# Waldo Detection - Tiled YOLOv8 Models

Pre-trained YOLOv8s models for detecting Waldo in "Where's Waldo?" puzzle images using a tiled inference approach.

## Model Description
These models achieve 93.1% detection rate on Where's Waldo puzzles by using a tiled training and inference approach. The key insight is that Waldo occupies only ~2% of typical puzzle images, which is too small for standard detectors. By decomposing images into overlapping tiles, we make Waldo 7-16% of each tile, enabling effective detection.
**Key Results:**
- 4.5x improvement over full-image fine-tuning (20.7% → 93.1%)
- 0.43 seconds per image on Apple M4 Max
- 0.756 average IoU localization accuracy
## Available Models

| Model | Tile Size | Detection Rate | Avg IoU | Recommended Use |
|---|---|---|---|---|
| `waldo_yolov8s_tile256.pt` | 256px | 91.4% | 0.770 | Maximum accuracy |
| `waldo_yolov8s_tile384.pt` | 384px | 79.3% | 0.601 | - |
| `waldo_yolov8s_tile512.pt` | 512px | 87.9% | 0.742 | Best balance (recommended) |
| `waldo_yolov8s_tile640.pt` | 640px | 77.6% | 0.610 | - |
| `waldo_yolov8s_tile768.pt` | 768px | 79.3% | 0.611 | Maximum speed |
## Quick Start

### Installation

```bash
pip install ultralytics huggingface_hub pillow
```
### Download and Use

```python
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

# Download the recommended model
model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/waldo-yolov8-tiled",
    filename="waldo_yolov8s_tile512.pt"
)

# Load model
model = YOLO(model_path)
```
### Tiled Inference (Recommended)

For best results on full Where's Waldo images, use tiled inference:

```python
from huggingface_hub import hf_hub_download
from ultralytics import YOLO
from PIL import Image


def find_waldo(image_path, model, tile_size=512, overlap=0.25, conf=0.1):
    """
    Find Waldo using tiled inference.

    Args:
        image_path: Path to Where's Waldo image
        model: Loaded YOLO model
        tile_size: Size of tiles (must match model training size)
        overlap: Overlap between tiles (0.0-0.5)
        conf: Confidence threshold

    Returns:
        List of detections with coordinates and confidence,
        sorted by confidence (highest first)
    """
    img = Image.open(image_path).convert('RGB')
    img_w, img_h = img.size
    stride = int(tile_size * (1 - overlap))

    # Tile origins; always include the final edge-aligned position so
    # the right and bottom borders are covered even when the image size
    # is not a multiple of the stride.
    last_x = max(img_w - tile_size, 0)
    last_y = max(img_h - tile_size, 0)
    xs = sorted(set(list(range(0, last_x + 1, stride)) + [last_x]))
    ys = sorted(set(list(range(0, last_y + 1, stride)) + [last_y]))

    all_detections = []
    # Scan image with overlapping tiles
    for y in ys:
        for x in xs:
            tile = img.crop((x, y, x + tile_size, y + tile_size))
            results = model.predict(tile, conf=conf, verbose=False, imgsz=tile_size)
            for result in results:
                for box in result.boxes:
                    # Map tile-local box coordinates back to the full image
                    bx1, by1, bx2, by2 = box.xyxy[0].cpu().numpy()
                    all_detections.append({
                        'x1': float(x + bx1),
                        'y1': float(y + by1),
                        'x2': float(x + bx2),
                        'y2': float(y + by2),
                        'confidence': float(box.conf[0]),
                    })

    # Simple greedy NMS to merge overlapping detections
    if not all_detections:
        return []
    all_detections.sort(key=lambda d: d['confidence'], reverse=True)
    keep = []
    while all_detections:
        best = all_detections.pop(0)
        keep.append(best)
        all_detections = [d for d in all_detections if iou(best, d) < 0.5]
    return keep


def iou(box1, box2):
    """Compute IoU between two axis-aligned boxes."""
    ix1 = max(box1['x1'], box2['x1'])
    iy1 = max(box1['y1'], box2['y1'])
    ix2 = min(box1['x2'], box2['x2'])
    iy2 = min(box1['y2'], box2['y2'])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area1 = (box1['x2'] - box1['x1']) * (box1['y2'] - box1['y1'])
    area2 = (box2['x2'] - box2['x1']) * (box2['y2'] - box2['y1'])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0


# Example usage
model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/waldo-yolov8-tiled",
    filename="waldo_yolov8s_tile512.pt"
)
model = YOLO(model_path)

detections = find_waldo("waldo_puzzle.jpg", model)
if detections:
    best = detections[0]
    print(f"Found Waldo at ({best['x1']:.0f}, {best['y1']:.0f}) "
          f"with {best['confidence']:.1%} confidence")
```
## Full Code Repository

For complete training, evaluation, and ablation study code, see our GitHub repository:
https://github.com/YOUR_USERNAME/waldo-detection
## Training Details
- Architecture: YOLOv8s (11M parameters)
- Training Data: Where's Waldo dataset from Roboflow (58 images → ~2000 tiles)
- Tile Size: Varies by model (256-768px)
- Overlap: 25% during training
- Epochs: 100 with early stopping (patience=20)
- Augmentation: Reduced mosaic (0.5), scale (±30%), no mixup (optimized for small objects)
- Hardware: Apple M4 Max with MPS acceleration
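
The settings above map onto the Ultralytics `train()` call roughly as sketched below. This is an illustration, not the exact training script: `data.yaml` is a placeholder path for the tiled dataset export, and the call itself is commented out since training requires the dataset and suitable hardware.

```python
# Sketch of the training setup; hyperparameters mirror the list above.
# Assumes `data.yaml` points at the tiled dataset in Ultralytics YOLO
# format (placeholder path) and that `ultralytics` is installed.
train_args = dict(
    data="data.yaml",  # dataset config (placeholder path)
    imgsz=512,         # tile size; 256-768 for the other variants
    epochs=100,
    patience=20,       # early stopping
    mosaic=0.5,        # reduced mosaic augmentation
    scale=0.3,         # +/-30% scale jitter
    mixup=0.0,         # mixup disabled for small objects
    device="mps",      # Apple Silicon; use 0 for CUDA or "cpu"
)

# from ultralytics import YOLO
# YOLO("yolov8s.pt").train(**train_args)
```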
## Why Tiling Works
| Setting | Waldo Size | Detection Rate |
|---|---|---|
| Full image (1800px) | ~2% of image | 20.7% |
| 512px tile | 7-16% of tile | 87.9% |
| 256px tile | 15-31% of tile | 91.4% |
Tiling increases Waldo's relative size, providing more features for the detector to learn from.
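The relative sizes in the table can be sanity-checked with simple arithmetic. Assuming Waldo spans roughly 36-80 px in the source art (a range inferred from the table; the exact extent varies per puzzle, so the numbers only approximately match):

```python
WALDO_PX = (36, 80)  # assumed pixel extent of Waldo; varies per puzzle

for name, size in [("Full image", 1800), ("512px tile", 512), ("256px tile", 256)]:
    # Waldo's linear size as a fraction of the frame
    lo, hi = (100 * px / size for px in WALDO_PX)
    print(f"{name}: Waldo spans ~{lo:.0f}-{hi:.0f}% of the frame")
```

Halving the frame roughly doubles Waldo's relative size, which is exactly the effect tiling exploits.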
## Ablation Studies

### Tile Size (with 25% overlap)
| Tile Size | Detection Rate | Avg IoU | Time/Image |
|---|---|---|---|
| 256px | 91.4% | 0.770 | 1.15s |
| 512px | 87.9% | 0.742 | 0.67s |
| 768px | 79.3% | 0.611 | 0.42s |
### Overlap (with 512px tiles)
| Overlap | Detection Rate | Avg IoU | Time/Image |
|---|---|---|---|
| 0% | 93.1% | 0.718 | 0.43s |
| 12.5% | 93.1% | 0.755 | 0.43s |
| 25% | 93.1% | 0.756 | 0.67s |
**Recommendation:** Use 512px tiles with 12.5% overlap for the best speed-accuracy tradeoff.
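The timing pattern follows directly from the tile count. On a nominal 1800px puzzle (an assumed size for illustration), 0% and 12.5% overlap happen to produce the same 4x4 grid of 512px tiles, while 25% pushes it to 5x5:

```python
import math

TILE = 512
IMG = 1800  # assumed nominal puzzle width/height for illustration

for overlap in (0.0, 0.125, 0.25):
    stride = int(TILE * (1 - overlap))
    # tiles per axis, counting a final edge-aligned tile
    n = math.ceil((IMG - TILE) / stride) + 1
    print(f"{overlap:.1%} overlap: {n}x{n} = {n * n} tiles")
```

With 16 tiles for both 0% and 12.5% versus 25 tiles for 25%, the ~1.6x jump in per-image time (0.43s to 0.67s) in the table above is just the extra tiles.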
## Citation

```bibtex
@article{ramakrishnan2026finding,
  title={Finding Waldo in 0.43 Seconds: Tiled YOLOv8 for Small Object Detection in Dense Scenes},
  author={Ramakrishnan, Siddharth},
  journal={arXiv preprint},
  year={2026}
}
```
## License
MIT License - see LICENSE file.
## Acknowledgments
- Dataset: Roboflow Where's Waldo
- YOLOv8: Ultralytics