---
library_name: transformers
tags: []
---

# Text Overlay Detection

Text overlays are widely used for subtitles, credits, watermarks, promotional messages, and explanatory labels.
There are many reasons to detect or remove them: keeping burned-in text out of the training data for image and video generation models,
supplying clean content for ad creatives, excluding burned-in text from the inputs to diffing algorithms, and
creating paired data for title treatment and other text generation tasks.

This model was trained on 2k pairs of data that were sampled using a VLM as a weakly supervised classifier and then manually annotated. The published model uses
a DINOv2 with Registers backbone and a modified preprocessor that skips center cropping, since text overlays often sit in the corners of images.
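
To see why center cropping matters here, the sketch below estimates how much of a frame survives a typical resize-then-center-crop pipeline. The 256-pixel shortest edge and 224-pixel crop are assumed values that are common for ViT-style backbones, not settings read from this model's preprocessor config, and `center_crop_retained_fraction` is a hypothetical helper written for illustration.

```python
def center_crop_retained_fraction(width, height, shortest_edge=256, crop=224):
    """Fraction of the original image area kept after resizing the shortest
    edge to `shortest_edge` and center cropping to `crop` x `crop`.

    The default sizes are assumptions, not values taken from this model.
    """
    scale = shortest_edge / min(width, height)
    resized_w, resized_h = width * scale, height * scale
    # The crop cannot keep more than the resized image actually contains.
    kept_w = min(crop, resized_w) / resized_w
    kept_h = min(crop, resized_h) / resized_h
    return kept_w * kept_h


# A 16:9 frame, the shape where subtitles and watermarks hug the edges.
frac = center_crop_retained_fraction(1920, 1080)
print(f"{frac:.0%} of the frame survives the crop")
```

Under these assumed sizes, well under half of a 1920x1080 frame survives, and the discarded area is exactly the borders and corners where overlays tend to appear.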


## How To Use
```python
import torch
from PIL import Image
from transformers import AutoImageProcessor
from transformers import AutoModelForImageClassification

image_processor = AutoImageProcessor.from_pretrained("aslakey/text_overlay_detection")
model = AutoModelForImageClassification.from_pretrained('aslakey/text_overlay_detection')
model.eval()

# Labels reported in the performance table: no_text_overlay, text_overlay
# (see model.config.id2label for the exact index-to-label mapping)
image = Image.open('overlay.png')
inputs = image_processor(image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

predicted_label = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
```
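
If a confidence score is more useful than a hard label, the logits can be turned into probabilities and compared against a threshold. The following is a minimal, dependency-free sketch: `softmax` and `has_text_overlay` are hypothetical helpers, the `text_overlay` label name is assumed from the performance table, and the 0.8 threshold is an arbitrary illustration rather than a tuned value.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def has_text_overlay(logits, id2label, threshold=0.5, positive_label="text_overlay"):
    """Return (decision, probability) for the positive class.

    `positive_label` is an assumption; confirm it against model.config.id2label.
    """
    probs = softmax(logits)
    label2prob = {id2label[i]: p for i, p in enumerate(probs)}
    prob = label2prob[positive_label]
    return prob >= threshold, prob


# Stand-in values for illustration. With the real model, pass
# outputs.logits[0].tolist() and model.config.id2label instead.
example_logits = [0.2, 2.3]
example_id2label = {0: "no_text_overlay", 1: "text_overlay"}
flagged, prob = has_text_overlay(example_logits, example_id2label, threshold=0.8)
print(flagged, round(prob, 3))
```

Raising the threshold trades recall for precision, which may suit pipelines where wrongly discarding a clean image is cheap. With PyTorch already loaded, `outputs.logits.softmax(-1)` gives the same probabilities directly.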

## Model Performance
| Class           | Precision | Recall | F1-score |
|-----------------|-----------|--------|----------|
| no_text_overlay | 0.97      | 0.99   | 0.98     |
| text_overlay    | 0.99      | 0.97   | 0.98     |
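
As a quick consistency check, the F1 column can be recomputed from the reported precision and recall using the standard harmonic-mean formula. The `f1_score` helper below is a hypothetical illustration, not the evaluation code used for this model.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the table above.
reported = {
    "no_text_overlay": (0.97, 0.99),
    "text_overlay": (0.99, 0.97),
}

for label, (precision, recall) in reported.items():
    print(label, round(f1_score(precision, recall), 2))
```

Both classes round to 0.98, which agrees with the table.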