---
library_name: transformers
tags: []
---
# Text Overlay Detection
Text overlays are widely used for subtitles, credits, watermarks, promotional messages, and explanatory labels.
Detecting (and optionally removing) text overlays is useful in many settings: avoiding burned-in text when training image and video generation models,
supplying clean content for ad creatives, keeping burned-in text out of diffing algorithms, and
creating paired data for title treatment and other text generation tasks.
This model was trained on 2k pairs of data that were sampled using a VLM as a weakly supervised classifier and then manually annotated. The published model uses
a DINOv2 with Registers backbone and a modified preprocessor that removes center cropping (text overlays are often in the corners of images!).
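
To see why center cropping matters here, the sketch below computes which part of the original image survives a resize-then-center-crop pipeline. The helper names `center_crop_window` and `is_visible`, the watermark coordinates, and the default sizes (shortest edge resized to 256, then a 224×224 center crop) are illustrative assumptions, not values read from this model's preprocessor config.

```python
def center_crop_window(width, height, shortest_edge=256, crop=224):
    """Return (left, top, right, bottom), in original pixel coordinates,
    of the region kept by resize-shortest-edge followed by a center crop.

    The 256/224 defaults are assumed, commonly used values.
    """
    # Side length of the square crop, measured in original pixels.
    side = crop * min(width, height) / shortest_edge
    left = (width - side) / 2
    top = (height - side) / 2
    return (left, top, left + side, top + side)


def is_visible(box, window):
    """True if box (left, top, right, bottom) overlaps the kept window."""
    return (box[0] < window[2] and box[2] > window[0]
            and box[1] < window[3] and box[3] > window[1])


# A 1920x1080 frame with a hypothetical watermark in the bottom-right corner.
window = center_crop_window(1920, 1080)
watermark = (1700, 1000, 1900, 1060)
print(window)                         # (487.5, 67.5, 1432.5, 1012.5)
print(is_visible(watermark, window))  # False: the crop discards the watermark
```

Under these assumed sizes, roughly the outer quarter of a widescreen frame on each side never reaches the model, which is where watermarks and credits tend to sit.
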
## How To Use
```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load the preprocessor and the classifier from the Hub
image_processor = AutoImageProcessor.from_pretrained("aslakey/text_overlay_detection")
model = AutoModelForImageClassification.from_pretrained("aslakey/text_overlay_detection")
model.eval()

# Model labels, as listed under Model Performance: no_text_overlay, text_overlay
# Convert to RGB because PNG files may carry an alpha channel
image = Image.open("overlay.png").convert("RGB")
inputs = image_processor(image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Map the highest-scoring logit back to its label name
predicted_label = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
```
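
The snippet above returns only the top label. Where a confidence score is useful (for example, to filter training data conservatively), the logits can be turned into probabilities with a softmax. The sketch below shows the arithmetic in plain Python. The example logits, the `threshold` value, and the label order are made up for illustration; in real use the label names should come from `model.config.id2label`, and with the tensors above the equivalent call would be `outputs.logits.softmax(-1)`.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical values: the logits and the label order are assumptions.
labels = ["no_text_overlay", "text_overlay"]
logits = [-1.2, 2.3]
threshold = 0.9  # assumed cut-off; tune for your precision/recall needs

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
if probs[best] >= threshold:
    print(f"{labels[best]} ({probs[best]:.2f})")
else:
    print("uncertain, send to manual review")
```
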
## Model Performance
| Class | Precision | Recall | F1-score |
|-----------------|-----------|--------|----------|
| no_text_overlay | 0.97 | 0.99 | 0.98 |
| text_overlay | 0.99 | 0.97 | 0.98 |
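
The F1-score column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A quick check in Python, using only the numbers from the table (the helper name `f1_score` is just for illustration):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the table above.
table = {
    "no_text_overlay": (0.97, 0.99),
    "text_overlay": (0.99, 0.97),
}
for label, (p, r) in table.items():
    print(label, round(f1_score(p, r), 2))  # both round to 0.98
```
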