---
library_name: transformers
tags: []
---

# Text Overlay Detection

Text overlays are widely used for subtitles, credits, watermarks, promotional messages, and explanatory labels. There are many use cases for detecting and/or removing text overlays: keeping burned-in text out of the training data for image and video generation models, supplying clean content for ad creatives, excluding burned-in text from diffing algorithms, and creating paired data for title treatment and other text generation tasks.

This model was trained on 2k pairs of data sampled using a VLM as a weakly supervised classifier; those 2k samples were then manually annotated. The published model uses a DINOv2 with Registers backbone and a modified preprocessor that removes center cropping, since text overlays often appear in the corners of images.
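
To illustrate why center cropping is a problem for this task, the sketch below estimates how much of a frame survives a resize-shortest-edge plus center-crop pipeline. The 256-pixel shortest edge and 224-pixel crop are assumptions (common DINOv2 preprocessing defaults), not values read from this model's preprocessor, and the rounding only approximates what an image processor does internally.

```python
def center_crop_coverage(width, height, shortest_edge=256, crop_size=224):
    """Return the fraction of the original width and height that survives
    a resize-shortest-edge + center-crop pipeline (assumed defaults)."""
    # Scale so the shorter side equals `shortest_edge`, preserving aspect ratio
    scale = shortest_edge / min(width, height)
    resized_w = round(width * scale)
    resized_h = round(height * scale)
    # The center crop keeps at most `crop_size` pixels along each axis
    kept_w = min(crop_size, resized_w) / resized_w
    kept_h = min(crop_size, resized_h) / resized_h
    return kept_w, kept_h


# Example: a 1920x1080 video frame
kept_w, kept_h = center_crop_coverage(1920, 1080)
print(f"width kept: {kept_w:.0%}, height kept: {kept_h:.0%}")
```

For a 1920x1080 frame, only about half of the width survives, so overlays near the left and right edges (and especially in the corners) can be cropped away before the model ever sees them.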

## How To Use
```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

image_processor = AutoImageProcessor.from_pretrained("aslakey/text_overlay_detection")
model = AutoModelForImageClassification.from_pretrained("aslakey/text_overlay_detection")
model.eval()

# Model labels: no_text_overlay, text_overlay (see model.config.id2label for the index mapping)
image = Image.open("overlay.png").convert("RGB")  # drop any alpha channel
inputs = image_processor(image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring class and map its index back to a label name
predicted_label = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
```

## Model Performance
| Class           | Precision | Recall | F1-score |
|-----------------|-----------|--------|----------|
| no_text_overlay | 0.97      | 0.99   | 0.98     |
| text_overlay    | 0.99      | 0.97   | 0.98     |
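
As a consistency check, the F1-score is the harmonic mean of precision $P$ and recall $R$; plugging in the rounded values from the table for `no_text_overlay` reproduces the reported 0.98:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.97 \times 0.99}{0.97 + 0.99}
    = \frac{1.9206}{1.96}
    \approx 0.980
```

Because the two classes simply swap precision and recall, their F1-scores are identical.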