---
library_name: transformers
tags: []
---

# Text Overlay Detection

Text overlays are widely used for subtitles, credits, watermarks, promotional messages, and explanatory labels. There are many use cases for detecting and/or removing text overlays: avoiding burned-in text when training image and video generation models, supplying clean content for ad creatives, keeping burned-in text out of diffing algorithms, and creating paired data for title treatment and other text generation tasks.

This model was trained on 2k pairs of data sampled using a VLM as a weakly supervised classifier. The 2k samples were then manually annotated.

The published model uses a DINOv2 with Registers backbone and a modified preprocessor that removes center cropping (text overlays are often in the corners of images!).

## How To Use

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load the modified preprocessor (no center crop) and the classifier.
image_processor = AutoImageProcessor.from_pretrained("aslakey/text_overlay_detection")
model = AutoModelForImageClassification.from_pretrained("aslakey/text_overlay_detection")
model.eval()

# Model labels: [no_text_overlay, text_overlay]; see model.config.id2label for the index order.
# Convert to RGB so images with an alpha channel or palette are handled.
image = Image.open("overlay.png").convert("RGB")
inputs = image_processor(image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

predicted_label = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
```

## Model Performance

| Class           | Precision | Recall | F1-score |
|-----------------|-----------|--------|----------|
| no_text_overlay | 0.97      | 0.99   | 0.98     |
| text_overlay    | 0.99      | 0.97   | 0.98     |
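
For readers who want to reproduce this kind of table on their own labeled data, the per-class metrics follow the standard definitions of precision, recall, and F1. The sketch below is a minimal illustration in plain Python. The helper name `per_class_metrics` and the counts passed to it are hypothetical: the counts are invented for illustration and are not the evaluation counts behind the table above.

```python
def per_class_metrics(tp: int, fp: int, fn: int) -> dict:
    """Return precision, recall, and F1 for one class from raw counts.

    tp: samples of this class predicted as this class
    fp: samples of other classes predicted as this class
    fn: samples of this class predicted as another class
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for a single class, for illustration only.
metrics = per_class_metrics(tp=97, fp=1, fn=3)
print({name: round(value, 2) for name, value in metrics.items()})
# prints {'precision': 0.99, 'recall': 0.97, 'f1': 0.98}
```

In a binary setup like this one, the false positives of one class are the false negatives of the other, so both rows of the table can be derived from a single 2x2 confusion matrix.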