Text Overlay Detection
Text overlays are widely used for subtitles, credits, watermarks, promotional messages, and explanatory labels. There are many cases where we may want to detect or remove them: keeping burned-in text out of training data for image and video generation models, supplying clean content for ad creatives, removing burned-in text before running diffing algorithms, and creating paired data for title treatment and other text generation tasks.
This model was trained on 2k pairs of data that were sampled using a VLM as a weakly supervised classifier and then manually annotated. The published model uses a DINOv2 with Registers backbone and a modified preprocessor that removes center cropping, since text overlays often sit in the corners of images.
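To illustrate why center cropping can hurt here, the following is a minimal sketch in plain Python. The frame size, the 224-pixel crop, and the watermark coordinates are hypothetical values chosen for illustration; they are not taken from this model's actual preprocessor configuration.

```python
def center_crop_box(width, height, crop_size):
    """Return (left, top, right, bottom) of a centered square crop."""
    left = (width - crop_size) // 2
    top = (height - crop_size) // 2
    return left, top, left + crop_size, top + crop_size


def visible_fraction(region, crop_box):
    """Fraction of `region`'s area that falls inside `crop_box`."""
    rl, rt, rr, rb = region
    cl, ct, cr, cb = crop_box
    overlap_w = max(0, min(rr, cr) - max(rl, cl))
    overlap_h = max(0, min(rb, cb) - max(rt, ct))
    region_area = (rr - rl) * (rb - rt)
    return (overlap_w * overlap_h) / region_area


# Hypothetical 456x256 frame (a 16:9 image resized so its short side is 256)
# with a watermark near the bottom-right corner.
crop_box = center_crop_box(456, 256, 224)
watermark = (376, 226, 446, 246)  # (left, top, right, bottom), assumed
fraction = visible_fraction(watermark, crop_box)
print(crop_box, fraction)
```

Under these assumptions the watermark lies entirely outside the cropped area, so a classifier fed the cropped image would never see it.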
How To Use
import torch
from PIL import Image
from transformers import AutoImageProcessor
from transformers import AutoModelForImageClassification
image_processor = AutoImageProcessor.from_pretrained("aslakey/text_overlay_detection")
model = AutoModelForImageClassification.from_pretrained('aslakey/text_overlay_detection')
model.eval()
# Model labels (matching the performance table): [no_text_overlay, text_overlay]
image = Image.open('overlay.png')
inputs = image_processor(image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
predicted_label = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
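The snippet above prints only the top label. If a confidence score is also useful, the logits can be turned into probabilities with a softmax. The sketch below uses plain Python and hypothetical logit values in place of `outputs.logits[0].tolist()`; the label order in `id2label` is an assumption and should be checked against `model.config.id2label`.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed label order; verify against model.config.id2label.
id2label = {0: "no_text_overlay", 1: "text_overlay"}

# Hypothetical logits standing in for outputs.logits[0].tolist().
logits = [0.0, 2.0]

probs = softmax(logits)
scores = {id2label[i]: p for i, p in enumerate(probs)}
predicted = max(scores, key=scores.get)
print(predicted, round(scores[predicted], 3))
```

A per-class score like this makes it possible to apply a custom decision threshold instead of always taking the argmax.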
Model Performance
| Class | Precision | Recall | F1-score |
|---|---|---|---|
| no_text_overlay | 0.97 | 0.99 | 0.98 |
| text_overlay | 0.99 | 0.97 | 0.98 |
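As a quick consistency check, the F1-scores in the table can be recomputed from the reported precision and recall, since F1 is their harmonic mean. This sketch uses only the rounded values from the table, so it can confirm the figures agree with each other but not how they were measured.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
reported = {
    "no_text_overlay": (0.97, 0.99),
    "text_overlay": (0.99, 0.97),
}

f1 = {label: round(f1_score(p, r), 2) for label, (p, r) in reported.items()}
print(f1)
```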