---
library_name: transformers
tags: []
---

# Text Overlay Detection

Text overlays are widely used for subtitles, credits, watermarks, promotional messages, and explanatory labels.
There are many cases where we may want to detect and/or remove text overlays: avoiding burned-in text when training image and video generation models,
supplying clean content for ad creatives, excluding burned-in text from diffing algorithms, and
creating paired data for title treatment and other text generation tasks.

This model was trained on 2k data pairs, sampled by using a VLM as a weakly supervised classifier and then manually annotated. The published model uses
a DINOv2 with Registers backbone and a modified preprocessor that removes center cropping (text overlays often appear in the corners of images!).
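To see why center cropping matters here, a little arithmetic helps. The sketch below assumes a typical DINOv2-style pipeline (resize the shortest edge to 256, then center-crop to 224); those sizes are assumptions for illustration, not values read from this model's preprocessor config, and the helper names are hypothetical. For a 1920x1080 frame, such a crop keeps only a centered 945x945 region, about 43% of the pixels, so a watermark in a corner is discarded before the model ever sees it.

```python
def center_crop_box(width, height, shortest_edge=256, crop=224):
    """Region of the ORIGINAL image kept by resize-shortest-edge + center-crop.

    Resizing so min(width, height) == shortest_edge and then cutting a
    crop x crop square from the center is equivalent to keeping a centered
    square of side crop * min(width, height) / shortest_edge in the original.
    Returns (left, top, right, bottom) in original pixel coordinates.
    """
    side = crop * min(width, height) / shortest_edge
    left = (width - side) / 2
    top = (height - side) / 2
    return (left, top, left + side, top + side)


def kept_fraction(width, height, shortest_edge=256, crop=224):
    """Fraction of the original pixels that survive the center crop."""
    left, top, right, bottom = center_crop_box(width, height, shortest_edge, crop)
    return (right - left) * (bottom - top) / (width * height)


def inside_crop(x, y, box):
    """True if pixel (x, y) of the original image lands inside the crop box."""
    left, top, right, bottom = box
    return left <= x < right and top <= y < bottom


if __name__ == "__main__":
    box = center_crop_box(1920, 1080)
    print(box)                                  # (487.5, 67.5, 1432.5, 1012.5)
    print(round(kept_fraction(1920, 1080), 2))  # 0.43
    # A watermark near the bottom-right corner of a 1080p frame is cut away
    print(inside_crop(1800, 1000, box))         # False
```

Wide frames lose the most, because the crop side is tied to the short edge: everything left and right of the central square is thrown away.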


## How To Use
```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load the modified preprocessor (no center crop) and the fine-tuned classifier
image_processor = AutoImageProcessor.from_pretrained("aslakey/text_overlay_detection")
model = AutoModelForImageClassification.from_pretrained("aslakey/text_overlay_detection")
model.eval()

# Model labels: no_text_overlay, text_overlay (see model.config.id2label for the id order)
image = Image.open("overlay.png").convert("RGB")  # drop any alpha channel
inputs = image_processor(image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring class and map its id back to a label name
predicted_label = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
```
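The snippet above returns only the argmax label. If you want a confidence score instead, for example to apply a stricter threshold when filtering training data, you can take a softmax over the logits. The following is a minimal pure-Python sketch; the helper names and the 0.5 default threshold are hypothetical, the label name `text_overlay` is taken from the performance table, and the class-id order in the example is an assumption, so read the real mapping from `model.config.id2label`. With PyTorch you could equally call `outputs.logits.softmax(-1)` directly.

```python
import math


def overlay_probability(logits, id2label, positive_label="text_overlay"):
    """Softmax probability of `positive_label` for a single image.

    logits:   list of raw scores for one image, e.g. outputs.logits[0].tolist()
    id2label: mapping from class id to name, e.g. model.config.id2label
    """
    m = max(logits)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(x - m) for x in logits]
    label2id = {name: idx for idx, name in id2label.items()}
    return exps[label2id[positive_label]] / sum(exps)


def has_text_overlay(logits, id2label, threshold=0.5):
    """Flag an image when P(text_overlay) reaches `threshold`."""
    return overlay_probability(logits, id2label) >= threshold


if __name__ == "__main__":
    # Hypothetical id order; read the real one from model.config.id2label
    id2label = {0: "no_text_overlay", 1: "text_overlay"}
    print(overlay_probability([0.0, 0.0], id2label))     # 0.5
    print(has_text_overlay([2.0, -1.0], id2label))       # False
    print(has_text_overlay([-1.0, 2.0], id2label, 0.9))  # True
```

Raising `threshold` trades recall for precision on the `text_overlay` class; where to set it depends on whether missed overlays or discarded clean images are more costly for your use case.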

## Model Performance

| Class           | Precision | Recall | F1-score |
|-----------------|-----------|--------|----------|
| no_text_overlay | 0.97      | 0.99   | 0.98     |
| text_overlay    | 0.99      | 0.97   | 0.98     |
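As a quick consistency check on the table, F1 is the harmonic mean of precision $P$ and recall $R$. Because the two classes simply swap their precision and recall values, both rows give the same result:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \cdot 0.97 \cdot 0.99}{0.97 + 0.99} = \frac{1.9206}{1.96} \approx 0.980
$$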