	---
	license: apache-2.0
	---

	# Camera Level

	This model predicts the cinematic camera level of an image, one of five classes: ground, hip, shoulder, eye, or aerial. It uses a DINOv2-with-registers backbone (initialized from the `facebook/dinov2-with-registers-large` weights) and was trained on a diverse set of five thousand human-annotated images.

	## How to use:
	```python

	import torch
	from PIL import Image
	from transformers import AutoImageProcessor
	from transformers import AutoModelForImageClassification

	image_processor = AutoImageProcessor.from_pretrained("facebook/dinov2-with-registers-large")
	model = AutoModelForImageClassification.from_pretrained('aslakey/camera_level')
	model.eval()

	# Model labels: [ground, hip, shoulder, eye, aerial]
	image = Image.open('cinematic_shot.jpg')
	inputs = image_processor(image, return_tensors="pt")
	with torch.no_grad():
	    outputs = model(**inputs)

	# The model was trained as a multi-label classifier, but argmax still returns the single highest-scoring level.
	predicted_label = outputs.logits.argmax(-1).item()
	print(model.config.id2label[predicted_label])
	```
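
Since the snippet above notes that training was multi-label, it can be useful to look at per-label scores instead of only the argmax. The following is a minimal sketch, not the author's method: it assumes the logits are meant to be passed through a sigmoid independently per label, that a threshold of 0.5 is reasonable, and that the label order matches the list in the snippet. The helper `multilabel_scores` and the example logits are hypothetical.

```python
import math

# Label order as listed in the model card; in practice, read it from
# model.config.id2label rather than hard-coding it (assumption).
LABELS = ["ground", "hip", "shoulder", "eye", "aerial"]


def multilabel_scores(logits, threshold=0.5):
    """Return per-label sigmoid scores and the labels at or above threshold.

    `logits` is a plain list of floats, e.g. outputs.logits[0].tolist().
    The 0.5 threshold is an illustrative assumption, not a tuned value.
    """
    scores = {label: 1.0 / (1.0 + math.exp(-z)) for label, z in zip(LABELS, logits)}
    selected = [label for label, p in scores.items() if p >= threshold]
    return scores, selected


# Hypothetical logits for a single image (not real model output).
example_logits = [-2.0, -0.5, 1.2, 0.3, -3.0]
scores, selected = multilabel_scores(example_logits)
print(selected)
```

With real model output, replace `example_logits` with `outputs.logits[0].tolist()`. If several labels pass the threshold, the image sits between levels according to the model; if none do, falling back to argmax is a reasonable default.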

	## Performance:


	| Category | Precision | Recall |
	|----------|-----------|--------|
	| ground   | 65%       | 51%    |
	| hip      | 69%       | 62%    |
	| shoulder | 68%       | 74%    |
	| eye      | 51%       | 39%    |
	| aerial   | 89%       | 76%    |
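
To compare categories with a single number, the precision and recall values in the table can be combined into an F1 score, the harmonic mean of the two. The sketch below only derives F1 from the reported figures; these scores are not reported by the model author, and the `f1` helper is an illustrative name.

```python
# Precision and recall copied from the performance table, as fractions.
metrics = {
    "ground": (0.65, 0.51),
    "hip": (0.69, 0.62),
    "shoulder": (0.68, 0.74),
    "eye": (0.51, 0.39),
    "aerial": (0.89, 0.76),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Derived values, not numbers published with the model.
f1_scores = {label: f1(p, r) for label, (p, r) in metrics.items()}

for label, score in f1_scores.items():
    print(f"{label:<9}{score:.2f}")
```

By this derived measure, aerial is the strongest category and eye the weakest, which matches the impression given by the raw precision and recall.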