---
license: apache-2.0
---

# Shot Type

This model predicts an image's cinematic shot type: clean single, double, group, over the shoulder, insert, or establishing. It uses a DINOv2-with-registers backbone, initialized from the `facebook/dinov2-with-registers-large` weights, and was trained on a diverse set of five thousand human-annotated images.

## How to use:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# The preprocessing config comes from the base DINOv2 checkpoint.
image_processor = AutoImageProcessor.from_pretrained("facebook/dinov2-with-registers-large")
model = AutoModelForImageClassification.from_pretrained("aslakey/depth_of_field")
model.eval()  # inference mode: disables dropout

# Model labels: [clean_single, double, group, over_the_shoulder, insert, establishing]
# Convert to RGB so grayscale or RGBA files are handled consistently.
image = Image.open("cinematic_shot.jpg").convert("RGB")
inputs = image_processor(image, return_tensors="pt")

# No gradients are needed for prediction.
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring class and map its index to a label name.
predicted_label = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
```

## Performance:

| Category          | Precision | Recall |
|-------------------|-----------|--------|
| clean_single      | 81%       | 89%    |
| double            | 80%       | 72%    |
| group             | 91%       | 74%    |
| over_the_shoulder | 60%       | 67%    |
| establishing      | 91%       | 77%    |
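
To compare categories with a single number, precision and recall can be combined into an F1 score (their harmonic mean). The sketch below is not part of the original evaluation: it derives F1 from the rounded percentages in the table above, so the results are approximate, and the names `metrics` and `f1_score` are illustrative only.

```python
# Precision and recall per category, copied from the table above (as fractions).
# These are rounded values, so the derived F1 scores are approximate.
metrics = {
    "clean_single": (0.81, 0.89),
    "double": (0.80, 0.72),
    "group": (0.91, 0.74),
    "over_the_shoulder": (0.60, 0.67),
    "establishing": (0.91, 0.77),
}


def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Compute F1 for every category listed in the table.
f1_by_category = {name: f1_score(p, r) for name, (p, r) in metrics.items()}

for name, score in f1_by_category.items():
    print(f"{name}: F1 = {score:.2f}")
```

By this measure `over_the_shoulder` is the weakest category in the table, which matches its lower precision and recall.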