---
license: apache-2.0
---

# Camera Level

This model predicts an image's cinematic camera level, one of [ground, hip, shoulder, eye, aerial]. The backbone is a DINOv2 with registers, initialized from the `facebook/dinov2-with-registers-large` weights and trained on a diverse set of five thousand human-annotated images.

## How to use:
```python

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

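# The preprocessing config comes from the base DINOv2 checkpoint; the classifier weights come from this repo.
image_processor = AutoImageProcessor.from_pretrained("facebook/dinov2-with-registers-large")
model = AutoModelForImageClassification.from_pretrained("aslakey/camera_level")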
model.eval()

# Model labels: [ground, hip, shoulder, eye, aerial]
image = Image.open('cinematic_shot.jpg')
inputs = image_processor(image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The model was trained with a multi-label objective, but argmax over the logits still yields a single top label.
predicted_label = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
```
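
Because the model was trained in a multi-label fashion, it can also be useful to look at a score per label instead of only the argmax. The following is a minimal sketch, assuming each logit is meant to be read through an independent sigmoid; the helper name `multilabel_scores`, the 0.5 threshold, and the example logits and label order are illustrative assumptions, not values taken from the model. It is kept dependency-free so it can be tried without downloading the weights.

```python
import math


def multilabel_scores(logits, id2label, threshold=0.5):
    """Map raw logits to per-label sigmoid scores and the labels at or above a threshold.

    `logits` is a plain list of floats, e.g. outputs.logits[0].tolist().
    `threshold` is an assumed cut-off, not one published with the model.
    """
    scores = {id2label[i]: 1.0 / (1.0 + math.exp(-z)) for i, z in enumerate(logits)}
    selected = [label for label, p in scores.items() if p >= threshold]
    return scores, selected


# Hypothetical values for illustration only. In practice, pass
# outputs.logits[0].tolist() and model.config.id2label from the snippet above.
example_id2label = {0: "ground", 1: "hip", 2: "shoulder", 3: "eye", 4: "aerial"}
example_logits = [-2.0, 0.3, 1.2, -0.5, -3.0]

scores, selected = multilabel_scores(example_logits, example_id2label)
for label, p in scores.items():
    print(f"{label}: {p:.2f}")
print("labels above threshold:", selected)
```

An image near the boundary between two levels may score above the threshold for both, which argmax alone would hide.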

## Performance:


| Category | Precision | Recall |
|----------|-----------|--------|
| ground   | 65%       | 51%    |
| hip      | 69%       | 62%    |
| shoulder | 68%       | 74%    |
| eye      | 51%       | 39%    |
| aerial   | 89%       | 76%    |
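
To compare categories with a single number, precision and recall can be combined into an F1 score (their harmonic mean). The sketch below only derives F1 from the figures in the table above; the helper name `f1_score` is an illustrative choice, and no additional evaluation data is assumed.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the table above, as fractions.
reported = {
    "ground": (0.65, 0.51),
    "hip": (0.69, 0.62),
    "shoulder": (0.68, 0.74),
    "eye": (0.51, 0.39),
    "aerial": (0.89, 0.76),
}

for label, (p, r) in reported.items():
    print(f"{label}: F1 = {f1_score(p, r):.2f}")
```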