# Pose Classifier Guide

## Overview

The pose classifier predicts the orientation of animals (zebras, giraffes, etc.) relative to the camera in aerial drone footage. The predicted pose feeds both navigation decisions and behavior analysis.

## 8-Class Pose Classification System

### Pose Classes

The classifier identifies **8 discrete pose orientations** arranged in a circle around the animal:

1. **front** - Animal facing directly toward camera
2. **front-left** - Animal facing camera, angled to the left (~45°)
3. **left** - Animal's left side visible, perpendicular to camera
4. **back-left** - Animal facing away, angled to the left (~45°)
5. **back** - Animal facing directly away from camera
6. **back-right** - Animal facing away, angled to the right (~45°)
7. **right** - Animal's right side visible, perpendicular to camera
8. **front-right** - Animal facing camera, angled to the right (~45°)
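
Because the eight classes sit 45° apart on a circle, a continuous heading can be binned into a class label. The sketch below is illustrative only: it assumes a hypothetical angle convention (0° is `front`, with angles increasing in the order the classes are listed above), which this guide does not specify, and `angle_to_pose` is not part of the training code.

```python
# Class order as listed above; the index order used by the trained model is an assumption.
POSE_CLASSES = [
    "front", "front-left", "left", "back-left",
    "back", "back-right", "right", "front-right",
]


def angle_to_pose(angle_deg: float) -> str:
    """Map a heading in degrees to the nearest of the 8 pose classes.

    Assumed convention: 0 deg is `front`, and each class is centred
    on a multiple of 45 deg in the order of POSE_CLASSES.
    """
    # Shift by half a bin (22.5 deg) so each class covers +/-22.5 deg around its centre.
    index = int(((angle_deg % 360) + 22.5) // 45) % len(POSE_CLASSES)
    return POSE_CLASSES[index]


print(angle_to_pose(10))   # within 22.5 deg of 0, so "front"
print(angle_to_pose(100))  # nearest centre is 90, so "left"
```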

### Visual Reference

![Pose Reference Diagram](https://huggingface.co/imageomics/mmla-dino-pose/resolve/main/_reference.png)

The diagram shows the 8 pose classes arranged in a circle. The camera is positioned at the bottom, and the animal (zebra) is in the center. Each orange dot represents one of the 8 possible pose classifications.

## Example Poses

### Front Pose
**Label:** `front`

The animal is facing directly toward the camera, with the head and front body visible.

![Front Pose Example](https://huggingface.co/imageomics/mmla-dino-pose/resolve/main/front/mpala_session_1_DJI_0002_partition_1_DJI_0002_000171_c0_004.jpg)

---

### Front-Left Pose
**Label:** `front-left`

The animal is facing toward the camera but angled to its left (camera's right), showing both the front and left side.

![Front-Left Pose Example](https://huggingface.co/imageomics/mmla-dino-pose/resolve/main/front-left/mpala_session_1_DJI_0002_partition_1_DJI_0002_000471_c0_005.jpg)

---

### Front-Right Pose
**Label:** `front-right`

The animal is facing toward the camera but angled to its right (camera's left), showing both the front and right side.

![Front-Right Pose Example](https://huggingface.co/imageomics/mmla-dino-pose/resolve/main/front-right/mpala_session_2_DJI_0006_partition_2_DJI_0006_006552_c0_004.jpg)

---

### Left Pose
**Label:** `left`

The animal's left side is visible, perpendicular to the camera. This is a pure profile view.

![Left Pose Example](https://huggingface.co/imageomics/mmla-dino-pose/resolve/main/left/mpala_session_1_DJI_0002_partition_1_DJI_0002_000321_c1_001.jpg)

---

### Right Pose
**Label:** `right`

The animal's right side is visible, perpendicular to the camera. This is a pure profile view from the opposite side.

![Right Pose Example](https://huggingface.co/imageomics/mmla-dino-pose/resolve/main/right/mpala_session_1_DJI_0002_partition_1_DJI_0002_000171_c0_002.jpg)

---

### Back-Left Pose
**Label:** `back-left`

The animal is facing away from the camera but angled to its left, showing the rear-left quarter.

![Back-Left Pose Example](https://huggingface.co/imageomics/mmla-dino-pose/resolve/main/back-left/mpala_session_1_DJI_0002_partition_1_DJI_0002_000171_c1_001.jpg)

---

### Back-Right Pose
**Label:** `back-right`

The animal is facing away from the camera but angled to its right, showing the rear-right quarter.

![Back-Right Pose Example](https://huggingface.co/imageomics/mmla-dino-pose/resolve/main/back-right/mpala_session_1_DJI_0002_partition_1_DJI_0002_000171_c1_000.jpg)

---

### Back Pose
**Label:** `back`

The animal is facing directly away from the camera, with the rear and back visible.

![Back Pose Example](https://huggingface.co/imageomics/mmla-dino-pose/resolve/main/back/mpala_session_1_DJI_0002_partition_1_DJI_0002_000321_c1_000.jpg)

---

## Model Architecture

### DINOv2 + MLP Head

The pose classifier uses a **frozen DINOv2 backbone** with a **trainable MLP classification head**:

```
Input Image (224×224)
    ↓
DINOv2 Vision Transformer (frozen)
    - Small: 384-dim features
    - Base: 768-dim features
    - Large: 1024-dim features
    ↓
MLP Head (trainable)
    - LayerNorm
    - Linear(feat_dim -> 256) + GELU + Dropout(0.3)
    - Linear(256 -> 128) + GELU + Dropout(0.3)
    - Linear(128 -> 8)
    ↓
Output Logits (8 classes)
```
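
To give a sense of how small the trainable part is, the following sketch counts the parameters of the MLP head from the layer sizes in the diagram. It assumes that LayerNorm carries a learnable scale and shift and that every Linear layer has a bias; `head_param_count` is a hypothetical helper, not a function from the training script.

```python
def head_param_count(feat_dim: int, hidden=(256, 128), num_classes: int = 8) -> int:
    """Count trainable parameters of the MLP head described above.

    Assumes LayerNorm has a learnable scale and shift per feature,
    and every Linear layer has a bias term.
    """
    total = 2 * feat_dim  # LayerNorm: one scale and one shift per feature
    in_dim = feat_dim
    for out_dim in (*hidden, num_classes):
        total += in_dim * out_dim + out_dim  # Linear weights plus bias
        in_dim = out_dim
    return total


for name, feat_dim in {"small": 384, "base": 768, "large": 1024}.items():
    print(f"{name}: {head_param_count(feat_dim):,} trainable parameters")
# small: 133,256 trainable parameters
# base: 232,328 trainable parameters
# large: 298,376 trainable parameters
```

Under these assumptions the head has roughly 0.13M to 0.30M parameters, which is small next to the frozen backbone.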

### Why DINOv2?

- **Self-supervised learning** on diverse images provides strong visual features
- **Frozen backbone** reduces training time and prevents overfitting
- **Small memory footprint** suitable for deployment
- **Robust to varying image quality** from aerial footage

## Training Pipeline

### Data Organization

Training data is organized in a folder structure, with one subfolder per pose class:
```
pose_labels/
  _reference.png          # Visual guide
  front/                  # Front-facing animals
  front-left/             # Front-left quarter
  left/                   # Left profile
  back-left/              # Back-left quarter
  back/                   # Back-facing animals
  back-right/             # Back-right quarter
  right/                  # Right profile
  front-right/            # Front-right quarter
```

Alternatively, labels can be supplied via CSV files with the columns `image_path` and `pose`.
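
A minimal sketch of reading the folder layout above into `(image_path, label)` pairs is given here. The helper name `collect_samples` and the set of accepted file extensions are assumptions for illustration; the actual loader in `train_pose_classifier.py` may differ.

```python
from pathlib import Path

POSE_CLASSES = [
    "front", "front-left", "left", "back-left",
    "back", "back-right", "right", "front-right",
]
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def collect_samples(root):
    """Return (image_path, pose_label) pairs from the class-named subfolders of root."""
    samples = []
    for label in POSE_CLASSES:
        class_dir = Path(root) / label
        if not class_dir.is_dir():
            continue  # tolerate classes that have no folder yet
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples
```

Only the eight class folders are scanned, so `_reference.png` in the root is never picked up as a training image.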

### Data Augmentation

**Geometric Augmentation with Label Swapping:**
- Horizontal flip applied with 50% probability
- When flipped, pose labels are swapped according to symmetry:
  - `left` <-> `right`
  - `front-left` <-> `front-right`
  - `back-left` <-> `back-right`
  - `front` and `back` remain unchanged
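
The label swap can be sketched as follows. This is an illustration rather than the code in the training script: `flip_label` and `random_hflip` are hypothetical names, and a list of pixel rows stands in for a real image so the example has no dependencies.

```python
import random

# Mirror-image partner of each class; `front` and `back` map to themselves.
FLIP_MAP = {
    "left": "right", "right": "left",
    "front-left": "front-right", "front-right": "front-left",
    "back-left": "back-right", "back-right": "back-left",
}


def flip_label(label: str) -> str:
    """Return the pose label that corresponds to a horizontally flipped image."""
    return FLIP_MAP.get(label, label)


def random_hflip(pixels, label, p=0.5, rng=random):
    """Flip an image (list of pixel rows) with probability p and swap its label to match."""
    if rng.random() < p:
        return [row[::-1] for row in pixels], flip_label(label)
    return pixels, label
```

Flipping the image without swapping the label would teach the model that a mirrored `left` view is still `left`, which is why the two operations are kept in one function.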

**Color/Transform Augmentation:**
- Random crop (256px -> 224px)
- Color jitter: brightness (±30%), contrast (±30%), saturation (±20%)
- Random rotation (±15°)

**Class Balancing:**
- Weighted random sampler ensures equal representation of all 8 classes during training
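
One common way to obtain such a sampler is to weight each sample by the inverse of its class frequency, so every class carries the same total weight. The sketch below assumes this scheme (the training script may compute weights differently) and uses `random.choices` as a stand-in for a weighted sampler such as PyTorch's `WeightedRandomSampler`.

```python
import random
from collections import Counter


def balanced_sample_weights(labels):
    """Give each sample a weight of 1 / (number of samples in its class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


labels = ["front"] * 6 + ["back"] * 2
weights = balanced_sample_weights(labels)
# Sampling with replacement using these weights draws each class equally often on average.
batch = random.choices(labels, weights=weights, k=4)
```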

### Training Configuration

```bash
python train_pose_classifier.py \
    --data_dir ./pose_labels \
    --model_size small \
    --epochs 30 \
    --batch_size 32 \
    --lr 1e-3
```

**Key Parameters:**
- **Model size**: `small`, `base`, or `large` (DINOv2 variant)
- **Optimizer**: AdamW with weight decay 0.01
- **Loss**: CrossEntropyLoss with label smoothing (0.1)
- **Scheduler**: CosineAnnealingLR
- **Mixed precision**: Enabled automatically when training on a GPU
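
For intuition about the scheduler, the closed-form cosine annealing curve can be computed directly. The sketch assumes the schedule spans all 30 epochs (`t_max=30`) and decays to a minimum learning rate of 0; neither value is stated in this guide, and `cosine_annealing_lr` is a hypothetical helper.

```python
import math


def cosine_annealing_lr(epoch, base_lr=1e-3, t_max=30, eta_min=0.0):
    """Learning rate at a given epoch under cosine annealing without restarts."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


for epoch in (0, 15, 30):
    print(epoch, f"{cosine_annealing_lr(epoch):.6f}")
```

With these assumptions the learning rate starts at the `--lr` value, passes through half of it at the midpoint, and reaches zero at the final epoch.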

**Training Output:**
- Best model saved to `checkpoints/best_pose_model.pth`
- A confusion matrix and per-class accuracy are reported
- Optional ONNX export for deployment

## Usage in Navigation

### Integration with Detection Pipeline

The pose classifier is used in the navigation system after animal detection:

```python
from navigation.policy.pose_classifier import ViewPointClassifier
from PIL import Image

# Initialize classifier
classifier = ViewPointClassifier(
    weight_path="model_weights/best_june_24_2025_IA_classifier_016.pth",
    device="cpu",
    threshold=0.5
)

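# Process detected animal crops.
# `detection_crops` is a list of image paths for the crops produced by the detection step.
crops = [Image.open(path) for path in detection_crops]
poses = classifier(crops)  # Returns a list of pose strings, one per crop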

# Use poses for navigation decisions
for pose in poses:
    if "front" in pose:
        print("Animal is facing camera - approach with caution")
    elif "back" in pose:
        print("Animal is facing away - good for following")
```

### Multi-Label Pose System (Alternative)

The `ViewPointClassifier` in `navigation/policy/pose_classifier.py` takes a different approach from the 8-class DINOv2 model described above:

- **5 multi-label classes**: `up, front, back, right, left`
- **EfficientNet-B4** backbone trained on zebra crops
- **Input size**: 512×512 pixels
- **Output**: Concatenated string (e.g., `"upfrontright"`)
- **Threshold**: 0.5 (configurable)

This allows detecting compound poses like "animal is facing front-right while looking up."
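
If downstream code needs the individual labels rather than the concatenated string, the output can be split back apart. The sketch below assumes the string is a plain concatenation of the five label names with no separator, as in the `"upfrontright"` example; `split_compound_pose` is a hypothetical helper and not part of `ViewPointClassifier`.

```python
MULTI_LABELS = ["up", "front", "back", "right", "left"]


def split_compound_pose(pose: str) -> list:
    """Recover the active labels from a concatenated pose string."""
    # No label name is a substring of another, so a simple containment test is enough.
    return [label for label in MULTI_LABELS if label in pose]


print(split_compound_pose("upfrontright"))  # ['up', 'front', 'right']
```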

## Performance Considerations

### Inference Speed
- **DINOv2-small**: ~15-20ms per image (CPU)
- **DINOv2-base**: ~30-40ms per image (CPU)
- **GPU acceleration**: 5-10x faster

### Accuracy Targets
- **Overall accuracy**: >85% on validation set
- **Critical classes** (front/back): >90% accuracy
- **Confusion**: Most errors occur between adjacent classes (e.g., front vs. front-left)
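
To check how much of the error is between neighbouring classes, predictions can be scored by their distance around the pose circle. The following is a sketch under the assumption that the classes are ordered as in the Pose Classes list; `class_distance` and `adjacent_error_share` are illustrative names.

```python
POSE_CLASSES = [
    "front", "front-left", "left", "back-left",
    "back", "back-right", "right", "front-right",
]


def class_distance(a: str, b: str) -> int:
    """Number of 45-degree steps between two classes, going the short way round."""
    diff = abs(POSE_CLASSES.index(a) - POSE_CLASSES.index(b))
    return min(diff, len(POSE_CLASSES) - diff)


def adjacent_error_share(y_true, y_pred) -> float:
    """Fraction of misclassifications that are off by exactly one neighbouring class."""
    errors = [(t, p) for t, p in zip(y_true, y_pred) if t != p]
    if not errors:
        return 0.0
    return sum(class_distance(t, p) == 1 for t, p in errors) / len(errors)
```

A high share means the model is mostly off by one 45° step rather than confusing opposite orientations.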

### Deployment Notes
- Model checkpoint: ~150MB (small), ~350MB (base)
- ONNX export available for optimized inference
- Batch processing recommended for multiple detections

## Common Issues & Tips

### Issue: Poor performance on occluded animals
**Solution**: Train with more occluded examples or use confidence thresholding
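
Confidence thresholding can be sketched as a softmax over the 8 output logits followed by a cutoff. The cutoff of 0.6 and the class index order are assumptions chosen for illustration, and `predict_with_threshold` is a hypothetical helper.

```python
import math

# Assumed to match the index order of the model's output logits.
POSE_CLASSES = [
    "front", "front-left", "left", "back-left",
    "back", "back-right", "right", "front-right",
]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_threshold(logits, min_confidence=0.6):
    """Return (label, confidence), with label set to None when confidence is too low."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return None, probs[best]
    return POSE_CLASSES[best], probs[best]
```

Crops that return `None` can then be skipped or deferred to a later frame instead of driving a navigation decision.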

### Issue: Confusion between adjacent poses
**Solution**: This is expected, because orientation is continuous and the class boundaries are arbitrary cut points; consider merging classes into pose groups (front-facing vs. side-facing vs. back-facing)
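
One possible grouping is sketched below. Assigning the diagonal classes to the front-facing and back-facing groups is an assumption; a different split may suit a particular application better.

```python
# Assumed grouping: diagonal classes join the front-facing or back-facing group.
POSE_GROUPS = {
    "front": "front-facing", "front-left": "front-facing", "front-right": "front-facing",
    "left": "side-facing", "right": "side-facing",
    "back": "back-facing", "back-left": "back-facing", "back-right": "back-facing",
}


def to_group(label: str) -> str:
    """Collapse one of the 8 pose classes into its coarse group."""
    return POSE_GROUPS[label]


def group_accuracy(y_true, y_pred) -> float:
    """Accuracy after collapsing both true and predicted labels into groups."""
    hits = sum(to_group(t) == to_group(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)
```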

### Issue: Inconsistent predictions across frames
**Solution**: Apply temporal smoothing or majority voting across consecutive frames
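
Majority voting over a sliding window can be sketched as below. The window length of 5 frames is an arbitrary choice, `PoseSmoother` is a hypothetical class, and the sketch assumes one smoother instance per tracked animal.

```python
from collections import Counter, deque


class PoseSmoother:
    """Majority vote over the most recent per-frame pose predictions."""

    def __init__(self, window: int = 5):
        self.history = deque(maxlen=window)  # oldest predictions drop out automatically

    def update(self, pose: str) -> str:
        """Add the latest prediction and return the smoothed pose."""
        self.history.append(pose)
        counts = Counter(self.history)
        top = max(counts.values())
        # Break ties in favour of the most recent prediction.
        for candidate in reversed(self.history):
            if counts[candidate] == top:
                return candidate
```

A single outlier frame no longer changes the reported pose; the output switches only once the new pose holds a majority (or a tie, given the tie-break) within the window.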

### Issue: Different performance on zebras vs. other species
**Solution**: Retrain with balanced dataset across species, or train species-specific models

## Dataset Statistics

Current training data distribution (from folder structure):
- Folders: `front`, `front-left`, `front-right`, `left`, `right`, `back-left`, `back-right`, `back`
- Images per class: Variable (check with `train_pose_classifier.py --data_dir pose_labels`)
- Species: Primarily zebras and giraffes
- Source: Aerial drone footage from Mpala and OPC sessions

## References

- DINOv2 Paper: [https://arxiv.org/abs/2304.07193](https://arxiv.org/abs/2304.07193)
- VARe-ID (ViewPoint Classifier): [https://github.com/ziesski/VARe-ID](https://github.com/ziesski/VARe-ID)
- Individual identification of wildlife: [Review on methods used for wildlife species and individual identification](https://doi.org/10.1007/s10344-021-01549-4)
- Training script: [train_pose_classifier.py](train_pose_classifier.py)
- Navigation integration: [navigation/policy/pose_classifier.py](../navigation/policy/pose_classifier.py)

## Quick Start

1. **Prepare data**: Organize images in `pose_labels/` folders by class
2. **Train model**: `python train_pose_classifier.py --data_dir ./pose_labels --epochs 30`
3. **Evaluate**: Check confusion matrix and per-class accuracy in output
4. **Export**: Use `--export_onnx` flag for optimized deployment
5. **Integrate**: Load checkpoint and use for inference on detection crops