| # Pose Classifier Guide |
|
|
| ## Overview |
|
|
| The pose classifier predicts the orientation of an animal (zebra, giraffe, etc.) relative to the camera in aerial drone footage. The predicted pose feeds both drone navigation decisions and behavior analysis. |
|
|
| ## 8-Class Pose Classification System |
|
|
| ### Pose Classes |
|
|
| The classifier identifies **8 discrete pose orientations** arranged in a circle around the animal: |
|
|
| 1. **front** - Animal facing directly toward the camera |
| 2. **front-left** - Animal facing the camera at roughly 45°, with its front and left side visible |
| 3. **left** - Animal in profile, with its left side toward the camera |
| 4. **back-left** - Animal facing away at roughly 45°, with its rear and left side visible |
| 5. **back** - Animal facing directly away from the camera |
| 6. **back-right** - Animal facing away at roughly 45°, with its rear and right side visible |
| 7. **right** - Animal in profile, with its right side toward the camera |
| 8. **front-right** - Animal facing the camera at roughly 45°, with its front and right side visible |
|
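Because the eight classes sit on a circle, the list order above is enough to work out which classes are neighbors and how far apart two poses are. The following is a minimal sketch; the helper names `circular_distance` and `neighbors` are illustrative and not part of the project code.

```python
from typing import Tuple

# Class names in the circular order used by the list above.
POSE_CLASSES = [
    "front", "front-left", "left", "back-left",
    "back", "back-right", "right", "front-right",
]


def circular_distance(pose_a: str, pose_b: str) -> int:
    """Number of 45-degree steps between two poses, going the short way round."""
    i, j = POSE_CLASSES.index(pose_a), POSE_CLASSES.index(pose_b)
    diff = abs(i - j)
    return min(diff, len(POSE_CLASSES) - diff)


def neighbors(pose: str) -> Tuple[str, str]:
    """The two classes adjacent to `pose` on the circle."""
    i = POSE_CLASSES.index(pose)
    n = len(POSE_CLASSES)
    return POSE_CLASSES[(i - 1) % n], POSE_CLASSES[(i + 1) % n]
```

A distance of 1 means the two poses are adjacent (for example `front` and `front-right`), while 4 means they are opposite (`front` and `back`).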
|
| ### Visual Reference |
|
|
|  |
|
|
| The diagram shows the 8 pose classes arranged in a circle. The camera is positioned at the bottom, and the animal (zebra) is in the center. Each orange dot represents one of the 8 possible pose classifications. |
|
|
| ## Example Poses |
|
|
| ### Front Pose |
| **Label:** `front` |
|
|
| The animal is facing directly toward the camera, with the head and front body visible. |
|
|
|  |
|
|
| --- |
|
|
| ### Front-Left Pose |
| **Label:** `front-left` |
|
|
| The animal is facing toward the camera but angled to its left (camera's right), showing both the front and left side. |
|
|
|  |
|
|
| --- |
|
|
| ### Front-Right Pose |
| **Label:** `front-right` |
|
|
| The animal is facing toward the camera but angled to its right (camera's left), showing both the front and right side. |
|
|
|  |
|
|
| --- |
|
|
| ### Left Pose |
| **Label:** `left` |
|
|
| The animal's left side is visible, perpendicular to the camera. This is a pure profile view. |
|
|
|  |
|
|
| --- |
|
|
| ### Right Pose |
| **Label:** `right` |
|
|
| The animal's right side is visible, perpendicular to the camera. This is a pure profile view from the opposite side. |
|
|
|  |
|
|
| --- |
|
|
| ### Back-Left Pose |
| **Label:** `back-left` |
|
|
| The animal is facing away from the camera but angled to its left, showing the rear-left quarter. |
|
|
|  |
|
|
| --- |
|
|
| ### Back-Right Pose |
| **Label:** `back-right` |
|
|
| The animal is facing away from the camera but angled to its right, showing the rear-right quarter. |
|
|
|  |
|
|
| --- |
|
|
| ### Back Pose |
| **Label:** `back` |
|
|
| The animal is facing directly away from the camera, with the rear and back visible. |
|
|
|  |
|
|
| --- |
|
|
| ## Model Architecture |
|
|
| ### DINOv2 + MLP Head |
|
|
| The pose classifier uses a **frozen DINOv2 backbone** with a **trainable MLP classification head**: |
|
|
| ``` |
| Input Image (224×224) |
|     ↓ |
| DINOv2 Vision Transformer (frozen) |
|     - Small: 384-dim features |
|     - Base: 768-dim features |
|     - Large: 1024-dim features |
|     ↓ |
| MLP Head (trainable) |
|     - LayerNorm |
|     - Linear(feat_dim -> 256) + GELU + Dropout(0.3) |
|     - Linear(256 -> 128) + GELU + Dropout(0.3) |
|     - Linear(128 -> 8) |
|     ↓ |
| Output Logits (8 classes) |
| ``` |
|
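To give a feel for how small the trainable part is, the sketch below counts the parameters of the MLP head from the layer sizes in the diagram. It assumes the LayerNorm has a learnable scale and shift and that every linear layer has a bias term; the guide does not state either, so treat the totals as estimates. `head_param_count` is an illustrative helper, not a function from the training script.

```python
def head_param_count(feat_dim: int, hidden1: int = 256,
                     hidden2: int = 128, num_classes: int = 8) -> int:
    """Trainable parameters in the MLP head (assumes affine LayerNorm and biased Linear layers)."""
    layer_norm = 2 * feat_dim                      # scale + shift per feature
    fc1 = feat_dim * hidden1 + hidden1             # Linear(feat_dim -> 256)
    fc2 = hidden1 * hidden2 + hidden2              # Linear(256 -> 128)
    fc3 = hidden2 * num_classes + num_classes      # Linear(128 -> 8)
    return layer_norm + fc1 + fc2 + fc3


# Feature dimensions taken from the diagram above.
FEATURE_DIMS = {"small": 384, "base": 768, "large": 1024}

for size, dim in FEATURE_DIMS.items():
    print(f"{size}: {head_param_count(dim):,} trainable head parameters")
```

Under these assumptions the head stays well below half a million parameters for every backbone size, which is why training only the head is quick.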
|
| ### Why DINOv2? |
|
|
| - **Self-supervised learning** on diverse images provides strong visual features |
| - **Frozen backbone** reduces training time and prevents overfitting |
| - **Small memory footprint** suitable for deployment |
| - **Robust to varying image quality** from aerial footage |
|
|
| ## Training Pipeline |
|
|
| ### Data Organization |
|
|
| Training data is organized in a folder structure, with one subfolder per pose class: |
| ``` |
| pose_labels/ |
|     _reference.png    # Visual guide |
|     front/            # Front-facing animals |
|     front-left/       # Front-left quarter |
|     left/             # Left profile |
|     back-left/        # Back-left quarter |
|     back/             # Back-facing animals |
|     back-right/       # Back-right quarter |
|     right/            # Right profile |
|     front-right/      # Front-right quarter |
| ``` |
|
|
| Alternatively, labels can be supplied as CSV files with the columns `image_path` and `pose`. |
|
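For the CSV route, a reader along the following lines would turn a label file into `(image_path, pose)` pairs and reject unknown pose names early. This is a sketch only: it assumes the file has a header row naming the two columns, and `read_pose_csv` is a hypothetical helper rather than the loader used by the training script.

```python
import csv
from typing import Iterable, List, Tuple

VALID_POSES = {
    "front", "front-left", "left", "back-left",
    "back", "back-right", "right", "front-right",
}


def read_pose_csv(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse CSV lines with an `image_path, pose` header into (path, pose) pairs."""
    # skipinitialspace tolerates a space after each comma, as in "image_path, pose".
    reader = csv.DictReader(lines, skipinitialspace=True)
    samples = []
    for row in reader:
        pose = row["pose"].strip()
        if pose not in VALID_POSES:
            raise ValueError(f"unknown pose label {pose!r} for {row['image_path']!r}")
        samples.append((row["image_path"].strip(), pose))
    return samples


# Usage with a file on disk (newline="" is what the csv module recommends):
# with open("labels.csv", newline="") as handle:
#     samples = read_pose_csv(handle)
```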
|
| ### Data Augmentation |
|
|
| **Geometric Augmentation with Label Swapping:** |
| - Horizontal flip applied with 50% probability |
| - When flipped, pose labels are swapped according to symmetry: |
|   - `left` <-> `right` |
|   - `front-left` <-> `front-right` |
|   - `back-left` <-> `back-right` |
|   - `front` and `back` remain unchanged |
|
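The label swap can be expressed as a small lookup table applied together with the flip. The sketch below uses a plain list of pixel rows in place of a real image so that it stays dependency-free; `random_hflip` and `flip_pose_label` are illustrative names, and the actual training script may implement this differently.

```python
import random

# Mirror-image partner of each pose under a horizontal flip.
FLIP_MAP = {
    "left": "right", "right": "left",
    "front-left": "front-right", "front-right": "front-left",
    "back-left": "back-right", "back-right": "back-left",
    "front": "front", "back": "back",
}


def flip_pose_label(label: str) -> str:
    """Pose label that matches the horizontally mirrored image."""
    return FLIP_MAP[label]


def random_hflip(image_rows, label, rng, p=0.5):
    """Flip the image left-to-right with probability p and swap the label to match."""
    if rng.random() < p:
        flipped = [list(reversed(row)) for row in image_rows]
        return flipped, flip_pose_label(label)
    return image_rows, label


rng = random.Random(0)
image = [[1, 2, 3], [4, 5, 6]]  # toy 2x3 "image"
augmented_image, augmented_label = random_hflip(image, "front-left", rng)
```

Flipping the image without swapping the label would turn a correct `left` example into a mislabeled `right` one, so the two operations have to be applied together.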
|
| **Color/Transform Augmentation:** |
| - Random crop (256px -> 224px) |
| - Color jitter: brightness (±30%), contrast (±30%), saturation (±20%) |
| - Random rotation (±15°) |
|
|
| **Class Balancing:** |
| - Weighted random sampler ensures equal representation of all 8 classes during training |
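One common way to drive a weighted sampler is to give every sample a weight of one over its class count, so that each class carries the same total weight. The sketch below shows that calculation with the standard library only; whether the training script computes its weights exactly this way is an assumption. In PyTorch, weights like these would typically be passed to `torch.utils.data.WeightedRandomSampler`.

```python
import random
from collections import Counter


def balanced_sample_weights(labels):
    """Per-sample weights so that every class has the same total weight."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# Toy, deliberately imbalanced label list.
labels = ["front"] * 6 + ["left"] * 3 + ["back"] * 1
weights = balanced_sample_weights(labels)

# Draw with replacement; rare classes are picked as often as common ones on average.
rng = random.Random(0)
drawn = rng.choices(labels, weights=weights, k=3000)
print(Counter(drawn))
```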
|
|
| ### Training Configuration |
|
|
| ```bash |
| python train_pose_classifier.py \ |
|     --data_dir ./pose_labels \ |
|     --model_size small \ |
|     --epochs 30 \ |
|     --batch_size 32 \ |
|     --lr 1e-3 |
| ``` |
|
|
| **Key Parameters:** |
| - **Model size**: `small`, `base`, or `large` (DINOv2 variant) |
| - **Optimizer**: AdamW with weight decay 0.01 |
| - **Loss**: CrossEntropyLoss with label smoothing (0.1) |
| - **Scheduler**: CosineAnnealingLR |
| - **Mixed precision**: Automatic on GPU |
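To make two of these settings concrete, the sketch below writes out the closed-form cosine annealing schedule and cross-entropy with label smoothing in plain Python. It assumes the schedule spans all training epochs and decays to a minimum learning rate of 0, and that smoothing spreads 0.1 of the target mass uniformly over all 8 classes (the convention PyTorch's `CrossEntropyLoss(label_smoothing=...)` follows). Neither detail is confirmed by this guide.

```python
import math


def cosine_annealing_lr(epoch, total_epochs, base_lr=1e-3, min_lr=0.0):
    """Learning rate at a given epoch under closed-form cosine annealing."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy against a one-hot target softened by uniform label smoothing."""
    # Numerically stable log-softmax.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    log_probs = [x - log_z for x in logits]

    num_classes = len(logits)
    loss = 0.0
    for i, log_p in enumerate(log_probs):
        one_hot = 1.0 if i == target else 0.0
        soft_target = (1.0 - smoothing) * one_hot + smoothing / num_classes
        loss -= soft_target * log_p
    return loss


for epoch in (0, 15, 30):
    print(epoch, cosine_annealing_lr(epoch, total_epochs=30))
```

With smoothing, even a perfectly confident correct prediction keeps a small non-zero loss, which discourages the head from becoming over-confident on neighboring poses.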
|
|
| **Training Output:** |
| - Best model saved to `checkpoints/best_pose_model.pth` |
| - Includes confusion matrix and per-class accuracy |
| - Optional ONNX export for deployment |
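For readers who want to reproduce the confusion matrix and per-class accuracy from their own predictions, the calculation is sketched below. It is a generic implementation with rows as true classes and columns as predicted classes; the layout used by the training script's own output is not specified here, so that orientation is an assumption.

```python
def confusion_matrix(true_labels, pred_labels, classes):
    """matrix[i][j] counts samples of true class i predicted as class j."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, pred in zip(true_labels, pred_labels):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def per_class_accuracy(matrix, classes):
    """Fraction of each true class predicted correctly (None if the class has no samples)."""
    result = {}
    for i, name in enumerate(classes):
        total = sum(matrix[i])
        result[name] = matrix[i][i] / total if total else None
    return result


classes = ["front", "left", "back"]
truth = ["front", "front", "left", "back"]
preds = ["front", "left", "left", "back"]
matrix = confusion_matrix(truth, preds, classes)
print(per_class_accuracy(matrix, classes))
```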
|
|
| ## Usage in Navigation |
|
|
| ### Integration with Detection Pipeline |
|
|
| The pose classifier is used in the navigation system after animal detection: |
|
|
| ```python |
| from navigation.policy.pose_classifier import ViewPointClassifier |
| from PIL import Image |
| |
| # Initialize classifier |
| classifier = ViewPointClassifier( |
|     weight_path="model_weights/best_june_24_2025_IA_classifier_016.pth", |
|     device="cpu", |
|     threshold=0.5, |
| ) |
| |
| # detection_crops: file paths of the animal crops produced by the detector |
| crops = [Image.open(path) for path in detection_crops] |
| poses = classifier(crops)  # Returns a list of pose strings, one per crop |
| |
| # Use poses for navigation decisions |
| for pose in poses: |
|     if "front" in pose: |
|         print("Animal is facing camera - approach with caution") |
|     elif "back" in pose: |
|         print("Animal is facing away - good for following") |
| ``` |
|
|
| ### Multi-Label Pose System (Alternative) |
|
|
| Note that the `ViewPointClassifier` in `pose_classifier.py`, used in the example above, is not the 8-class DINOv2 model. It uses a different, multi-label approach: |
|
|
| - **5 multi-label classes**: `up, front, back, right, left` |
| - **EfficientNet-B4** backbone trained on zebra crops |
| - **Input size**: 512×512 pixels |
| - **Output**: Concatenated string (e.g., `"upfrontright"`) |
| - **Threshold**: 0.5 (configurable) |
|
|
| This allows detecting compound poses like "animal is facing front-right while looking up." |
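Since the multi-label output is a single concatenated string, downstream code usually needs to split it back into labels. The sketch below does that and, where possible, maps the result onto the 8-class names used elsewhere in this guide. It assumes labels are joined with no separator, as in the `"upfrontright"` example; `split_multilabel` and `to_eight_class` are hypothetical helpers, not functions from `pose_classifier.py`.

```python
from typing import List, Optional

MULTI_LABELS = ("up", "front", "back", "right", "left")


def split_multilabel(pose: str) -> List[str]:
    """Split a concatenated pose string such as 'upfrontright' into its labels."""
    labels = []
    rest = pose
    while rest:
        for name in MULTI_LABELS:
            if rest.startswith(name):
                labels.append(name)
                rest = rest[len(name):]
                break
        else:  # no label matched the remaining text
            raise ValueError(f"unrecognised pose string: {pose!r}")
    return labels


def to_eight_class(pose: str) -> Optional[str]:
    """Map a multi-label string to an 8-class name, ignoring 'up'; None if ambiguous."""
    parts = set(split_multilabel(pose)) - {"up"}
    depth = [p for p in ("front", "back") if p in parts]
    side = [p for p in ("left", "right") if p in parts]
    if len(depth) > 1 or len(side) > 1 or not (depth or side):
        return None
    return "-".join(depth + side)


print(split_multilabel("upfrontright"))
print(to_eight_class("upfrontright"))
```

Contradictory outputs such as `"frontback"` can occur with independent per-label thresholds, which is why the mapping returns `None` rather than guessing.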
|
|
| ## Performance Considerations |
|
|
| ### Inference Speed |
| - **DINOv2-small**: ~15-20ms per image (CPU) |
| - **DINOv2-base**: ~30-40ms per image (CPU) |
| - **GPU acceleration**: 5-10x faster |
|
|
| ### Accuracy Targets |
| - **Overall accuracy**: >85% on validation set |
| - **Critical classes** (front/back): >90% accuracy |
| - **Confusion**: Most errors occur between adjacent classes (e.g., front vs. front-left) |
|
|
| ### Deployment Notes |
| - Model checkpoint: ~150MB (small), ~350MB (base) |
| - ONNX export available for optimized inference |
| - Batch processing recommended for multiple detections |
|
|
| ## Common Issues & Tips |
|
|
| ### Issue: Poor performance on occluded animals |
| **Solution**: Train with more occluded examples or use confidence thresholding |
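Confidence thresholding for the 8-class model can be sketched as a softmax over the output logits followed by a cutoff: predictions below the cutoff are reported as "unknown" instead of a pose. The cutoff of 0.6 below is an arbitrary illustrative value, not a tuned setting from this project, and `predict_with_threshold` is a hypothetical helper.

```python
import math

POSE_CLASSES = [
    "front", "front-left", "left", "back-left",
    "back", "back-right", "right", "front-right",
]


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_threshold(logits, classes=POSE_CLASSES, min_confidence=0.6):
    """Return (pose, confidence); pose is None when the model is not confident enough."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return None, probs[best]
    return classes[best], probs[best]


confident = predict_with_threshold([5.0, 0, 0, 0, 0, 0, 0, 0])
uncertain = predict_with_threshold([0.0] * 8)
```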
|
|
| ### Issue: Confusion between adjacent poses |
| **Solution**: This is expected because orientation is continuous and the class boundaries are arbitrary cut points; consider merging classes into pose groups (front-facing vs. side-facing vs. back-facing) |
|
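Pose groups can be built by mapping each of the 8 classes to a coarser label and summing the class probabilities inside each group. The assignment of the diagonal classes below (for example `front-left` counted as front-facing) is one possible choice made for illustration; the guide does not prescribe it.

```python
# One possible grouping; where the diagonal classes go is a design choice.
POSE_GROUPS = {
    "front": "front-facing", "front-left": "front-facing", "front-right": "front-facing",
    "left": "side-facing", "right": "side-facing",
    "back": "back-facing", "back-left": "back-facing", "back-right": "back-facing",
}


def group_probabilities(class_probs):
    """Sum per-class probabilities into the three coarse groups."""
    totals = {"front-facing": 0.0, "side-facing": 0.0, "back-facing": 0.0}
    for pose, prob in class_probs.items():
        totals[POSE_GROUPS[pose]] += prob
    return totals


probs = {"front": 0.5, "front-left": 0.25, "left": 0.25}
print(group_probabilities(probs))
```

Summing probabilities before taking the maximum means a split between `front` and `front-left` still yields a confident front-facing decision.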
|
| ### Issue: Inconsistent predictions across frames |
| **Solution**: Apply temporal smoothing or majority voting across consecutive frames |
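Majority voting over a short sliding window is one simple form of temporal smoothing. The sketch below keeps the last few per-frame predictions for one tracked animal and returns the most frequent pose, breaking ties in favor of the most recent frame. The window length of 5 and the class name `PoseSmoother` are illustrative assumptions.

```python
from collections import Counter, deque


class PoseSmoother:
    """Sliding-window majority vote over per-frame pose predictions for one animal."""

    def __init__(self, window: int = 5):
        self.history = deque(maxlen=window)

    def update(self, pose: str) -> str:
        """Add the newest prediction and return the smoothed pose."""
        self.history.append(pose)
        counts = Counter(self.history)
        top = max(counts.values())
        # Ties go to whichever tied pose was seen most recently.
        for recent in reversed(self.history):
            if counts[recent] == top:
                return recent
        return pose  # not reached: history always contains at least `pose`


smoother = PoseSmoother(window=3)
frames = ["front", "front", "left", "front", "left", "left"]
smoothed = [smoother.update(p) for p in frames]
```

A longer window suppresses more flicker but also delays the response when the animal really does turn, so the window length should be chosen against the frame rate.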
|
|
| ### Issue: Different performance on zebras vs. other species |
| **Solution**: Retrain with balanced dataset across species, or train species-specific models |
|
|
| ## Dataset Statistics |
|
|
| Current training data distribution (from folder structure): |
| - Folders: `front`, `front-left`, `front-right`, `left`, `right`, `back-left`, `back-right`, `back` |
| - Images per class: Variable (check with `train_pose_classifier.py --data_dir pose_labels`) |
| - Species: Primarily zebras and giraffes |
| - Source: Aerial drone footage from Mpala and OPC sessions |
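The per-class image counts can also be read directly from the folder structure without running the training script. The sketch below walks the eight class folders and counts files with common image extensions; the extension list is an assumption, and `count_images_per_class` is a hypothetical helper.

```python
from pathlib import Path

POSE_CLASSES = [
    "front", "front-left", "left", "back-left",
    "back", "back-right", "right", "front-right",
]
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed; extend if other formats are used


def count_images_per_class(data_dir, classes=POSE_CLASSES):
    """Count image files in each class folder; missing folders count as 0."""
    root = Path(data_dir)
    counts = {}
    for name in classes:
        folder = root / name
        if folder.is_dir():
            counts[name] = sum(
                1 for p in folder.iterdir()
                if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
            )
        else:
            counts[name] = 0
    return counts


# Usage:
# for pose, n in count_images_per_class("pose_labels").items():
#     print(f"{pose:12s} {n}")
```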
|
|
| ## References |
|
|
| - DINOv2 Paper: [https://arxiv.org/abs/2304.07193](https://arxiv.org/abs/2304.07193) |
| - VARe-ID (ViewPoint Classifier): [https://github.com/ziesski/VARe-ID](https://github.com/ziesski/VARe-ID) |
| - Individual identification of wildlife: [Review on methods used for wildlife species and individual identification](https://doi.org/10.1007/s10344-021-01549-4) |
| - Training script: [train_pose_classifier.py](train_pose_classifier.py) |
| - Navigation integration: [navigation/policy/pose_classifier.py](../navigation/policy/pose_classifier.py) |
|
|
| ## Quick Start |
|
|
| 1. **Prepare data**: Organize images in `pose_labels/` folders by class |
| 2. **Train model**: `python train_pose_classifier.py --data_dir ./pose_labels --epochs 30` |
| 3. **Evaluate**: Check confusion matrix and per-class accuracy in output |
| 4. **Export**: Use `--export_onnx` flag for optimized deployment |
| 5. **Integrate**: Load checkpoint and use for inference on detection crops |
|
|