A competitive solution for the CSIRO Image2Biomass Prediction Kaggle competition.
Predict pasture biomass from images to help farmers make smarter grazing decisions.
Targets (5 regression outputs):
| Target | Weight | Description |
|---|---|---|
| Dry_Total_g | 0.50 | Total dry biomass (grams); most important |
| GDM_g | 0.20 | Green dry matter (grams) |
| Dry_Green_g | 0.10 | Dry green biomass |
| Dry_Dead_g | 0.10 | Dry dead biomass |
| Dry_Clover_g | 0.10 | Dry clover biomass |
Metric: Globally weighted R² across all (image, target) pairs.
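The exact scoring code isn't reproduced here; below is a minimal numpy sketch, assuming each (image, target) residual is weighted by the target weight above and pooled into a single global R² (function name and details are illustrative):

```python
import numpy as np

# Assumed column order: Dry_Total_g, GDM_g, Dry_Green_g, Dry_Dead_g, Dry_Clover_g
TARGET_WEIGHTS = np.array([0.50, 0.20, 0.10, 0.10, 0.10])

def global_weighted_r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """y_true, y_pred: (n_images, 5) arrays in original units."""
    w = np.broadcast_to(TARGET_WEIGHTS, y_true.shape)
    mean = np.average(y_true, weights=w)             # weighted global mean
    ss_res = np.sum(w * (y_true - y_pred) ** 2)      # weighted residual sum of squares
    ss_tot = np.sum(w * (y_true - mean) ** 2)        # weighted total sum of squares
    return 1.0 - ss_res / ss_tot
```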
Two key design choices:
- Log transform: train on `log1p(y)` and invert with `expm1(pred)`, which normalizes the skewed target distribution.
- Structural constraint: Total ≈ Green + Dead + Clover, enforced via a consistency loss.

Architecture:

```
Input Image (224×224)
 → DINOv2-Base backbone (768-dim features)
 → LayerNorm → Dropout(0.3)
 → Linear(768, 512) → GELU → Dropout(0.15)
 → Linear(512, 256) → GELU → Dropout(0.09)
 → Linear(256, 5) → predictions
```
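A minimal PyTorch sketch of this model, assuming a timm DINOv2 backbone created with `num_classes=0` so it returns pooled features (class and attribute names are illustrative, not necessarily those in `train.py`):

```python
import torch.nn as nn
import timm

class BiomassRegressor(nn.Module):
    def __init__(self, backbone="vit_base_patch14_dinov2.lvd142m", n_targets=5):
        super().__init__()
        # num_classes=0 strips the classifier; img_size=224 overrides the ViT default
        self.backbone = timm.create_model(backbone, pretrained=True,
                                          num_classes=0, img_size=224)
        dim = self.backbone.num_features  # 768 for DINOv2-Base
        self.head = nn.Sequential(
            nn.LayerNorm(dim), nn.Dropout(0.3),
            nn.Linear(dim, 512), nn.GELU(), nn.Dropout(0.15),
            nn.Linear(512, 256), nn.GELU(), nn.Dropout(0.09),
            nn.Linear(256, n_targets),
        )

    def forward(self, x):
        return self.head(self.backbone(x))  # (B, 5) predictions in log1p space
```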
| Backbone | Params | Feature Dim | Input Size | Notes |
|---|---|---|---|---|
| `vit_base_patch14_dinov2.lvd142m` | 86M | 768 | 224×224 | Best generalization |
| `vit_large_patch14_dinov2.lvd142m` | 304M | 1024 | 224×224 | Higher quality, needs more VRAM |
| `convnext_large.fb_in22k_ft_in1k` | 198M | 1536 | 224×224 | Strong CNN baseline |
| `efficientnet_b4.ra2_in1k` | 19M | 1792 | 320×320 | Lightweight, fast |
| `swin_large_patch4_window7_224` | 197M | 1536 | 224×224 | Hierarchical ViT |
Project structure:

```
├── train.py                      # Full training pipeline with CLI
├── inference.py                  # Inference with ensemble + TTA
├── train_ensemble.py             # Multi-backbone ensemble training
├── kaggle_train_notebook.py      # Self-contained Kaggle training notebook
├── kaggle_inference_notebook.py  # Self-contained Kaggle inference notebook
└── README.md                     # This file
```
Install dependencies:

```bash
pip install torch torchvision timm albumentations pandas numpy scikit-learn scipy pillow
```
Train a single model:

```bash
python train.py \
    --data_dir /path/to/competition/data \
    --output_dir ./output \
    --backbone dinov2_base \
    --epochs 30 \
    --batch_size 32 \
    --backbone_lr 3e-5 \
    --head_lr 1e-3 \
    --n_folds 5 \
    --aug_strength medium \
    --use_lds \
    --grad_checkpointing
```
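The `--backbone_lr` / `--head_lr` flags imply two optimizer parameter groups; a hedged sketch of that setup (the stand-in modules below are placeholders for the real backbone and head):

```python
import torch
import torch.nn as nn

backbone, head = nn.Linear(8, 8), nn.Linear(8, 5)  # placeholders for model parts
optimizer = torch.optim.AdamW(
    [
        {"params": backbone.parameters(), "lr": 3e-5},  # gentle backbone fine-tuning
        {"params": head.parameters(), "lr": 1e-3},      # fast head convergence
    ],
    weight_decay=1e-2,
)
```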
Train a multi-backbone ensemble:

```bash
python train_ensemble.py \
    --data_dir /path/to/competition/data \
    --output_dir ./ensemble_output \
    --backbones dinov2_base convnext_large \
    --epochs 30 \
    --n_folds 5
```
Run inference with ensemble + TTA:

```bash
python inference.py \
    --data_dir /path/to/competition/data \
    --model_dir ./output \
    --output submission.csv \
    --n_tta 4
```
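How `--n_tta` is applied isn't shown here; one plausible flip-based reading of `--n_tta 4` (identity, horizontal flip, vertical flip, both flips), as a sketch:

```python
import torch

@torch.no_grad()
def predict_tta(model, x):
    # Four views of the batch: identity plus the three axis-flip combinations
    views = [x, torch.flip(x, dims=[-1]), torch.flip(x, dims=[-2]),
             torch.flip(x, dims=[-1, -2])]
    return torch.stack([model(v) for v in views]).mean(dim=0)
```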
To run on Kaggle:
- Run `kaggle_train_notebook.py` as a Kaggle GPU notebook.
- Run `kaggle_inference_notebook.py` with the trained models attached as an input dataset.

Training configuration:

| Setting | Value | Rationale |
|---|---|---|
| Backbone LR | 3e-5 | Differential LR (0.03× of head LR) |
| Head LR | 1e-3 | Fast head convergence |
| Weight Decay | 1e-2 | Standard for AdamW |
| Warmup Ratio | 0.05 | 5% of training for LR warmup |
| Scheduler | Cosine | With warm restarts |
| Batch Size | 32 | Effective 64 with grad_accum=2 |
| Augmentations | Medium | D4 + color jitter + CoarseDropout |
| Log Transform | Yes | Normalizes skewed targets |
| LDS | Yes | Handles imbalanced distributions |
| Consistency Weight | 0.1 | Total ≈ Green + Dead + Clover (sketch below) |
| Early Stopping | 8 epochs | Based on validation RΒ² |
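Combining the log-transform and consistency rows above, the training loss plausibly looks like the following sketch (assumed implementation and column order, not necessarily the code in `train.py`):

```python
import torch
import torch.nn.functional as F

# Assumed column order: [Dry_Total_g, GDM_g, Dry_Green_g, Dry_Dead_g, Dry_Clover_g]
def biomass_loss(pred_log, targets_g, consistency_weight=0.1):
    mse = F.mse_loss(pred_log, torch.log1p(targets_g))   # MSE in log1p space
    pred_g = torch.expm1(pred_log)                       # back to grams
    parts = pred_g[:, 2] + pred_g[:, 3] + pred_g[:, 4]   # Green + Dead + Clover
    consistency = F.mse_loss(pred_g[:, 0], parts)        # soft Total ≈ sum constraint
    return mse + consistency_weight * consistency
```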
Key hyperparameters to tune:
- `backbone_lr`: [1e-5, 3e-5, 5e-5]
- `head_lr`: [5e-4, 1e-3, 2e-3]
- `dropout`: [0.2, 0.3, 0.4]
- `hidden_dim`: [256, 512, 1024]
- `consistency_weight`: [0.0, 0.05, 0.1, 0.2]
- `aug_strength`: [light, medium, heavy]
- `img_size`: [224, 384, 448]

Expected performance, based on literature and OOF validation:
| Configuration | Expected CV R² |
|---|---|
| DINOv2-Base (single) | 0.55–0.70 |
| ConvNeXt-Large (single) | 0.50–0.65 |
| DINOv2-Base + ConvNeXt-Large ensemble | 0.60–0.75 |
| DINOv2-Large + TTA | 0.60–0.75 |
| Full ensemble (3 backbones + TTA + LDS) | 0.65–0.80 |
Note: Actual scores depend on data quality, image resolution, and distribution shift between train/test.