A competitive solution for the CSIRO Image2Biomass Prediction Kaggle competition.
Predict pasture biomass from images to help farmers make smarter grazing decisions.
Targets (5 regression outputs):
| Target | Weight | Description |
|---|---|---|
| Dry_Total_g | 0.50 | Total dry biomass (grams); most important |
| GDM_g | 0.20 | Green dry matter (grams) |
| Dry_Green_g | 0.10 | Dry green biomass |
| Dry_Dead_g | 0.10 | Dry dead biomass |
| Dry_Clover_g | 0.10 | Dry clover biomass |
Metric: Globally weighted R² across all (image, target) pairs.
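The exact scoring code isn't reproduced here; below is a minimal numpy sketch, assuming each (image, target) residual is weighted by the target weight above and pooled into a single global R² (function name and details are illustrative):

```python
import numpy as np

# Assumed column order: Dry_Total_g, GDM_g, Dry_Green_g, Dry_Dead_g, Dry_Clover_g
TARGET_WEIGHTS = np.array([0.50, 0.20, 0.10, 0.10, 0.10])

def global_weighted_r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """y_true, y_pred: (n_images, 5) arrays in original units."""
    w = np.broadcast_to(TARGET_WEIGHTS, y_true.shape)
    mean = np.average(y_true, weights=w)             # weighted global mean
    ss_res = np.sum(w * (y_true - y_pred) ** 2)      # weighted residual sum of squares
    ss_tot = np.sum(w * (y_true - mean) ** 2)        # weighted total sum of squares
    return 1.0 - ss_res / ss_tot
```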
Two key design choices:
- Log transform: train on `log1p(y)` and invert with `expm1(pred)`, which normalizes the skewed target distribution.
- Structural constraint: Total ≈ Green + Dead + Clover, enforced via a consistency loss.

Architecture:

```
Input Image (224×224)
 → DINOv2-Base backbone (768-dim features)
 → LayerNorm → Dropout(0.3)
 → Linear(768, 512) → GELU → Dropout(0.15)
 → Linear(512, 256) → GELU → Dropout(0.09)
 → Linear(256, 5) → predictions
```
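A minimal PyTorch sketch of this model, assuming a timm DINOv2 backbone created with `num_classes=0` so it returns pooled features (class and attribute names are illustrative, not necessarily those in `train.py`):

```python
import torch.nn as nn
import timm

class BiomassRegressor(nn.Module):
    def __init__(self, backbone="vit_base_patch14_dinov2.lvd142m", n_targets=5):
        super().__init__()
        # num_classes=0 strips the classifier; img_size=224 overrides the ViT default
        self.backbone = timm.create_model(backbone, pretrained=True,
                                          num_classes=0, img_size=224)
        dim = self.backbone.num_features  # 768 for DINOv2-Base
        self.head = nn.Sequential(
            nn.LayerNorm(dim), nn.Dropout(0.3),
            nn.Linear(dim, 512), nn.GELU(), nn.Dropout(0.15),
            nn.Linear(512, 256), nn.GELU(), nn.Dropout(0.09),
            nn.Linear(256, n_targets),
        )

    def forward(self, x):
        return self.head(self.backbone(x))  # (B, 5) predictions in log1p space
```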
| Backbone | Params | Feature Dim | Input Size | Notes |
|---|---|---|---|---|
| `vit_base_patch14_dinov2.lvd142m` | 86M | 768 | 224×224 | Best generalization |
| `vit_large_patch14_dinov2.lvd142m` | 304M | 1024 | 224×224 | Higher quality, needs more VRAM |
| `convnext_large.fb_in22k_ft_in1k` | 198M | 1536 | 224×224 | Strong CNN baseline |
| `efficientnet_b4.ra2_in1k` | 19M | 1792 | 320×320 | Lightweight, fast |
| `swin_large_patch4_window7_224` | 197M | 1536 | 224×224 | Hierarchical ViT |
Project structure:

```
├── train.py                      # Full training pipeline with CLI
├── inference.py                  # Inference with ensemble + TTA
├── train_ensemble.py             # Multi-backbone ensemble training
├── kaggle_train_notebook.py      # Self-contained Kaggle training notebook
├── kaggle_inference_notebook.py  # Self-contained Kaggle inference notebook
└── README.md                     # This file
```
Install dependencies:

```bash
pip install torch torchvision timm albumentations pandas numpy scikit-learn scipy pillow
```
Train a single model:

```bash
python train.py \
    --data_dir /path/to/competition/data \
    --output_dir ./output \
    --backbone dinov2_base \
    --epochs 30 \
    --batch_size 32 \
    --backbone_lr 3e-5 \
    --head_lr 1e-3 \
    --n_folds 5 \
    --aug_strength medium \
    --use_lds \
    --grad_checkpointing
```
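The `--backbone_lr` / `--head_lr` flags imply two optimizer parameter groups; a hedged sketch of that setup (the stand-in modules below are placeholders for the real backbone and head):

```python
import torch
import torch.nn as nn

backbone, head = nn.Linear(8, 8), nn.Linear(8, 5)  # placeholders for model parts
optimizer = torch.optim.AdamW(
    [
        {"params": backbone.parameters(), "lr": 3e-5},  # gentle backbone fine-tuning
        {"params": head.parameters(), "lr": 1e-3},      # fast head convergence
    ],
    weight_decay=1e-2,
)
```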
Train a multi-backbone ensemble:

```bash
python train_ensemble.py \
    --data_dir /path/to/competition/data \
    --output_dir ./ensemble_output \
    --backbones dinov2_base convnext_large \
    --epochs 30 \
    --n_folds 5
```
Run inference with ensemble + TTA:

```bash
python inference.py \
    --data_dir /path/to/competition/data \
    --model_dir ./output \
    --output submission.csv \
    --n_tta 4
```
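How `--n_tta` is applied isn't shown here; one plausible flip-based reading of `--n_tta 4` (identity, horizontal flip, vertical flip, both flips), as a sketch:

```python
import torch

@torch.no_grad()
def predict_tta(model, x):
    # Four views of the batch: identity plus the three axis-flip combinations
    views = [x, torch.flip(x, dims=[-1]), torch.flip(x, dims=[-2]),
             torch.flip(x, dims=[-1, -2])]
    return torch.stack([model(v) for v in views]).mean(dim=0)
```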
To run on Kaggle:
- Run `kaggle_train_notebook.py` as a Kaggle GPU notebook.
- Run `kaggle_inference_notebook.py` with the trained models attached as an input dataset.

Training configuration:

| Setting | Value | Rationale |
|---|---|---|
| Backbone LR | 3e-5 | Differential LR (0.03× of head LR) |
| Head LR | 1e-3 | Fast head convergence |
| Weight Decay | 1e-2 | Standard for AdamW |
| Warmup Ratio | 0.05 | 5% of training for LR warmup |
| Scheduler | Cosine | With warm restarts |
| Batch Size | 32 | Effective 64 with grad_accum=2 |
| Augmentations | Medium | D4 + color jitter + CoarseDropout |
| Log Transform | Yes | Normalizes skewed targets |
| LDS | Yes | Handles imbalanced distributions |
| Consistency Weight | 0.1 | Total ≈ Green + Dead + Clover (sketch below) |
| Early Stopping | 8 epochs | Based on validation RΒ² |
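Combining the log-transform and consistency rows above, the training loss plausibly looks like the following sketch (assumed implementation and column order, not necessarily the code in `train.py`):

```python
import torch
import torch.nn.functional as F

# Assumed column order: [Dry_Total_g, GDM_g, Dry_Green_g, Dry_Dead_g, Dry_Clover_g]
def biomass_loss(pred_log, targets_g, consistency_weight=0.1):
    mse = F.mse_loss(pred_log, torch.log1p(targets_g))   # MSE in log1p space
    pred_g = torch.expm1(pred_log)                       # back to grams
    parts = pred_g[:, 2] + pred_g[:, 3] + pred_g[:, 4]   # Green + Dead + Clover
    consistency = F.mse_loss(pred_g[:, 0], parts)        # soft Total ≈ sum constraint
    return mse + consistency_weight * consistency
```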
Key hyperparameters to tune:
- `backbone_lr`: [1e-5, 3e-5, 5e-5]
- `head_lr`: [5e-4, 1e-3, 2e-3]
- `dropout`: [0.2, 0.3, 0.4]
- `hidden_dim`: [256, 512, 1024]
- `consistency_weight`: [0.0, 0.05, 0.1, 0.2]
- `aug_strength`: [light, medium, heavy]
- `img_size`: [224, 384, 448]

Expected performance, based on literature and OOF validation:
| Configuration | Expected CV R² |
|---|---|
| DINOv2-Base (single) | 0.55–0.70 |
| ConvNeXt-Large (single) | 0.50–0.65 |
| DINOv2-Base + ConvNeXt-Large ensemble | 0.60–0.75 |
| DINOv2-Large + TTA | 0.60–0.75 |
| Full ensemble (3 backbones + TTA + LDS) | 0.65–0.80 |
Note: Actual scores depend on data quality, image resolution, and distribution shift between train/test.