forestWHY JEPA – Sentinel-2 I-JEPA ViT-L/8 encoder

A 302 M-parameter Vision Transformer (ViT-Large/8) pretrained with I-JEPA on global Sentinel-2 imagery. forestWHY uses it to produce six differential attention/embedding panels from a temporal pair of 13-band Sentinel-2 tiles.

Architecture

Field         Value
Backbone      ViT-Large/8
Embed dim     1024
Depth         24 transformer blocks
Heads         16
Patch size    8 × 8
Input         13-band Sentinel-2, 64 × 64
Pretraining   I-JEPA (no contrastive heads)

Input bands (channel order): B1, B2, B3, B4, B5, B6, B7, B8, B8A, B9, B10, B11, B12 (Sentinel-Hub naming).
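
A small helper can make the channel order explicit when assembling model input. The snippet below is a sketch; the bands dict and the to_model_input helper are illustrative, not part of forestwhy:

import numpy as np
import torch

# Sentinel-Hub band names in the documented channel order.
BANDS = ["B1", "B2", "B3", "B4", "B5", "B6", "B7",
         "B8", "B8A", "B9", "B10", "B11", "B12"]

def to_model_input(bands):
    # bands: hypothetical dict mapping band name -> (64, 64) numpy array.
    # Stack in channel order and add a batch dimension -> (1, 13, 64, 64).
    stack = np.stack([bands[name] for name in BANDS]).astype(np.float32)
    return torch.from_numpy(stack).unsqueeze(0)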

Training data

Two-stage pretraining:

  1. BigEarthNet – 590 K labelled Sentinel-2 patches, used for the initial I-JEPA warm-up.
  2. Google Earth Engine – ~330 K locations sampled across 8 years (2017–2024), yielding 1.57 M temporal Sentinel-2 patches; sampling over-represents forest-loss pixels using Hansen Global Forest Change as a prior (a sketch of this weighting follows the list).
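
The exact over-sampling scheme is not documented here; one plausible reading is to weight candidate locations by their Hansen loss fraction. Everything in this sketch (sample_locations, the candidate tuples, the 4x boost) is an assumption, not the actual pipeline:

import random

def sample_locations(candidates, n, loss_boost=4.0):
    # candidates: hypothetical list of (lon, lat, loss_fraction) tuples, where
    # loss_fraction is the share of Hansen GFC loss pixels in the tile footprint.
    # Weighting by 1 + loss_boost * loss_fraction over-represents forest-loss
    # tiles without excluding stable ones; the boost value is illustrative.
    weights = [1.0 + loss_boost * frac for _, _, frac in candidates]
    return random.choices(candidates, weights=weights, k=n)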

Files

s2_ijepa_gee_vitl_full_encoder_final.pt – the full encoder checkpoint (loaded in the Usage example below).

Usage

from huggingface_hub import hf_hub_download
import torch

# Drop the S2Encoder class from forestwhy.jepa into your project, then:
from forestwhy.jepa import S2Encoder

ckpt_path = hf_hub_download(
    repo_id="Siddharth63/forestWHY-JEPA-vitl",
    filename="s2_ijepa_gee_vitl_full_encoder_final.pt",
)
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)  # checkpoint is a pickled dict
encoder = S2Encoder(embed_dim=1024, depth=24, num_heads=16)
encoder.load_state_dict(ckpt["encoder"])  # encoder weights live under the "encoder" key
encoder.eval()
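
Continuing from the snippet above, a quick smoke test with random data. The per-patch output shape is inferred from the architecture table (an 8 × 8 grid of patches on a 64 × 64 input gives 64 tokens of dimension 1024), not a documented contract of S2Encoder:

x = torch.randn(1, 13, 64, 64)  # one random 13-band 64 x 64 "tile"
with torch.no_grad():
    z = encoder(x)
print(z.shape)  # expected (1, 64, 1024): (64 // 8) ** 2 = 64 patch tokens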

For the full panel-generation pipeline (six differential panels per temporal pair), see forestwhy.jepa.make_jepa_panels.
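
make_jepa_panels is not reproduced here; the sketch below only illustrates the general idea behind one kind of differential panel (a per-patch embedding-distance map between the two dates) and is not the forestWHY implementation:

import torch

def embedding_change_panel(encoder, before, after):
    # before / after: (1, 13, 64, 64) tiles of the same location at two dates.
    with torch.no_grad():
        z0, z1 = encoder(before), encoder(after)  # assumed per-patch output (1, 64, 1024)
    # Per-patch L2 distance between the two dates, laid out on the 8 x 8 patch grid.
    return (z1 - z0).norm(dim=-1).reshape(8, 8)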

License

MIT. The training imagery contains modified Copernicus Sentinel-2 data © European Space Agency, provided under the Copernicus open data licence.

Citation

Built for the Liquid AI hackathon LFM track.
