Paper: [Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture](https://arxiv.org/abs/2301.08243) (I-JEPA).
A 302M-parameter Vision Transformer (ViT-Large/8) pretrained with I-JEPA on global Sentinel-2 imagery. Used by forestWHY to produce six differential attention/embedding panels from a temporal pair of 13-band Sentinel-2 tiles.
| Field | Value |
|---|---|
| Backbone | ViT-Large/8 |
| Embed dim | 1024 |
| Depth | 24 transformer blocks |
| Heads | 16 |
| Patch size | 8 × 8 |
| Input | 13-band Sentinel-2, 64 × 64 |
| Pretraining | I-JEPA (no contrastive heads) |
Input bands (channel order): B1, B2, B3, B4, B5, B6, B7, B8, B8A, B9, B10, B11, B12
(Sentinel-Hub naming).
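As a reference for the expected layout, here is a minimal sketch of assembling an input cube in this channel order; the `stack_bands` helper and the per-band array dictionary are illustrative assumptions, and any radiometric scaling or normalization is left to your own preprocessing:

```python
import numpy as np
import torch

# Sentinel-2 bands in the channel order expected by the encoder.
BAND_ORDER = ["B1", "B2", "B3", "B4", "B5", "B6", "B7",
              "B8", "B8A", "B9", "B10", "B11", "B12"]

def stack_bands(bands: dict[str, np.ndarray]) -> torch.Tensor:
    """Stack per-band 64 x 64 arrays into a (1, 13, 64, 64) float tensor.

    `bands` maps Sentinel-2 band names to 64 x 64 arrays; scaling and
    normalization are not handled here.
    """
    cube = np.stack([bands[name] for name in BAND_ORDER], axis=0)
    return torch.from_numpy(cube).float().unsqueeze(0)  # (1, 13, 64, 64)
```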
Two-stage pretraining:
`s2_ijepa_gee_vitl_full_encoder_final.pt` – encoder-only checkpoint (1.1 GB).
Loadable via the `S2Encoder` class in `forestwhy/cookbook/src/forestwhy/jepa.py`:

```python
from huggingface_hub import hf_hub_download
import torch

# Drop the S2Encoder class from forestwhy.jepa into your project, then:
from forestwhy.jepa import S2Encoder

# Download the encoder-only checkpoint from the Hub.
ckpt_path = hf_hub_download(
    repo_id="Siddharth63/forestWHY-JEPA-vitl",
    filename="s2_ijepa_gee_vitl_full_encoder_final.pt",
)

# Restore the encoder weights and switch to inference mode.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)
encoder = S2Encoder(embed_dim=1024, depth=24, num_heads=16)
encoder.load_state_dict(ckpt["encoder"])
encoder.eval()
```
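As a quick sanity check, the loaded encoder can be run on a dummy tile. The exact output layout depends on the `S2Encoder` implementation (e.g. whether a class token is prepended); the shape below assumes a plain ViT-Large/8 over a 64 × 64 input, i.e. 8 × 8 = 64 patch tokens of dimension 1024:

```python
import torch

# Dummy 13-band, 64 x 64 tile; replace with real, preprocessed Sentinel-2 data.
x = torch.randn(1, 13, 64, 64)

with torch.no_grad():
    tokens = encoder(x)

# Expected to be (1, 64, 1024) for a plain ViT-Large/8 without a class token.
print(tokens.shape)
```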
For the full panel-generation pipeline (six differential panels per temporal pair), see `forestwhy.jepa.make_jepa_panels`.
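The panels themselves are produced by `make_jepa_panels`; as an illustration of the underlying idea only (not the forestWHY implementation), a single differential panel can be sketched as a per-patch cosine-distance map between the embeddings of the two acquisition dates:

```python
import torch
import torch.nn.functional as F

def embedding_change_map(encoder, x_t0, x_t1, grid=8):
    """Cosine-distance map between patch embeddings of two acquisition dates.

    x_t0, x_t1: (1, 13, 64, 64) tensors for the earlier/later tile.
    Assumes the encoder returns one token per 8 x 8 patch (no class token).
    Larger values indicate larger embedding change between the two dates.
    """
    with torch.no_grad():
        e0 = encoder(x_t0)  # (1, 64, 1024)
        e1 = encoder(x_t1)  # (1, 64, 1024)
    change = 1.0 - F.cosine_similarity(e0, e1, dim=-1)  # (1, 64)
    return change.reshape(grid, grid).cpu().numpy()
```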
MIT. The training data is © Copernicus / ESA (Sentinel-2), provided under the Copernicus open licence.
Built for the Liquid AI hackathon LFM track.