Vikhrmodels
/

the-well-diffusion

+---
+license: apache-2.0
+tags:
+  - physics
+  - diffusion
+  - jepa
+  - pde
+  - simulation
+  - the-well
+  - ddpm
+  - ddim
+  - spatiotemporal
+datasets:
+  - polymathic-ai/turbulent_radiative_layer_2D
+language:
+  - en
+pipeline_tag: image-to-image
+---
+# The Well: Diffusion & JEPA for PDE Dynamics
+Conditional diffusion model (DDPM/DDIM) and Spatial JEPA trained to predict the evolution of 2D physics simulations from [The Well](https://polymathic-ai.org/the_well/) dataset collection by Polymathic AI.
+Given the current state of a physical system (e.g. turbulent radiative layer), the model predicts the next time step. Can be run autoregressively to generate multi-step rollout trajectories.
+## Architecture
+### Conditional DDPM (62M parameters)
+| Component | Details |
+|---|---|
+| **Backbone** | U-Net with 4 resolution levels (64→128→256→512 channels) |
+| **Conditioning** | Previous frame concatenated to noisy target along channel dim |
+| **Time encoding** | Sinusoidal positional embedding → MLP (256-d) |
+| **Residual blocks** | GroupNorm → SiLU → Conv3x3 → +time_emb → GroupNorm → SiLU → Dropout → Conv3x3 |
+| **Attention** | Multi-head self-attention at bottleneck (16x48 spatial, 768 tokens) |
+| **Noise schedule** | Linear beta: 1e-4 → 0.02, 1000 timesteps |
+| **Parameterization** | Epsilon-prediction (predict noise) |
+| **Sampling** | DDPM (1000 steps) or DDIM (50 steps, deterministic) |
+```
+Input: [B, 8, 128, 384]  ← 4ch noisy target + 4ch condition
+  ↓ Conv3x3 → 64ch
+  ↓ Level 0: 2×ResBlock(64),   Downsample → 64×64×192
+  ↓ Level 1: 2×ResBlock(128),  Downsample → 128×32×96
+  ↓ Level 2: 2×ResBlock(256),  Downsample → 256×16×48
+  ↓ Level 3: 2×ResBlock(512) + SelfAttention
+  ↓ Middle:  ResBlock + Attention + ResBlock (512ch)
+  ↑ Level 3: 3×ResBlock(512) + Attention, Upsample
+  ↑ Level 2: 3×ResBlock(256), Upsample
+  ↑ Level 1: 3×ResBlock(128), Upsample
+  ↑ Level 0: 3×ResBlock(64)
+  ↓ GroupNorm → SiLU → Conv3x3
+Output: [B, 4, 128, 384]  ← predicted noise
+```
+### Spatial JEPA (1.8M trainable parameters)
+| Component | Details |
+|---|---|
+| **Online encoder** | ResNet-style CNN (3 stages, stride-2), outputs spatial latent maps [B, 128, H/8, W/8] |
+| **Target encoder** | EMA copy of online encoder (decay 0.996 → 1.0 cosine schedule) |
+| **Predictor** | 3-layer CNN on spatial feature maps (128 → 256 → 128 channels) |
+| **Loss** | Spatial MSE + VICReg regularization (variance + covariance on channel-averaged features) |
+The JEPA learns compressed dynamics representations without generating pixels, useful for downstream tasks and transfer learning.
+## Training
+### Dataset
+Trained on **turbulent_radiative_layer_2D** from [The Well](https://polymathic-ai.org/the_well/) (Polymathic AI, NeurIPS 2024 Datasets & Benchmarks):
+- 2D turbulent radiative layer simulation
+- Resolution: 128 × 384 spatial, 4 physical field channels
+- 90 trajectories × 101 timesteps = 7,200 training samples
+- 6.9 GB total (HDF5 format)
+### Diffusion Training Config
+| Parameter | Value |
+|---|---|
+| Optimizer | AdamW (lr=1e-4, wd=0.01) |
+| LR schedule | Cosine with 500-step warmup |
+| Batch size | 8 |
+| Mixed precision | bfloat16 |
+| Gradient clipping | max_norm=1.0 |
+| Epochs | 100 |
+| GPU | NVIDIA RTX A6000 (48GB) |
+| Training time | ~7 hours |
+### Training Results
+| Metric | Value |
+|---|---|
+| Final train loss | 0.028 |
+| Val MSE (single-step) | 743.3 |
+| Rollout MSE (10-step mean) | 805.1 |
+Training loss curve, validation metrics, comparison images (Condition | Ground Truth | Prediction), and rollout videos (GT vs Prediction side-by-side) are all available on the [WandB run](https://wandb.ai/alexwortega/the-well-diffusion/runs/ilnm4eh9).
+## Usage
+### Installation
+```bash
+pip install the_well torch einops wandb tqdm h5py matplotlib "wandb[media]"
+```
+### Inference (generate next frame)
+```python
+import torch
+from unet import UNet
+from diffusion import GaussianDiffusion
+# Load model
+device = "cuda"
+unet = UNet(in_channels=8, out_channels=4, base_ch=64, ch_mults=(1, 2, 4, 8))
+model = GaussianDiffusion(unet, timesteps=1000).to(device)
+ckpt = torch.load("diffusion_ep0099.pt", map_location=device)
+model.load_state_dict(ckpt["model"])
+model.eval()
+# Given a condition frame [1, 4, 128, 384]:
+x_cond = ...  # your input frame
+x_pred = model.sample_ddim(x_cond, steps=50)  # fast DDIM sampling
+```
+### Autoregressive rollout
+```python
+# Generate 20-step trajectory
+trajectory = [x_cond]
+cond = x_cond
+for step in range(20):
+    pred = model.sample_ddim(cond, steps=50, eta=0.0)
+    trajectory.append(pred)
+    cond = pred  # feed prediction back as next condition
+```
+### Training from scratch
+```bash
+# Download data locally (6.9 GB)
+the-well-download --base-path ./data --dataset turbulent_radiative_layer_2D
+# Train diffusion with WandB logging + eval videos
+python train_diffusion.py \
+  --no-streaming --local_path ./data/datasets \
+  --batch_size 8 --epochs 100 --wandb
+# Train JEPA
+python train_jepa.py \
+  --no-streaming --local_path ./data/datasets \
+  --batch_size 16 --epochs 100 --wandb
+```
+### Streaming from HuggingFace (no download needed)
+```bash
+python train_diffusion.py --streaming --batch_size 4
+```
+## Project Structure
+| File | Description |
+|---|---|
+| `unet.py` | U-Net with time conditioning, skip connections, self-attention |
+| `diffusion.py` | DDPM/DDIM framework: noise schedule, training loss, sampling |
+| `jepa.py` | Spatial JEPA: CNN encoder, conv predictor, EMA target, VICReg loss |
+| `data_pipeline.py` | Data loading from The Well (streaming HF or local HDF5) |
+| `train_diffusion.py` | Diffusion training with eval, video logging, checkpointing |
+| `train_jepa.py` | JEPA training with EMA schedule, VICReg metrics |
+| `eval_utils.py` | Evaluation: single-step MSE, rollout videos, WandB media logging |
+| `test_pipeline.py` | End-to-end verification script (data → forward → backward) |
+| `diffusion_ep0099.pt` | Final checkpoint (epoch 99, 748MB) |
+## Evaluation Details
+Every 5 epochs, the training script runs:
+1. **Single-step evaluation**: DDIM-50 sampling on 4 validation batches, MSE against ground truth
+2. **Multi-step rollout**: 10-step autoregressive prediction from a validation sample
+3. **Video logging**: Side-by-side GT vs Prediction video logged to WandB as mp4
+4. **Comparison images**: Condition | Ground Truth | Prediction for each field channel (RdBu_r colormap)
+5. **Rollout MSE curve**: Per-step MSE showing prediction degradation over horizon
+## The Well Dataset
+[The Well](https://polymathic-ai.org/the_well/) is a 15TB collection of 16 physics simulation datasets (NeurIPS 2024). This project works with any 2D dataset from The Well — just change `--dataset`:
+```bash
+python train_diffusion.py --dataset active_matter          # 51 GB, 256×256
+python train_diffusion.py --dataset shear_flow             # 115 GB, 128×256
+python train_diffusion.py --dataset gray_scott_reaction_diffusion  # 154 GB
+```
+## Citation
+```bibtex
+@inproceedings{thewell2024,
+  title={The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning},
+  author={Polymathic AI},
+  booktitle={NeurIPS 2024 Datasets and Benchmarks},
+  year={2024}
+}
+```
+## License
+Apache 2.0