Upload README.md with huggingface_hub
README.md
CHANGED
@@ -88,7 +88,7 @@ Trained on **turbulent_radiative_layer_2D** from [The Well](https://polymathic-a
 | GPU | NVIDIA RTX A6000 (48GB) |
 | Training time | ~7 hours |
 
-### Training Results
+### Diffusion Training Results
 
 | Metric | Value |
 |---|---|
@@ -98,6 +98,33 @@ Trained on **turbulent_radiative_layer_2D** from [The Well](https://polymathic-a
 
 Training loss curve, validation metrics, comparison images (Condition | Ground Truth | Prediction), and rollout videos (GT vs Prediction side-by-side) are all available on the [WandB run](https://wandb.ai/alexwortega/the-well-diffusion/runs/ilnm4eh9).
 
+### JEPA Training Config
+
+| Parameter | Value |
+|---|---|
+| Optimizer | AdamW (lr=3e-4, wd=0.05) |
+| LR schedule | Cosine with 500-step warmup |
+| Batch size | 16 |
+| Mixed precision | bfloat16 |
+| Gradient clipping | max_norm=1.0 |
+| EMA schedule | Cosine 0.996 → 1.0 |
+| Epochs | 100 |
+| GPU | NVIDIA RTX A6000 (48GB) |
+| Training time | ~1.5 hours |
+
+### JEPA Training Results
+
+| Metric | Value |
+|---|---|
+| Final train loss | 4.07 |
+| Similarity (sim) | 0.079 |
+| Variance (VICReg) | 1.476 |
+| Covariance (VICReg) | 0.578 |
+
+Loss progression: 4.55 (epoch 0) → 3.79 (epoch 2) → 4.07 (epoch 99, converged ~epoch 50). The VICReg regularization keeps representations from collapsing while the similarity loss learns dynamics prediction.
+
+Full JEPA training metrics available on the [WandB run](https://wandb.ai/alexwortega/the-well-jepa/runs/obwyebcv).
+
 ## Usage
 
 ### Installation
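The JEPA config above lists `EMA schedule | Cosine 0.996 → 1.0` without showing the curve. A minimal sketch of a BYOL/JEPA-style cosine momentum ramp — an assumption about the schedule's shape, not necessarily the exact code in `train_jepa.py`:

```python
import math

def ema_momentum(step: int, total_steps: int,
                 base: float = 0.996, final: float = 1.0) -> float:
    """Cosine ramp of the target-encoder EMA momentum from `base` to `final`.

    At step 0 the momentum is `base` (0.996); by the last step it reaches
    `final` (1.0), i.e. the target network is effectively frozen.
    """
    progress = step / max(total_steps, 1)
    # cos goes 1 -> -1 over training, so momentum goes base -> final
    return final - (final - base) * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The target encoder would then be updated each step as `p_t = m * p_t + (1 - m) * p_o` with `m = ema_momentum(step, total_steps)`.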
@@ -127,6 +154,24 @@ x_cond = ... # your input frame
 x_pred = model.sample_ddim(x_cond, steps=50) # fast DDIM sampling
 ```
 
+### JEPA inference (extract dynamics embeddings)
+
+```python
+import torch
+from jepa import JEPA
+
+device = "cuda"
+model = JEPA(in_channels=4, latent_channels=128, base_ch=32, pred_hidden=256).to(device)
+
+ckpt = torch.load("jepa_ep0099.pt", map_location=device)
+model.load_state_dict(ckpt["model"])
+model.eval()
+
+# Given a frame [1, 4, 128, 384]:
+x = ... # your input frame
+z = model.online_encoder(x) # [1, 128, 16, 48] spatial latent map
+```
+
 ### Autoregressive rollout
 
 ```python
@@ -174,7 +219,8 @@ python train_diffusion.py --streaming --batch_size 4
 | `train_jepa.py` | JEPA training with EMA schedule, VICReg metrics |
 | `eval_utils.py` | Evaluation: single-step MSE, rollout videos, WandB media logging |
 | `test_pipeline.py` | End-to-end verification script (data → forward → backward) |
-| `diffusion_ep0099.pt` |
+| `diffusion_ep0099.pt` | Diffusion final checkpoint (epoch 99, 748MB) |
+| `jepa_ep0099.pt` | JEPA final checkpoint (epoch 99, 23MB) |
 
 ## Evaluation Details
 
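The JEPA inference snippet in the diff stops at the spatial latent map `z` and leaves open how to consume it. One illustrative pattern — not part of this repo's API — is to average-pool the map into a per-frame embedding and compare frames by cosine similarity, e.g. to check how quickly rollout states decorrelate in latent space:

```python
import math

def pool_latent(z):
    """Average-pool a [C][H][W] latent map (nested lists) into a C-dim vector."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in z]

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)

# Toy 2-channel, 2x2 latent maps standing in for encoder outputs
z0 = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
e0 = pool_latent(z0)       # [2.5, 6.5]
print(cosine_sim(e0, e0))  # ~1.0
```

With the real model, `z.squeeze(0).tolist()` (or a `torch` mean over the spatial dims) would play the role of the toy `z0` here.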