athul020 committed ee03de9 (verified) · Parent(s): f6bf639

Update README.md

Files changed (1): README.md (+106 −3)
---
language:
- en
license: apache-2.0
tags:
- text-to-video
- lora
- physics
- cogvideox
- diffusion
- peft
- warp
- rigid-body
- fine-tuned
base_model: THUDM/CogVideoX-2b
pipeline_tag: text-to-video
library_name: diffusers
---

# PhysicsDrivenWorld (PDW)
### Physics-Corrected Video Generation via Warp-Guided LoRA Fine-Tuning

> **CogVideoX-2b + LoRA (r=16) · NVIDIA Warp Physics · Single H100 NVL**

---

## Key Result

| Metric | Base CogVideoX-2b | PDW (Ours) | MSE Reduction |
|--------|:-----------------:|:----------:|:-------------:|
| Diffusion MSE — test_medium | 2.2676 | 0.3861 | **83.0%** |
| Diffusion MSE — test_very_high | 2.2763 | 0.3790 | **83.4%** |
| **Average** | **2.272** | **0.383** | **83.2%** |

The fine-tuned model denoises physics-correct reference frames with **83.2% lower diffusion MSE** than the base model, confirming that the Warp physics prior was successfully injected into the denoising weights.

---

## Model Description

**PhysicsDrivenWorld (PDW)** fine-tunes [CogVideoX-2b](https://huggingface.co/THUDM/CogVideoX-2b) using **Low-Rank Adaptation (LoRA)** supervised by an **NVIDIA Warp** rigid-body physics simulator.

Modern video diffusion models generate visually plausible but physically inconsistent results — objects float, bounce unrealistically, or violate Newton's laws. PDW injects a physics prior into the model's denoising weights by training on Warp-simulated ground-truth trajectories.

The training objective is standard **diffusion denoising MSE**, but applied exclusively to frames that are **physically correct by construction** from the Warp simulator — so the model learns to denoise physics-consistent content better than physics-inconsistent content.
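
This objective can be sketched as a toy illustration (not the actual training script — `LoRALinear` and the tiny denoiser below are hypothetical stand-ins for the LoRA-wrapped attention projections of the CogVideoX transformer):

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen projection plus a trainable low-rank update (r=16, alpha=32)."""

    def __init__(self, dim, r=16, alpha=32):
        super().__init__()
        self.base = nn.Linear(dim, dim, bias=False)
        self.base.weight.requires_grad_(False)             # base weights stay frozen
        self.A = nn.Parameter(torch.randn(r, dim) * 0.01)  # trainable down-projection
        self.B = nn.Parameter(torch.zeros(dim, r))         # trainable up-projection, zero-init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())


def physics_denoising_loss(denoiser, x0, alphas_cumprod):
    """DDPM noise-prediction MSE on a clean, physics-correct latent x0,
    with the timestep drawn uniformly from [50, 950] as in training."""
    t = torch.randint(50, 951, (x0.shape[0],))
    a = alphas_cumprod[t].view(-1, 1)
    eps = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * eps   # forward diffusion
    return nn.functional.mse_loss(denoiser(x_t), eps)


torch.manual_seed(0)
betas = torch.linspace(1e-4, 0.02, 1000)           # standard DDPM beta schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
denoiser = LoRALinear(dim=64)                      # stand-in for the transformer
loss = physics_denoising_loss(denoiser, torch.randn(4, 64), alphas_cumprod)
loss.backward()                                    # gradients reach only A and B
```

Because the supervision frames are physics-correct by construction, minimising this loss pushes the low-rank update toward denoising physics-consistent content.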

---

## Architecture

| Component | Details |
|-----------|---------|
| **Base Model** | CogVideoX-2b (2B-parameter text-to-video diffusion transformer) |
| **Adapter** | LoRA — rank r=16, alpha=32 |
| **Target Modules** | `to_q`, `to_k`, `to_v`, `to_out.0` (attention projections) |
| **Trainable Params** | ~3.7M of 2B total (0.185%) |
| **Physics Engine** | NVIDIA Warp 1.11.1 — GPU-accelerated rigid-body simulator |
| **Simulation** | Semi-implicit Euler, 60 Hz, ground collision with restitution |
| **Training Loss** | Diffusion MSE on Warp-generated physics-correct frames |
| **LR Schedule** | 10-step linear warmup (1e-6 → 1e-4), then cosine decay to 1e-6 |
| **Hardware** | Single NVIDIA H100 NVL (99.9 GB VRAM) — 13.9 GB peak usage |
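
The adapter settings above map onto a PEFT `LoraConfig` like the following (a sketch for reference — the actual training script is not included in this repository):

```python
from peft import LoraConfig

# Adapter settings from the table above: rank, alpha, dropout, and the
# attention projections targeted in each transformer block.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
```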

---

## Training

### Hyperparameters

| Hyperparameter | Value |
|---------------|-------|
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Peak learning rate | 1e-4 |
| Optimiser | AdamW (β=(0.9, 0.999), ε=1e-8, weight_decay=0.01) |
| Training steps | 200 (5 epochs × 40 steps) |
| Batch size | 1 |
| Diffusion timesteps | DDPMScheduler (1000 steps), random t ∈ [50, 950] |
| Precision | bfloat16 |
| Gradient clipping | 1.0 |
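
The warmup-plus-cosine schedule can be written out explicitly with the stated values (a sketch — the function name `lr_at` is ours, not from the training code):

```python
import math


def lr_at(step, total_steps=200, warmup=10, lr_min=1e-6, lr_peak=1e-4):
    """Linear warmup from lr_min to lr_peak over the first `warmup` steps,
    then cosine decay from lr_peak back down to lr_min."""
    if step < warmup:
        return lr_min + (lr_peak - lr_min) * step / warmup
    progress = (step - warmup) / (total_steps - warmup)  # 0 → 1 after warmup
    return lr_min + 0.5 * (lr_peak - lr_min) * (1.0 + math.cos(math.pi * progress))


peak = lr_at(10)    # reaches 1e-4 at the end of warmup
final = lr_at(200)  # decays back to 1e-6 by the last step
```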
### Training Data — Warp Physics Scenarios

Training uses **synthetic videos rendered from NVIDIA Warp rigid-body simulations**, not real-world video. This eliminates dataset bias and provides physically correct ground-truth trajectories as supervision.

| Scenario | Drop Height | Restitution | Physics Behaviour |
|----------|:-----------:|:-----------:|-------------------|
| ball_drop_low | 2 m | 0.70 | Low-energy drop, high bounce |
| ball_drop_high | 5 m | 0.60 | Standard gravity, moderate bounce |
| ball_elastic | 3 m | 0.85 | Very elastic — multiple high bounces |
| ball_heavy | 4 m | 0.30 | Inelastic — dead stop after first bounce |
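
As a plain-Python stand-in for the Warp rollouts (illustrative only — the real data comes from Warp's GPU rigid-body solver), each scenario reduces to a point mass integrated with semi-implicit Euler at 60 Hz and a restitution-scaled bounce:

```python
def simulate_drop(height, restitution, steps=240, dt=1.0 / 60.0, g=9.81):
    """Drop a point mass from `height` (m) and integrate with semi-implicit
    Euler at 60 Hz; on ground contact, reflect velocity scaled by restitution."""
    y, v = height, 0.0
    trajectory = []
    for _ in range(steps):
        v -= g * dt              # semi-implicit Euler: update velocity first...
        y += v * dt              # ...then position, using the updated velocity
        if y < 0.0:              # ground collision
            y = 0.0
            v = -v * restitution
        trajectory.append(y)
    return trajectory


# ball_heavy (4 m, e=0.30) nearly dead-stops after its first bounce, while
# ball_elastic (3 m, e=0.85) rebounds to roughly e^2 * h ≈ 2.2 m.
heavy = simulate_drop(4.0, 0.30)
elastic = simulate_drop(3.0, 0.85)
```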
### Convergence

| Epoch | Avg Loss | Notes |
|-------|----------|-------|
| 1 | 1.512 | Warmup spike — expected |
| 2 | ~0.45 | Fast learning |
| 5 | **0.341** | Converged — 77% drop from epoch 1 |

---

## How to Use

### Load the Model
```python