# Track4D-360 camera-control checkpoint backup (2026-04-23, extended 2026-04-28)
Tagged copies of every benchmarked Track4D-360 camera-control checkpoint.
The 1.3B family (4 files, 2.2 to 2.4 GB each) was the original
2026-04-23 backup for the CLIP-F/V benchmark
(`doc/track4d-360/2026-04-23-clipfv-4models-288x512-benchmark.md`); the 14B
savefix step-2000 file was added 2026-04-28 once the multi-node
FSDP-savefix run (`doc/track4d-360/bugs/2026-04-25-multinode-training-desync-fixes.md`)
produced its first verified-correct checkpoint.
Files are renamed to carry the variant tag, so each checkpoint's origin is
legible without walking back through the training dirs.
**Upload destination:** all files in this directory get pushed to
[`yslan/track4d_360`](https://huggingface.co/yslan/track4d_360/tree/main)
on HuggingFace, preserving the exact filename. Filename = HF file path.
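
Because the filename doubles as the HF path, each upload reduces to one CLI call. A minimal sketch, assuming the `huggingface-cli` tool from `huggingface_hub` is installed and logged in; `hf_upload_cmd` is a hypothetical helper (not part of this repo) that only prints the command so it can be reviewed before running:

```bash
# Hypothetical dry-run helper: prints the upload command, does not execute it.
REPO_ID="yslan/track4d_360"

hf_upload_cmd() {
    # $1 = local file; the path in the repo is its basename (filename = HF file path).
    printf 'huggingface-cli upload %s %s %s\n' "$REPO_ID" "$1" "$(basename "$1")"
}

hf_upload_cmd /scratch/shared/beegfs/yushi/logs/track4d-360/backup/warped_step-13000.safetensors
```

Piping the printed line to `sh` (or pasting it) performs the actual upload.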
| backup file | size | source | training resolution and notes |
|---|---|---|---|
| `warped_step-13000.safetensors` | 2.2 GB | `warped_appearance_concat_proj_mixed_real_synth_144x256x49_1p3b_2gpu/train/Wan2.1-T2V-1.3B_track4d360_warped_appearance_concat_proj_mixed_real_synth/step-13000.safetensors` | 144x256 (Lyra-2 latent-fuse new architecture) |
| `static13k_step-13500.safetensors` | 2.4 GB | `hybrid_dense_plucker_mixed_real_synth_concat_project_trainplucker_attnffn_cond_dropout_144x256x49_1p3b_2gpu/train/.../step-13500.safetensors` | 144x256 (old-arch, static-only, no syn4d) |
| `dynamic5k_step-5000.safetensors` | 2.4 GB | `hybrid_dense_plucker_mixed_real_synth_syn4d_concat_project_trainplucker_attnffn_cond_dropout_144x256x49_1p3b_2gpu/train/.../step-5000.safetensors` | 144x256 (old-arch, + syn4d dynamic) |
| `ismb288_3k_step-3000.safetensors` | 2.4 GB | `ismb_hybrid_dense_plucker_mixed_real_synth_syn4d_recam_syncam_cond_dropout_288x512x49_1p3b_16gpu/train/.../step-3000.safetensors` | **288x512 (native)** (old-arch, + syn4d + RecamMaster + SynCamMaster, Isambard-trained, synced 2026-04-23) |
| `14b_savefix_step-2000.safetensors` | **22.9 GB** | `ismb_hybrid_dense_plucker_mixed_real_synth_syn4d_recam_syncam_savefix_cond_dropout_144x256x49_14b_16gpu_fsdp/train/Wan2.1-Fun-14B_track4d360_..._savefix_cond_dropout/step-2000.safetensors` | **144x256 14B** (Wan-Fun 14B FSDP, post-savefix bug fix, 4-node Isambard, copied 2026-04-28). Train launcher: [`bash_scripts/track4d_360/ismb/14b/sbatch/ismb_sbatch_14b_4node_144x256_fsdp_noise_commtuned_savefix_resume8800.sh`](/scratch/shared/beegfs/yushi/Repo/geo4d_360/VideoX-Fun/bash_scripts/track4d_360/ismb/14b/sbatch/ismb_sbatch_14b_4node_144x256_fsdp_noise_commtuned_savefix_resume8800.sh) |
Size delta within the 1.3B family: `warped` is 2.2 GB vs 2.4 GB for the
others. This reflects the architectural difference: `warped` has
`dense_in_channels=2` (geometry-only dense) vs `5` (RGB+geom) for the
old-arch models, and a different `track_injection_mode` with different
trainable subgraphs.
The 14B file's 22.9 GB is bf16 weights for the full 14B Wan-Fun DiT plus
the trainable Track4D-360 adapter modules (`track_adapter`, `track_block_injector`,
`dense_target_control_encoder`, plucker `control_adapter`); see the
`[Eval] DiT checkpoint load` line printed by the eval script for the
exact key inventory.
Backup method: the 1.3B family was copied with `rsync` on 2026-04-23; the 14B
file was a single `cp` on 2026-04-28. Both were verified by byte-size match
against sources.
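
The byte-size check can be sketched as below. `same_size` is a hypothetical helper, not the command that was actually run for this backup:

```bash
# Hypothetical helper: succeeds when two files have the same byte count.
same_size() {
    # wc -c reports the byte count; tr strips padding that some wc builds add.
    src_bytes=$(wc -c < "$1" | tr -d ' ')
    dst_bytes=$(wc -c < "$2" | tr -d ' ')
    [ "$src_bytes" -eq "$dst_bytes" ]
}

# Usage (paths are placeholders):
#   same_size /path/to/train/step-13000.safetensors warped_step-13000.safetensors \
#       && echo "size match"
```

A size match catches truncated copies but not corrupted bytes; comparing `sha256sum` output on both sides is the stronger check if time allows.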
---
## Reproducibility bundles (dataset subsets, for sharing)
Some benchmarks need a tiny slice of the full dataset roots. Those bundles
live alongside the checkpoints so the whole "checkpoints + data + scripts"
tree can be tarred together for a collaborator.
| bundle dir | size | bench it reproduces |
|---|---|---|
| `bench_ismb288_multiframe_repro/` | ~52 GB | `benchmark_ismb288_3k_multiframe_vs_zbuffer_288x512.sh` (5 datasets × 5 scenes × 10 trajectories @ 288×512). See [`bench_ismb288_multiframe_repro/README.md`](bench_ismb288_multiframe_repro/README.md) for layout, run instructions, and what's deliberately NOT included (Wan base + VXF source). Generated by [`prepare_repro_data_ismb288_multiframe.sh`](/scratch/shared/beegfs/yushi/Repo/geo4d_360/VideoX-Fun/bash_scripts/track4d_360/plucker/benchmark/prepare_repro_data_ismb288_multiframe.sh). |
To create an archive for upload, bundle data and scripts only: the checkpoint
is already on HF as `yslan/track4d_360/ismb288_3k_step-3000.safetensors`, so
there is no need to re-bundle it. The bundle is mostly PNG/EXR/safetensors,
which is already-compressed content, so gzip is slow and gains almost nothing.
Recommended: plain `.tar`.
```bash
cd /scratch/shared/beegfs/yushi/logs/track4d-360/backup
# Recommended: plain tar, fast (just streams bytes; PNG/EXR don't compress):
tar -cf bench_ismb288_multiframe_repro.tar bench_ismb288_multiframe_repro
# Alternative if you prefer .tar.gz format: use parallel gzip
# ("-f -" makes tar write to stdout regardless of its compiled-in default):
#   tar -cf - bench_ismb288_multiframe_repro | pigz -p "$(nproc)" > bench_ismb288_multiframe_repro.tar.gz
# Alternative: tar + zstd (best size/speed for HF if both sides have zstd):
#   tar --use-compress-program='zstd -T0 -3' -cf bench_ismb288_multiframe_repro.tar.zst bench_ismb288_multiframe_repro
# Avoid: tar -czf ... (single-threaded gzip on 52 GB, ~hours, near-zero gain).
# After verifying the archive is good (and ideally after uploading to HF),
# the unpacked tree is redundant; drop it to reclaim 52 GB:
rm -rf bench_ismb288_multiframe_repro
# Re-creating later is cheap (rsync -a, ~52 GB read from beegfs sources):
bash /scratch/shared/beegfs/yushi/Repo/geo4d_360/VideoX-Fun/bash_scripts/track4d_360/plucker/benchmark/prepare_repro_data_ismb288_multiframe.sh
```
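
One cheap way to check the archive before the `rm -rf` is to compare its entry count with the tree on disk. This is a sketch under stated assumptions: `archive_matches_tree` is a hypothetical helper, and it assumes the archive is a plain `.tar` created from the same relative directory name, with no newlines in filenames:

```bash
# Hypothetical helper: succeeds when the archive lists as many entries
# (files + directories) as the directory tree currently holds.
archive_matches_tree() {
    # $1 = archive path, $2 = directory that was archived.
    in_archive=$(tar -tf "$1" | wc -l | tr -d ' ')
    on_disk=$(find "$2" | wc -l | tr -d ' ')
    # Require a non-empty listing so an unreadable archive cannot "match" a missing tree.
    [ "$in_archive" -gt 0 ] && [ "$in_archive" -eq "$on_disk" ]
}

# Usage:
#   archive_matches_tree bench_ismb288_multiframe_repro.tar bench_ismb288_multiframe_repro \
#       && echo "entry counts agree"
```

An entry-count match is a sanity check, not a content check; it catches an interrupted `tar` run but not corrupted file bodies.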
---
## How to eval
All four 1.3B checkpoints evaluate through the same master benchmark script
(under the `VideoX-Fun/` repo root). It handles per-variant argument dispatch,
so you don't have to set the flags listed below by hand.
### Master benchmark (all 4 models, 5 datasets × 5 scenes × 10 trajectories @ 288x512x49)
```bash
cd /scratch/shared/beegfs/yushi/Repo/geo4d_360/VideoX-Fun
bash bash_scripts/track4d_360/plucker/benchmark_clipfv_4models_288x512_2gpu.sh
```
Options:
```bash
# one variant only:
MODEL=warped bash bash_scripts/track4d_360/plucker/benchmark_clipfv_4models_288x512_2gpu.sh
MODEL=static13k bash ...
MODEL=dynamic5k bash ...
MODEL=ismb288_3k bash ...
# custom GPU pair (defaults GPU0=0 GPU1=1):
GPU0=4 GPU1=5 bash ...
# point at backup ckpts instead of the live train dirs (example override):
CKPT_WARPED=/scratch/shared/beegfs/yushi/logs/track4d-360/backup/warped_step-13000.safetensors \
bash ...
```
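
Overrides like `GPU0=4 GPU1=5` work through environment variables with fallbacks. The benchmark script's internals are not shown here, so the following is only an illustration of the usual shell idiom, with `resolve_gpu_pair` as a hypothetical name:

```bash
# Illustration only: the usual way a script honours optional env overrides.
resolve_gpu_pair() {
    # ${VAR:-default} expands to the default when VAR is unset or empty.
    echo "${GPU0:-0} ${GPU1:-1}"
}

resolve_gpu_pair
```

With nothing exported this prints the documented defaults; setting `GPU0` and `GPU1` before the call changes the pair.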
The script writes to
`/scratch/shared/beegfs/yushi/logs/track4d-360/benchmark/clipfv_4models_288x512_2gpu/`,
with `summary.md` aggregated by
`python -m track4d_360.tools.aggregate_clip_benchmark`.
Already-completed trajectories auto-skip on re-run (`pred_rgb.mp4` existence
check in each novel-traj script).
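
The auto-skip amounts to a file-existence test per trajectory. A minimal sketch, assuming one output directory per trajectory; `should_run_trajectory` is a hypothetical name and the real scripts may structure the check differently:

```bash
# Hypothetical helper: succeeds when the trajectory still needs to be rendered.
should_run_trajectory() {
    # $1 = per-trajectory output directory (layout assumed for illustration).
    [ ! -f "$1/pred_rgb.mp4" ]
}

# Usage:
#   should_run_trajectory "$out_dir" || echo "skip: $out_dir already done"
```

Since only existence is tested, a `pred_rgb.mp4` left truncated by a crashed run would also be skipped; delete such files before re-running.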
### Single-dataset / ad-hoc invocations
To run one checkpoint against one dataset, call the eval entrypoint directly.
The benchmark script dispatches to these three (all under `examples/wan2.1_fun/`):
| dataset | script |
|---|---|
| mvs_synth, dl3dv, re10k | `eval_track4d360_hybrid_dense_static_scene_novel_traj.py` |
| kubric | `eval_track4d360_hybrid_dense_kubric_novel_traj.py` |
| syn4d | `eval_track4d360_hybrid_dense_syn4d_novel_traj.py` |
All three use `track4d_360.shared_args` as of 2026-04-23, so they accept the
full warped + plucker + dense + track flag set. **The exact per-variant flags
are listed in the recipe below. Do not omit them: `build_eval_pipeline` reads
`warped_condition_mode` via `getattr(..., "off")`, so an omitted flag on a
warped checkpoint silently runs the model in the wrong architecture.**
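
For ad-hoc runs, the dataset-to-entrypoint table above can be encoded as a small lookup. `eval_script_for_dataset` is a hypothetical helper; the dataset names and script paths are taken from the table:

```bash
# Hypothetical helper: prints the eval entrypoint for a dataset name.
eval_script_for_dataset() {
    case "$1" in
        mvs_synth|dl3dv|re10k)
            echo "examples/wan2.1_fun/eval_track4d360_hybrid_dense_static_scene_novel_traj.py" ;;
        kubric)
            echo "examples/wan2.1_fun/eval_track4d360_hybrid_dense_kubric_novel_traj.py" ;;
        syn4d)
            echo "examples/wan2.1_fun/eval_track4d360_hybrid_dense_syn4d_novel_traj.py" ;;
        *)
            # Fail loudly instead of guessing an entrypoint.
            echo "unknown dataset: $1" >&2
            return 1 ;;
    esac
}

eval_script_for_dataset kubric
```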
### Per-variant eval-flag recipe
Flags must match training; silent mismatches are the #1 source of wrong results.
See `doc/track4d-360/2026-04-23-clipfv-4models-288x512-benchmark.md` §2 bug log.
```
warped_step-13000.safetensors:
  --track_injection_mode single
  --warped_condition_mode latent_fuse
  --warped_appearance_fusion concat_proj
  --warped_geom_only_dense
  --dense_in_channels 2
static13k_step-13500.safetensors,
dynamic5k_step-5000.safetensors,
ismb288_3k_step-3000.safetensors:
  --track_injection_mode per_block
  --track_injection_block_mode concat_project
  --warped_condition_mode off
  --dense_in_channels 5
```
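
To keep an ad-hoc invocation from silently dropping a flag, the recipe above can be wrapped in a lookup keyed by the `MODEL` tag. A sketch only: `variant_flags` is a hypothetical helper, and the flag values are copied from the recipe:

```bash
# Hypothetical helper: prints the per-variant eval flags for a MODEL tag.
variant_flags() {
    case "$1" in
        warped)
            echo "--track_injection_mode single --warped_condition_mode latent_fuse --warped_appearance_fusion concat_proj --warped_geom_only_dense --dense_in_channels 2" ;;
        static13k|dynamic5k|ismb288_3k)
            echo "--track_injection_mode per_block --track_injection_block_mode concat_project --warped_condition_mode off --dense_in_channels 5" ;;
        *)
            # Refuse unknown tags rather than fall back to defaults.
            echo "unknown variant: $1" >&2
            return 1 ;;
    esac
}

variant_flags warped
```

Appending an unquoted `$(variant_flags "$MODEL")` to an eval command lets the shell split the string into separate arguments, and the explicit error branch means a mistyped tag stops the run instead of evaluating with the wrong architecture.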
Shared across all 4:
```
--use_plucker_camera_control
--enable_v2v_plucker_camera_control
--use_query_frame_impulse_condition
--use_dense_branch
--dense_proj_dim 32
--dense_num_residual_blocks 2
--dense_alpha_track 1.0
--track_config config/track4d_360/default_conv3d_patchify_srcdepth.yaml
--num_inference_steps 50
--cfg_scale 1.0
--sigma_shift 5.0
--seed 42
```
The base-DiT init path is always `weights/wan21-1p3b/diffusion_pytorch_model.safetensors`
via `--vxf_init_checkpoint` (CLAUDE.md load-order Invariant A/B: VXF init must
run BEFORE LoRA wrap and is required on both scratch and resume paths).