
# Track4D-360 camera-control checkpoint backup — 2026-04-23 / extended 2026-04-28

Tagged copies of every benchmarked Track4D-360 camera-control checkpoint. The 1.3B family (4 files, ~2.4 GB each) was the original 2026-04-23 backup for the CLIP-F/V benchmark (doc/track4d-360/2026-04-23-clipfv-4models-288x512-benchmark.md); the 14B savefix step-2000 file was added 2026-04-28 once the multi-node FSDP-savefix run (doc/track4d-360/bugs/2026-04-25-multinode-training-desync-fixes.md) produced its first verified-correct checkpoint.

Filenames are renamed to carry the variant tag so the source of truth is legible without walking back through the training dirs.

Upload destination: all files in this directory get pushed to `yslan/track4d_360` on HuggingFace, preserving the exact filename. Filename = HF file path.
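Since filename = HF file path, the push can be a one-liner per file. A minimal sketch using `huggingface_hub` (the repo id is from this README; the loop and helper name are assumptions about how the push is done, not the actual upload script):

```python
from pathlib import Path

def hf_path_for(local_file: str) -> str:
    """Filename = HF file path: the repo path is just the basename."""
    return Path(local_file).name

def push_backups(backup_dir: str, repo_id: str = "yslan/track4d_360") -> None:
    # Lazy import so the helper above stays stdlib-only.
    from huggingface_hub import HfApi

    api = HfApi()
    for f in sorted(Path(backup_dir).glob("*.safetensors")):
        # Preserve the exact filename as the path inside the HF repo.
        api.upload_file(
            path_or_fileobj=str(f),
            path_in_repo=hf_path_for(str(f)),
            repo_id=repo_id,
        )
```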

| backup file | size | source training checkpoint | trained @ | notes |
|---|---|---|---|---|
| `warped_step-13000.safetensors` | 2.2 GB | `warped_appearance_concat_proj_mixed_real_synth_144x256x49_1p3b_2gpu/train/Wan2.1-T2V-1.3B_track4d360_warped_appearance_concat_proj_mixed_real_synth/step-13000.safetensors` | 144x256 | new arch (Lyra-2 latent-fuse) |
| `static13k_step-13500.safetensors` | 2.4 GB | `hybrid_dense_plucker_mixed_real_synth_concat_project_trainplucker_attnffn_cond_dropout_144x256x49_1p3b_2gpu/train/.../step-13500.safetensors` | 144x256 | old-arch, static-only, no syn4d |
| `dynamic5k_step-5000.safetensors` | 2.4 GB | `hybrid_dense_plucker_mixed_real_synth_syn4d_concat_project_trainplucker_attnffn_cond_dropout_144x256x49_1p3b_2gpu/train/.../step-5000.safetensors` | 144x256 | old-arch, + syn4d dynamic |
| `ismb288_3k_step-3000.safetensors` | 2.4 GB | `ismb_hybrid_dense_plucker_mixed_real_synth_syn4d_recam_syncam_cond_dropout_288x512x49_1p3b_16gpu/train/.../step-3000.safetensors` | 288x512 (native) | old-arch, + syn4d + RecamMaster + SynCamMaster, Isambard-trained, synced 2026-04-23 |
| `14b_savefix_step-2000.safetensors` | 22.9 GB | `ismb_hybrid_dense_plucker_mixed_real_synth_syn4d_recam_syncam_savefix_cond_dropout_144x256x49_14b_16gpu_fsdp/train/Wan2.1-Fun-14B_track4d360_..._savefix_cond_dropout/step-2000.safetensors` | 144x256 | 14B Wan-Fun FSDP, post-savefix bug fix, 4-node Isambard, copied 2026-04-28 |

14B train launcher: `bash_scripts/track4d_360/ismb/14b/sbatch/ismb_sbatch_14b_4node_144x256_fsdp_noise_commtuned_savefix_resume8800.sh`

Size delta within the 1.3B family: warped is 2.2 GB vs 2.4 GB for the others. This is the architectural difference — warped has dense_in_channels=2 (geometry-only dense) vs 5 (RGB+geom) for the old-arch models, and a different track_injection_mode with different trainable subgraphs.

The 14B file's 22.9 GB is bf16 weights for the full 14B Wan-Fun DiT plus the trainable Track4D-360 adapter modules (track_adapter, track_block_injector, dense_target_control_encoder, plucker control_adapter) — see the [Eval] DiT checkpoint load line printed by the eval script for the exact key inventory.

Backup method: cp (1.3B family was rsync 2026-04-23; 14B was a single cp 2026-04-28). Verified by byte-size match against sources.
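The byte-size verification can be scripted rather than eyeballed. A minimal stdlib sketch (the pairing of source and backup paths is an assumption — adapt it to the actual train dirs and backup dir):

```python
import os

def verify_backup(pairs):
    """pairs: list of (source_path, backup_path) tuples.

    Returns the pairs whose byte sizes do NOT match; an empty list
    means every backup is byte-size-identical to its source.
    """
    mismatched = []
    for src, dst in pairs:
        if os.path.getsize(src) != os.path.getsize(dst):
            mismatched.append((src, dst))
    return mismatched
```

Byte-size equality catches truncated copies but not bit flips; for stronger guarantees a checksum comparison (e.g. `hashlib.file_digest`) would be needed.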


## Reproducibility bundles (dataset subsets, for sharing)

Some benchmarks need a tiny slice of the full dataset roots. Those bundles live alongside the checkpoints so the whole "checkpoints + data + scripts" tree can be tarred together for a collaborator.

| bundle dir | size | bench it reproduces |
|---|---|---|
| `bench_ismb288_multiframe_repro/` | ~52 GB | `benchmark_ismb288_3k_multiframe_vs_zbuffer_288x512.sh` (5 datasets × 5 scenes × 10 trajectories @ 288×512). See `bench_ismb288_multiframe_repro/README.md` for layout, run instructions, and what's deliberately NOT included (Wan base + VXF source). Generated by `prepare_repro_data_ismb288_multiframe.sh`. |

To create an archive for upload, bundle data + scripts only — the checkpoint is already on HF as `yslan/track4d_360/ismb288_3k_step-3000.safetensors`, so there is no need to re-bundle it. The bundle is mostly PNG/EXR/safetensors — already-compressed content, so gzip is slow and gains almost nothing. Recommended: plain .tar.

```bash
cd /scratch/shared/beegfs/yushi/logs/track4d-360/backup

# Recommended — plain tar, fast (just streams bytes; PNG/EXR don't compress):
tar -cf bench_ismb288_multiframe_repro.tar bench_ismb288_multiframe_repro

# Alternative if you prefer .tar.gz format — use parallel gzip:
# tar -c bench_ismb288_multiframe_repro | pigz -p $(nproc) > bench_ismb288_multiframe_repro.tar.gz

# Alternative — tar + zstd (best size/speed for HF if both sides have zstd):
# tar --use-compress-program='zstd -T0 -3' -cf bench_ismb288_multiframe_repro.tar.zst bench_ismb288_multiframe_repro

# Avoid: tar -czf ... — single-threaded gzip on 52 GB, ~hours, near-zero gain.

# After verifying the archive is good (and ideally after uploading to HF),
# the unpacked source tree is redundant — drop it to reclaim 52 GB:
rm -rf bench_ismb288_multiframe_repro

# Re-creating it later is cheap (rsync -a, ~52 GB read from beegfs sources):
bash /scratch/shared/beegfs/yushi/Repo/geo4d_360/VideoX-Fun/bash_scripts/track4d_360/plucker/benchmark/prepare_repro_data_ismb288_multiframe.sh
```

## How to eval

All 4 checkpoints evaluate through the SAME master benchmark script (under VideoX-Fun/ repo root). It takes care of per-variant arg dispatch so you don't have to think about the flags listed below.

### Master benchmark (all 4 models, 5 datasets × 5 scenes × 10 trajectories @ 288x512x49)

```bash
cd /scratch/shared/beegfs/yushi/Repo/geo4d_360/VideoX-Fun
bash bash_scripts/track4d_360/plucker/benchmark_clipfv_4models_288x512_2gpu.sh
```

Options:

```bash
# one variant only:
MODEL=warped      bash bash_scripts/track4d_360/plucker/benchmark_clipfv_4models_288x512_2gpu.sh
MODEL=static13k   bash ...
MODEL=dynamic5k   bash ...
MODEL=ismb288_3k  bash ...

# custom GPU pair (defaults GPU0=0 GPU1=1):
GPU0=4 GPU1=5 bash ...

# point at backup ckpts instead of the live train dirs (example override):
CKPT_WARPED=/scratch/shared/beegfs/yushi/logs/track4d-360/backup/warped_step-13000.safetensors \
  bash ...
```

The script writes to `/scratch/shared/beegfs/yushi/logs/track4d-360/benchmark/clipfv_4models_288x512_2gpu/`, with `summary.md` aggregated by `python -m track4d_360.tools.aggregate_clip_benchmark`.

Already-completed trajectories auto-skip on re-run (a `pred_rgb.mp4` existence check in each novel-traj script).
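That auto-skip amounts to an existence check before each trajectory runs. A minimal sketch of the idea (the per-trajectory directory argument is hypothetical — the real check lives inside each novel-traj script):

```python
from pathlib import Path

def should_skip(traj_out_dir: str) -> bool:
    # A trajectory counts as done once its pred_rgb.mp4 exists,
    # so re-running the benchmark only fills in the gaps.
    return (Path(traj_out_dir) / "pred_rgb.mp4").exists()
```

Note this assumes a crashed run never leaves a partial `pred_rgb.mp4` behind; if it can, the check would need a size or integrity test as well.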

### Single-dataset / ad-hoc invocations

If you just want to run one checkpoint against one dataset, the benchmark script dispatches to these three eval entrypoints (all under examples/wan2.1_fun/):

| dataset | script |
|---|---|
| mvs_synth, dl3dv, re10k | `eval_track4d360_hybrid_dense_static_scene_novel_traj.py` |
| kubric | `eval_track4d360_hybrid_dense_kubric_novel_traj.py` |
| syn4d | `eval_track4d360_hybrid_dense_syn4d_novel_traj.py` |
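The dataset → entrypoint dispatch in the table can be sketched as a plain mapping (script names and the `examples/wan2.1_fun/` prefix are from this README; the dict and helper name are illustrative, not the benchmark script's actual mechanism):

```python
# Which eval entrypoint handles each benchmark dataset.
EVAL_ENTRYPOINTS = {
    "mvs_synth": "eval_track4d360_hybrid_dense_static_scene_novel_traj.py",
    "dl3dv":     "eval_track4d360_hybrid_dense_static_scene_novel_traj.py",
    "re10k":     "eval_track4d360_hybrid_dense_static_scene_novel_traj.py",
    "kubric":    "eval_track4d360_hybrid_dense_kubric_novel_traj.py",
    "syn4d":     "eval_track4d360_hybrid_dense_syn4d_novel_traj.py",
}

def entrypoint_for(dataset: str) -> str:
    # All three entrypoints live under examples/wan2.1_fun/.
    return "examples/wan2.1_fun/" + EVAL_ENTRYPOINTS[dataset]
```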

All three use `track4d_360.shared_args` as of 2026-04-23 — so they accept the full warped + plucker + dense + track flag set. The exact per-variant flags are in the recipe below — do not omit them: `build_eval_pipeline` reads `warped_condition_mode` via `getattr(..., "off")`, so an omitted flag on a warped checkpoint silently runs the model in the wrong architecture.
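To see why an omitted flag fails silently rather than loudly: `getattr` with a default never raises, so a parsed namespace that is missing `warped_condition_mode` just evaluates to `"off"`. A minimal sketch of the failure mode (the argparse setup here is illustrative, not the repo's actual parser):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--dense_in_channels", type=int, default=5)
# Suppose --warped_condition_mode was never registered or passed:
args = parser.parse_args(["--dense_in_channels", "2"])

# The hazard: no AttributeError, just a silent fallback to "off" —
# a warped checkpoint would then run in the non-warped architecture.
mode = getattr(args, "warped_condition_mode", "off")
```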

### Per-variant eval-flag recipe

Must match training — silent mismatches are the #1 source of wrong results. See doc/track4d-360/2026-04-23-clipfv-4models-288x512-benchmark.md §2 bug log.

`warped_step-13000.safetensors`:

```
--track_injection_mode single
--warped_condition_mode latent_fuse
--warped_appearance_fusion concat_proj
--warped_geom_only_dense
--dense_in_channels 2
```

`static13k_step-13500.safetensors`, `dynamic5k_step-5000.safetensors`, `ismb288_3k_step-3000.safetensors`:

```
--track_injection_mode per_block
--track_injection_block_mode concat_project
--warped_condition_mode off
--dense_in_channels 5
```

Shared across all 4:

```
--use_plucker_camera_control
--enable_v2v_plucker_camera_control
--use_query_frame_impulse_condition
--use_dense_branch
--dense_proj_dim 32
--dense_num_residual_blocks 2
--dense_alpha_track 1.0
--track_config config/track4d_360/default_conv3d_patchify_srcdepth.yaml
--num_inference_steps 50
--cfg_scale 1.0
--sigma_shift 5.0
--seed 42
```

And the base-DiT init path is always `weights/wan21-1p3b/diffusion_pytorch_model.safetensors` via `--vxf_init_checkpoint` (CLAUDE.md load-order Invariant A/B — VXF init must run BEFORE LoRA wrap and is required on both scratch and resume paths).
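Putting the recipe together, here is a sketch that assembles an ad-hoc eval command for the warped checkpoint. The flag values and the entrypoint path are copied from this README; the dataset, checkpoint, and output flags are deliberately left as a placeholder since they depend on the dataset and are not listed here:

```python
SHARED_FLAGS = [
    "--use_plucker_camera_control",
    "--enable_v2v_plucker_camera_control",
    "--use_query_frame_impulse_condition",
    "--use_dense_branch",
    "--dense_proj_dim", "32",
    "--dense_num_residual_blocks", "2",
    "--dense_alpha_track", "1.0",
    "--track_config", "config/track4d_360/default_conv3d_patchify_srcdepth.yaml",
    "--num_inference_steps", "50",
    "--cfg_scale", "1.0",
    "--sigma_shift", "5.0",
    "--seed", "42",
    # Base-DiT init path, always required (Invariant A/B):
    "--vxf_init_checkpoint", "weights/wan21-1p3b/diffusion_pytorch_model.safetensors",
]

# warped variant — must match training exactly.
WARPED_FLAGS = [
    "--track_injection_mode", "single",
    "--warped_condition_mode", "latent_fuse",
    "--warped_appearance_fusion", "concat_proj",
    "--warped_geom_only_dense",
    "--dense_in_channels", "2",
]

cmd = [
    "python",
    "examples/wan2.1_fun/eval_track4d360_hybrid_dense_static_scene_novel_traj.py",
    *WARPED_FLAGS,
    *SHARED_FLAGS,
    # ...dataset, checkpoint, and output flags go here...
]
```

Run it with `subprocess.run(cmd, check=True)` from the VideoX-Fun repo root, or join the list into a shell line; either way the point is that the variant block and the shared block are disjoint and both mandatory.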