	# Track4D-360 camera-control checkpoint backup — 2026-04-23 / extended 2026-04-28

	Tagged copies of every benchmarked Track4D-360 camera-control checkpoint.
	The 1.3B family (4 files, ~2.4 GB each) was the original
	2026-04-23 backup for the CLIP-F/V benchmark
	(`doc/track4d-360/2026-04-23-clipfv-4models-288x512-benchmark.md`); the 14B
	savefix step-2000 file was added 2026-04-28 once the multi-node
	FSDP-savefix run (`doc/track4d-360/bugs/2026-04-25-multinode-training-desync-fixes.md`)
	produced its first verified-correct checkpoint.

	Files are renamed to carry the variant tag, so each checkpoint's provenance is
	legible without walking back through the training dirs.

	Upload destination: all files in this directory get pushed to
	[`yslan/track4d_360`](https://huggingface.co/yslan/track4d_360/tree/main)
	on HuggingFace, preserving the exact filename. Filename = HF file path.

	| backup file | size | source | training resolution (notes) |
	|---|---|---|---|
	| `warped_step-13000.safetensors` | 2.2 GB | `warped_appearance_concat_proj_mixed_real_synth_144x256x49_1p3b_2gpu/train/Wan2.1-T2V-1.3B_track4d360_warped_appearance_concat_proj_mixed_real_synth/step-13000.safetensors` | 144x256 (Lyra-2 latent-fuse new architecture) |
	| `static13k_step-13500.safetensors` | 2.4 GB | `hybrid_dense_plucker_mixed_real_synth_concat_project_trainplucker_attnffn_cond_dropout_144x256x49_1p3b_2gpu/train/.../step-13500.safetensors` | 144x256 (old-arch, static-only, no syn4d) |
	| `dynamic5k_step-5000.safetensors` | 2.4 GB | `hybrid_dense_plucker_mixed_real_synth_syn4d_concat_project_trainplucker_attnffn_cond_dropout_144x256x49_1p3b_2gpu/train/.../step-5000.safetensors` | 144x256 (old-arch, + syn4d dynamic) |
	| `ismb288_3k_step-3000.safetensors` | 2.4 GB | `ismb_hybrid_dense_plucker_mixed_real_synth_syn4d_recam_syncam_cond_dropout_288x512x49_1p3b_16gpu/train/.../step-3000.safetensors` | 288x512 (native) (old-arch, + syn4d + RecamMaster + SynCamMaster, Isambard-trained, synced 2026-04-23) |
	| `14b_savefix_step-2000.safetensors` | 22.9 GB | `ismb_hybrid_dense_plucker_mixed_real_synth_syn4d_recam_syncam_savefix_cond_dropout_144x256x49_14b_16gpu_fsdp/train/Wan2.1-Fun-14B_track4d360_..._savefix_cond_dropout/step-2000.safetensors` | 144x256 14B (Wan-Fun 14B FSDP, post-savefix bug fix, 4-node Isambard, copied 2026-04-28). Train launcher: [`bash_scripts/track4d_360/ismb/14b/sbatch/ismb_sbatch_14b_4node_144x256_fsdp_noise_commtuned_savefix_resume8800.sh`](/scratch/shared/beegfs/yushi/Repo/geo4d_360/VideoX-Fun/bash_scripts/track4d_360/ismb/14b/sbatch/ismb_sbatch_14b_4node_144x256_fsdp_noise_commtuned_savefix_resume8800.sh) |

	Size delta within the 1.3B family: `warped` is 2.2 GB vs 2.4 GB for the
	others. This is the architectural difference — `warped` has
	`dense_in_channels=2` (geometry-only dense) vs `5` (RGB+geom) for the
	old-arch models, and a different `track_injection_mode` with different
	trainable subgraphs.

	The 14B file's 22.9 GB is bf16 weights for the full 14B Wan-Fun DiT plus
	the trainable Track4D-360 adapter modules (track_adapter, track_block_injector,
	dense_target_control_encoder, plucker control_adapter) — see the
	`[Eval] DiT checkpoint load` line printed by the eval script for the
	exact key inventory.
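
	The key inventory can also be read straight from the file, without running the
	eval script. A safetensors file begins with an 8-byte little-endian unsigned
	integer N, followed by N bytes of JSON that name every tensor. The sketch below
	prints that JSON header. It assumes GNU `od` and a little-endian host, and
	`safetensors_header` is a hypothetical helper, not part of the repo.

	```bash
	# Hypothetical helper: print the JSON header of a safetensors file.
	# Assumes a little-endian host, because `od -t u8` decodes in host byte order.
	safetensors_header() {
	  local file="$1" n
	  # First 8 bytes: header length as an unsigned 64-bit integer.
	  n="$(od -An -t u8 -N 8 "$file" | tr -d ' \n')"
	  # The JSON header starts at byte 9 and is exactly $n bytes long.
	  tail -c +9 "$file" | head -c "$n"
	}

	# Example (path is a placeholder):
	# safetensors_header 14b_savefix_step-2000.safetensors | tr ',' '\n' | grep track_adapter
	```

	Only the header is read, so this stays cheap even on the 22.9 GB file.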

	Backup method: `rsync` for the 1.3B family (2026-04-23); a single `cp` for the
	14B file (2026-04-28). Verified by byte-size match against sources.
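
	The byte-size check can be sketched as follows. This is a minimal sketch, not
	the command actually used for the backup, and `same_size` is a hypothetical
	helper name. A size match does not catch corrupted contents; comparing
	`sha256sum` output would be a stricter check.

	```bash
	# Hypothetical helper: succeed only if both files exist and have the same byte size.
	# Returns 2 if either file is missing, 1 on a size mismatch, 0 on a match.
	same_size() {
	  local src="$1" dst="$2" a b
	  [ -f "$src" ] && [ -f "$dst" ] || return 2
	  a="$(wc -c < "$src")"
	  b="$(wc -c < "$dst")"
	  [ "$a" -eq "$b" ]
	}

	# Example (the source path is a placeholder):
	# same_size /path/to/train/step-13000.safetensors warped_step-13000.safetensors && echo OK
	```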

	---

	## Reproducibility bundles (dataset subsets, for sharing)

	Some benchmarks need a tiny slice of the full dataset roots. Those bundles
	live alongside the checkpoints so the whole "checkpoints + data + scripts"
	tree can be tarred together for a collaborator.

	| bundle dir | size | bench it reproduces |
	|---|---|---|
	| `bench_ismb288_multiframe_repro/` | ~52 GB | `benchmark_ismb288_3k_multiframe_vs_zbuffer_288x512.sh` (5 datasets × 5 scenes × 10 trajectories @ 288×512). See [`bench_ismb288_multiframe_repro/README.md`](bench_ismb288_multiframe_repro/README.md) for layout, run instructions, and what's deliberately NOT included (Wan base + VXF source). Generated by [`prepare_repro_data_ismb288_multiframe.sh`](/scratch/shared/beegfs/yushi/Repo/geo4d_360/VideoX-Fun/bash_scripts/track4d_360/plucker/benchmark/prepare_repro_data_ismb288_multiframe.sh). |

	To create an archive for upload, bundle data + scripts only: the checkpoint is
	already on HF as `yslan/track4d_360/ismb288_3k_step-3000.safetensors`, so there is
	no need to re-bundle it. The bundle is mostly PNG/EXR/safetensors, which is already
	compressed content, so gzip is slow and gains almost nothing. Recommended: plain `.tar`.

	```bash
	cd /scratch/shared/beegfs/yushi/logs/track4d-360/backup

	# Recommended — plain tar, fast (just streams bytes; PNG/EXR don't compress):
	tar -cf bench_ismb288_multiframe_repro.tar bench_ismb288_multiframe_repro

	# Alternative if you prefer .tar.gz format — use parallel gzip:
	# tar -cf - bench_ismb288_multiframe_repro | pigz -p "$(nproc)" > bench_ismb288_multiframe_repro.tar.gz

	# Alternative — tar + zstd (best size/speed for HF if both sides have zstd):
	# tar --use-compress-program='zstd -T0 -3' -cf bench_ismb288_multiframe_repro.tar.zst bench_ismb288_multiframe_repro

	# Avoid: tar -czf ... — single-threaded gzip on 52 GB, ~hours, near-zero gain.

	# After verifying the archive is good (and ideally after uploading to HF),
	# the unarchived source tree is redundant, so drop it to reclaim 52 GB:
	rm -rf bench_ismb288_multiframe_repro

	# Re-creating later is cheap (rsync -a, ~52 GB read from beegfs sources):
	bash /scratch/shared/beegfs/yushi/Repo/geo4d_360/VideoX-Fun/bash_scripts/track4d_360/plucker/benchmark/prepare_repro_data_ismb288_multiframe.sh
	```
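
	Before running the `rm -rf` above, it is worth confirming that the archive lists
	every path in the tree. The sketch below compares the archive's entry list with
	`find` output. `archive_matches_tree` is a hypothetical helper. It checks names
	only, not file contents, and it assumes the archive was created with a relative
	path (as in the `tar -cf` line above) and that no filename contains a newline.

	```bash
	# Hypothetical helper: succeed if the archive's entry list equals the on-disk tree.
	# Run it from the directory that contains "$dir"; pass "$dir" without a trailing slash.
	archive_matches_tree() {
	  local archive="$1" dir="$2" in_tar on_disk
	  # tar lists directories with a trailing slash; strip it so names match find's output.
	  in_tar="$(tar -tf "$archive" | sed 's:/$::' | LC_ALL=C sort)"
	  on_disk="$(find "$dir" | LC_ALL=C sort)"
	  [ "$in_tar" = "$on_disk" ]
	}

	# Example:
	# archive_matches_tree bench_ismb288_multiframe_repro.tar bench_ismb288_multiframe_repro \
	#   && echo "archive lists every path in the tree"
	```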

	---

	## How to eval

	All four 1.3B checkpoints evaluate through the same master benchmark script
	(under the `VideoX-Fun/` repo root). It handles per-variant argument dispatch,
	so you don't have to think about the flags listed below.

	### Master benchmark (all 4 models, 5 datasets × 5 scenes × 10 trajectories @ 288x512x49)

	```bash
	cd /scratch/shared/beegfs/yushi/Repo/geo4d_360/VideoX-Fun
	bash bash_scripts/track4d_360/plucker/benchmark_clipfv_4models_288x512_2gpu.sh
	```

	Options:

	```bash
	# one variant only:
	MODEL=warped bash bash_scripts/track4d_360/plucker/benchmark_clipfv_4models_288x512_2gpu.sh
	MODEL=static13k bash ...
	MODEL=dynamic5k bash ...
	MODEL=ismb288_3k bash ...

	# custom GPU pair (defaults GPU0=0 GPU1=1):
	GPU0=4 GPU1=5 bash ...

	# point at backup ckpts instead of the live train dirs (example override):
	CKPT_WARPED=/scratch/shared/beegfs/yushi/logs/track4d-360/backup/warped_step-13000.safetensors \
	bash ...
	```

	The script writes to
	`/scratch/shared/beegfs/yushi/logs/track4d-360/benchmark/clipfv_4models_288x512_2gpu/`,
	with `summary.md` aggregated by
	`python -m track4d_360.tools.aggregate_clip_benchmark`.

	Already-completed trajectories auto-skip on re-run (`pred_rgb.mp4` existence
	check in each novel-traj script).
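
	Because completion is keyed on `pred_rgb.mp4`, progress of a partial run can be
	estimated by counting those files. The sketch below assumes each finished
	trajectory writes exactly one `pred_rgb.mp4` somewhere under the output root.
	The directory layout is not specified here, and `count_done` is a hypothetical
	helper. Under that assumption, a full run of one model would produce
	5 × 5 × 10 = 250 files.

	```bash
	# Hypothetical helper: count finished trajectories under an output root.
	# Assumes one pred_rgb.mp4 per finished trajectory, at any depth.
	count_done() {
	  find "$1" -type f -name pred_rgb.mp4 | wc -l | tr -d ' '
	}

	# Example:
	# count_done /scratch/shared/beegfs/yushi/logs/track4d-360/benchmark/clipfv_4models_288x512_2gpu
	```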

	### Single-dataset / ad-hoc invocations

	If you just want to run one checkpoint against one dataset, the benchmark script
	dispatches to these three eval entrypoints (all under `examples/wan2.1_fun/`):

	| dataset | script |
	|---|---|
	| mvs_synth, dl3dv, re10k | `eval_track4d360_hybrid_dense_static_scene_novel_traj.py` |
	| kubric | `eval_track4d360_hybrid_dense_kubric_novel_traj.py` |
	| syn4d | `eval_track4d360_hybrid_dense_syn4d_novel_traj.py` |

	All three use `track4d_360.shared_args` as of 2026-04-23, so they accept the
	full warped + plucker + dense + track flag set. **The exact per-variant flags are
	listed in the recipe below. Do not omit them: `build_eval_pipeline` reads
	`warped_condition_mode` via `getattr(..., "off")`, so an omitted flag on a
	warped checkpoint silently runs the model in the wrong architecture.**
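
	The dataset-to-entrypoint mapping from the table above can be written as a small
	lookup. This is an illustrative sketch, not the master script's actual dispatch
	code, and `eval_script_for` is a hypothetical name. An unknown dataset is an
	error rather than a silent default.

	```bash
	# Hypothetical helper: map a dataset name to its eval entrypoint, relative to the repo root.
	eval_script_for() {
	  case "$1" in
	    mvs_synth|dl3dv|re10k)
	      echo "examples/wan2.1_fun/eval_track4d360_hybrid_dense_static_scene_novel_traj.py" ;;
	    kubric)
	      echo "examples/wan2.1_fun/eval_track4d360_hybrid_dense_kubric_novel_traj.py" ;;
	    syn4d)
	      echo "examples/wan2.1_fun/eval_track4d360_hybrid_dense_syn4d_novel_traj.py" ;;
	    *)
	      echo "unknown dataset: $1" >&2
	      return 1 ;;
	  esac
	}

	# Example:
	# eval_script_for kubric
	```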

	### Per-variant eval-flag recipe

	These flags must match training: silent mismatches are the #1 source of wrong results.
	See `doc/track4d-360/2026-04-23-clipfv-4models-288x512-benchmark.md` §2 bug log.

	```
	warped_step-13000.safetensors:
	--track_injection_mode single
	--warped_condition_mode latent_fuse
	--warped_appearance_fusion concat_proj
	--warped_geom_only_dense
	--dense_in_channels 2

	static13k_step-13500.safetensors
	dynamic5k_step-5000.safetensors
	ismb288_3k_step-3000.safetensors:
	--track_injection_mode per_block
	--track_injection_block_mode concat_project
	--warped_condition_mode off
	--dense_in_channels 5
	```
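
	One way to avoid dropping a flag in an ad-hoc run is to derive the flags from
	the variant name and fail on anything unrecognized. The sketch below encodes the
	recipe above. `variant_flags` is a hypothetical helper, not part of the repo,
	and the variant names are the `MODEL=` values accepted by the master benchmark
	script.

	```bash
	# Hypothetical helper: print the per-variant eval flags for a 1.3B checkpoint variant.
	variant_flags() {
	  case "$1" in
	    warped)
	      echo "--track_injection_mode single --warped_condition_mode latent_fuse --warped_appearance_fusion concat_proj --warped_geom_only_dense --dense_in_channels 2" ;;
	    static13k|dynamic5k|ismb288_3k)
	      echo "--track_injection_mode per_block --track_injection_block_mode concat_project --warped_condition_mode off --dense_in_channels 5" ;;
	    *)
	      # Refuse to guess: a silent default is exactly the failure mode to avoid.
	      echo "unknown variant: $1" >&2
	      return 1 ;;
	  esac
	}

	# Example:
	# variant_flags warped
	```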

	Shared across all 4:

	```
	--use_plucker_camera_control
	--enable_v2v_plucker_camera_control
	--use_query_frame_impulse_condition
	--use_dense_branch
	--dense_proj_dim 32
	--dense_num_residual_blocks 2
	--dense_alpha_track 1.0
	--track_config config/track4d_360/default_conv3d_patchify_srcdepth.yaml
	--num_inference_steps 50
	--cfg_scale 1.0
	--sigma_shift 5.0
	--seed 42
	```

	The base-DiT init path for the 1.3B checkpoints is always `weights/wan21-1p3b/diffusion_pytorch_model.safetensors`,
	passed via `--vxf_init_checkpoint` (CLAUDE.md load-order Invariant A/B: VXF init must
	run BEFORE LoRA wrap and is required on both scratch and resume paths).