upload

Files changed (4)

.gitattributes +35 -0
README.md +76 -0
ckpt_best.pt +3 -0
config.snapshot.yaml +116 -0

.gitattributes ADDED

	@@ -0,0 +1,35 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
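
Each pattern above routes matching files through Git LFS, which is why `ckpt_best.pt` (matched by `*.pt`) appears further down as a three-line pointer rather than as binary content. The sketch below approximates that matching in Python for a subset of the patterns. It assumes git's rule that a pattern without a `/` is matched against the file's basename; `is_lfs_tracked` is a hypothetical helper, and `**` or directory-anchored patterns such as `saved_model/**/*` are deliberately left out. Inside a clone, `git check-attr filter -- <path>` reports the attribute git actually applies.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the slash-free patterns from the .gitattributes above
# (every one of them carries "filter=lfs diff=lfs merge=lfs -text").
LFS_PATTERNS = ["*.pt", "*.pth", "*.ckpt", "*.safetensors", "*.npz",
                "*.tar.*", "*.tar", "*.zip", "*tfevents*"]

def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching for slash-free patterns only.

    A pattern with no '/' is compared against the basename, so
    'runs/x/ckpt.pt' matches '*.pt'. fnmatchcase keeps the comparison
    case-sensitive regardless of the operating system.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pat) for pat in patterns if "/" not in pat)

for f in ["ckpt_best.pt", "README.md", "config.snapshot.yaml",
          "runs/deeptfus/ckpt_last.pt"]:
    print(f"{f:30s} LFS={is_lfs_tracked(f)}")
```

Only `ckpt_best.pt` among the four files in this commit is caught by a pattern, which matches the diff: it is the one file shown as an LFS pointer.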

README.md ADDED

	@@ -0,0 +1,76 @@

+---
+license: cc-by-nc-nd-4.0
+library_name: pytorch
+tags:
+  - medical-imaging
+  - 3d-cnn
+  - ultrasound
+  - focused-ultrasound
+  - transcranial-ultrasound
+  - reproduction
+datasets:
+  - vinkle-srivastav/TFUScapes
+language:
+  - en
+---
+# DeepTFUS: base (run-1 reproduction)
+*A reproduction attempt of DeepTFUS, proposed by [Srivastav et al. (arXiv:2505.12998)](https://arxiv.org/abs/2505.12998).*
+
+This is the from-scratch baseline: 50 epochs on the paper recipe
+(weighted-MSE + λ·gradient-L1, no focal-position auxiliary loss), `base_width=16`
+(3.4 M params), `pure-bf16`, `batch=4` at 256³ resolution. Given a 3D
+head CT and a transducer placement, the model predicts the resulting in-skull
+pressure field in <1 s on an H100 (≈ 50× faster than the k-Wave
+physics simulator the dataset was generated from).
+
+⭐ Partial reproduction: matched the paper on `relative_l2`, did not match
+on `focal_position_error_mm` (~2.25× worse mean), and only partly matched
+`max_pressure_error` (mean within the paper's std, median above it). This
+gap motivated the 5 fine-tune variants in this model collection.
+## Test results (n = 597 held-out CT × placement combinations)
+| metric | paper | base (this model) | reproduced? |
+|---|---:|---:|---|
+| `relative_l2` mean ± std | 0.414 ± 0.086 | **0.384 ± 0.078** | ✅ Yes (slightly beats paper) |
+| `relative_l2` median | 0.394 | **0.369** | ✅ |
+| `focal_position_error_mm` mean ± std | 2.89 ± 2.14 | 6.49 ± 4.58 | ❌ No (~2.25× worse mean) |
+| `focal_position_error_mm` median | 2.45 | 5.15 | ❌ |
+| `max_pressure_error` mean ± std | 0.199 ± 0.158 | 0.225 ± 0.116 | ✅ Yes (within paper's std) |
+| `max_pressure_error` median | 0.166 | 0.217 | (slightly above paper) |
+| `focal_pressure_error` median | n/a | 0.528 | n/a |
+| `focal_iou_fwhm` median | n/a | 0.143 | n/a |
+| `inference_latency_s` (b=1, H100) | 11.4 (RTX 4090) | 0.233 | 49× faster (different HW) |
+## Other variants and discussion
+See the [Collection](https://huggingface.co/collections/masonwang025/deeptfus-reproduction-6a03e39286a09470b960511f)
+for the 5 fine-tune variants built from this base ckpt, and the
+[project page](https://masonjwang.com/projects/reproducing-deeptfus)
+for the full reproduction story, interactive viewer, and discussion of
+trade-offs.
+## Usage
+```python
+from huggingface_hub import hf_hub_download
+import torch
+ckpt = torch.load(
+    hf_hub_download("masonwang025/deeptfus-base", "ckpt_best.pt"),
+    map_location="cpu",
+    weights_only=False,  # full pickle load: only use on checkpoints you trust
+)
+# ckpt['model']  : state_dict for the model defined in masonwang025/deeptfus repo
+# ckpt['config'] : training config (architecture knobs + train hyperparams)
+# ckpt['epoch']  : 43 (best by val_rel_l2)
+```
+Model code: [github.com/masonwang025/deeptfus](https://github.com/masonwang025/deeptfus).
+## Citation & License
+Paper: Srivastav et al., [arXiv:2505.12998](https://arxiv.org/abs/2505.12998), 2025.
+License: [CC-BY-NC-ND-4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/),
+matching the TFUScapes dataset license.
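
The README summarizes the training objective as weighted-MSE + λ·gradient-L1, and `config.snapshot.yaml` gives α = 5.0 (`loss.alpha`), λ = 0.1 (`loss.grad_weight`) and the weight form w(v)=exp(α(P-maxP))/E[...]. Below is a minimal NumPy sketch of such a loss. It is not the code from the masonwang025/deeptfus repo: `weighted_mse_grad_l1` is a hypothetical name, the normaliser (elided in the config comment) is assumed to be the volume mean of the unnormalised weights, and the gradient term is assumed to be an L1 gap between forward finite differences along each axis.

```python
import numpy as np

def weighted_mse_grad_l1(pred, target, alpha=5.0, grad_weight=0.1):
    """Sketch of 'weighted-MSE + lambda * gradient-L1' on 3D pressure volumes.

    ASSUMPTIONS (not confirmed by the source):
      * weights come from the target field, w = exp(alpha * (P - max P)),
        normalised by their volume mean so that mean(w) == 1;
      * the gradient term is the mean L1 gap between forward finite
        differences of pred and target, averaged over the three axes.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # The peak voxel gets exp(0) = 1 before normalisation; weights decay
    # exponentially as the target pressure drops below its maximum.
    w = np.exp(alpha * (target - target.max()))
    w = w / w.mean()  # assumed form of the E[...] normaliser
    wmse = np.mean(w * (pred - target) ** 2)

    grad_l1 = np.mean([
        np.mean(np.abs(np.diff(pred, axis=ax) - np.diff(target, axis=ax)))
        for ax in range(3)
    ])
    return float(wmse + grad_weight * grad_l1)

rng = np.random.default_rng(0)
target = rng.random((16, 16, 16))  # stand-in for a normalised pressure field
pred = target + 0.05 * rng.normal(size=target.shape)
print(weighted_mse_grad_l1(pred, target))  # alpha=5.0, grad_weight=0.1 as in the config
```

If P is normalised to [0, 1] (another assumption), α = 5 gives a zero-pressure voxel exp(-5) ≈ 0.0067 of the peak voxel's weight before normalisation, so errors near the focus dominate the MSE term.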

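The README's results table names its metrics but not their formulas. The NumPy sketch below gives one plausible reading of each, under stated assumptions: `relative_l2` is the L2 norm of the error over the L2 norm of the target; the focus is the single peak-pressure voxel, scaled by `eval.voxel_size_mm` = 0.5 from the config; `max_pressure_error` is the relative error of the peak; and the focal IoU uses the `eval.focal_threshold_db` = -6.0 mask. The function names mirror the table columns, but the code is illustrative, not the repo's evaluation code. (The config comment mentions a focal-volume Dice; for the same masks, Dice = 2·IoU / (1 + IoU).)

```python
import numpy as np

VOXEL_SIZE_MM = 0.5        # eval.voxel_size_mm in config.snapshot.yaml
FOCAL_THRESHOLD_DB = -6.0  # eval.focal_threshold_db in config.snapshot.yaml

def relative_l2(pred, target):
    # ||pred - target||_2 / ||target||_2 over all voxels
    return float(np.linalg.norm(pred - target) / np.linalg.norm(target))

def focal_position_error_mm(pred, target, voxel_size_mm=VOXEL_SIZE_MM):
    # Focus assumed to be the single peak-pressure voxel of each field.
    p = np.array(np.unravel_index(np.argmax(pred), pred.shape))
    t = np.array(np.unravel_index(np.argmax(target), target.shape))
    return float(np.linalg.norm(p - t) * voxel_size_mm)

def max_pressure_error(pred, target):
    # Relative error of the peak pressure.
    return float(abs(pred.max() - target.max()) / target.max())

def focal_iou(pred, target, threshold_db=FOCAL_THRESHOLD_DB):
    # -6 dB of a pressure amplitude is ~0.501 x peak, i.e. a FWHM-style mask.
    ratio = 10.0 ** (threshold_db / 20.0)
    mask_p = pred >= ratio * pred.max()
    mask_t = target >= ratio * target.max()
    union = np.logical_or(mask_p, mask_t).sum()
    return float(np.logical_and(mask_p, mask_t).sum() / union) if union else 1.0

target = np.zeros((16, 16, 16)); target[8, 8, 8] = 1.0
pred = np.zeros_like(target); pred[8, 8, 10] = 0.8  # focus 2 voxels off, peak 20% low
print(focal_position_error_mm(pred, target))        # 1.0 (2 voxels x 0.5 mm)
print(max_pressure_error(pred, target), focal_iou(pred, target))
```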
ckpt_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4b362a03019d2f614913e9e5b375d5a501b41217ad43bfe68f20cdba40803059
+size 20743755
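
`ckpt_best.pt` is committed as a Git LFS pointer: `oid` is the SHA-256 of the real file and `size` is its byte count (20,743,755 bytes, about 19.8 MiB, consistent with the "~20 MB at bw=16" figure in the config). A stdlib-only sketch for checking a downloaded copy against this pointer follows; `parse_lfs_pointer` and `verify_download` are hypothetical helpers, not part of `huggingface_hub`.

```python
import hashlib
from pathlib import Path

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:4b362a03019d2f614913e9e5b375d5a501b41217ad43bfe68f20cdba40803059
size 20743755
"""

def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}

def verify_download(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """True if the file at `path` has the pointer's byte size and SHA-256."""
    p = Path(path)
    if p.stat().st_size != pointer["size"]:
        return False
    h = hashlib.sha256()
    with p.open("rb") as f:
        # Hash in 1 MiB chunks so large checkpoints never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]

meta = parse_lfs_pointer(POINTER_TEXT)
print(f"{meta['size'] / 2**20:.1f} MiB")  # 19.8 MiB
```

Passing the local path returned by `hf_hub_download("masonwang025/deeptfus-base", "ckpt_best.pt")` (as in the README usage snippet) to `verify_download(path, meta)` should return `True` for an intact copy.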

config.snapshot.yaml ADDED

	@@ -0,0 +1,116 @@

+# DeepTFUS reproduction config (GPU defaults).
+#
+# Paper: Srivastav et al., "A Skull-Adaptive Framework for AI-Based 3D
+# Transcranial Focused Ultrasound Simulation", arXiv:2505.12998. PDF at
+# repo root: original-paper.pdf.
+#
+# For local Mac smoke testing, use `python scripts/local_verify.py`; that
+# script overrides the heavy fields in-process. Do not edit this file for
+# verification.
+#
+# Architectural specifics the paper does not pin down (base_width, depth,
+# dynamic-conv kernel size, cross-attention head count, which encoder
+# levels carry cross-attention) are flagged TENTATIVE below. An email
+# is out to the authors; update those values when they reply.
+data:
+  resolution: 256                  # paper: 256^3 cropped subvolumes
+  n_transducer_points: 512         # uniform random subsample per step
+model:
+  base_width: 16                   # TENTATIVE; paper does not specify capacity. We picked 16
+                                   # because the paper's own ablation (Table 1) shows
+                                   # DeepTFUStiny within 1σ of DeepTFUS on every metric, and
+                                   # neither variant has a published param count, so there is
+                                   # no paper-anchored "right size". 16 → 2.6M params, 7h
+                                   # at 50 epochs, fits without grad-checkpointing.
+  cond_dim: 128                    # transducer embedding dim (z_T)
+  n_transducer_freqs: 8            # TENTATIVE; paper says "Fourier PE" but gives no frequency count
+  dynamic_conv_kernel: 3           # TENTATIVE; paper does not specify
+  cross_attention_heads: 4         # TENTATIVE
+  cross_attention_levels: [level1, level2, level3, bottleneck]
+                                   # Paper says "each encoder level"; level0 at
+                                   # 256^3 is OOM on a single 80GB H100 even with
+                                   # direction-1 disabled (the concat+1x1x1 fusion
+                                   # at that resolution adds ~15 GB on top of the
+                                   # ~55 GB base). level1..bottleneck fit and
+                                   # match the paper at each of those levels.
+  cross_attention_bidirectional: false
+                                   # Paper says "bi-directional ... two multi-head
+                                   # attention blocks". We default to false
+                                   # because direction 1 (CT-queries-z_T) is
+                                   # degenerate with z_T as a single token:
+                                   # softmax over one key is identically 1, so it
+                                   # collapses to a learned broadcast of a
+                                   # projection of z_T — same function the
+                                   # encoder's DynamicConv layers already serve.
+                                   # Flip to true to recover the paper-faithful
+                                   # bi-directional design.
+  use_film_decoder: false          # Paper §3.2 puts FiLM in the decoding path.
+                                   # We default to false because the paper's own
+                                   # Table 1 "No FiLM" row shows lower
+                                   # max_pressure_error than full DeepTFUS and is
+                                   # within 1 sigma on every other metric. With
+                                   # FiLM off, decoder is a plain U-Net decoder.
+                                   # Flip to true to recover paper-faithful FiLM.
+loss:
+  alpha: 5.0                       # paper Eq 5: exponent in w(v)=exp(α(P-maxP))/E[...]
+  grad_weight: 0.1                 # paper: lambda for the gradient-L1 term
+train:
+  epochs: 50                       # paper
+  batch_size: 4                    # paper
+  lr: 0.001                        # paper
+  weight_decay: 0.0001
+  grad_clip: 1.0
+  seed: 0
+  num_workers: 4                   # for CUDA; local_verify forces 0 on MPS
+  val_every: 1                     # epochs
+  # Precision / memory. The H100 path needs both of these to fit batch=4 at
+  # 256^3 (fp32 OOMs at >78 GiB on the first encoder level; autocast bf16 also
+  # OOMs because GroupNorm/FiLM upcast paths leak fp32 into downstream
+  # activations). See docs/1-reproduction-setup/synthetic_bench.md and
+  # investigation.md for the receipts. local_verify.py overrides these to
+  # fp32/false for the CPU/MPS smoke test.
+  precision: pure-bf16             # fp32 | pure-bf16. autocast bf16 is a trap on this
+                                   # model (GroupNorm/FiLM/DynConv leak fp32 promotions).
+  grad_checkpoint_encoder: false   # at base_width=16 we fit without checkpointing.
+                                   # Flip true if you bump base_width back to 24+.
+  # Speedups (benched 2026-05-11; see docs/2-paper-audit/bench_speedups.txt).
+  # Both DEFAULT OFF after empirical findings on this specific model:
+  channels_last: false             # channels_last_3d was 17% SLOWER on this model
+                                   # (1574 ms vs 1344 ms baseline) because cuDNN's
+                                   # channels-last 3D (NDHWC) kernels lack a good
+                                   # path for our depthwise grouped DynamicConv3d.
+                                   # Flip true only if you remove DynamicConv3d.
+  compile: false                   # torch.compile OOMs the up1 decoder stage
+                                   # because Inductor materializes the (B, 48, 256^3)
+                                   # concat intermediate (6 GiB at B=4 in bf16) that
+                                   # eager mode streams through. Could be re-attempted
+                                   # with mode="reduce-overhead" or by compiling only
+                                   # sub-modules; left as future work.
+  # Run observability. WandB init no-ops if WANDB_API_KEY is unset OR
+  # wandb_project is null.
+  wandb_project: deeptfus-reproduction
+  wandb_entity: mason-wang         # team entity; personal entities disabled on this account
+  # Per-epoch checkpoint frequency. 0 disables (only ckpt_best and ckpt_last
+  # are written). Set to e.g. 5 to keep ckpt_epoch_004.pt, ckpt_epoch_009.pt,
+  # ... in addition. Each ckpt is ~20 MB at bw=16, so saving every epoch for
+  # all 50 epochs takes ~1 GB.
+  save_every_epochs: 0
+eval:
+  voxel_size_mm: 0.5               # paper Section 3.3 (k-Wave grid)
+  focal_threshold_db: -6.0         # iso-surface threshold for focal-volume Dice
+  off_target_min_dist_mm: 10.0     # secondary-lobe radius exclusion
+  n_warmup_inferences: 3           # for latency measurement
+  save_predictions: true           # write per-sample pred npz for figures
+output:
+  run_dir: runs/deeptfus
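
The `cross_attention_bidirectional` comment in the config argues that direction 1 (CT features as queries, z_T as keys/values) degenerates when z_T is a single token. A small NumPy check of that argument follows. It uses plain single-head scaled dot-product attention with random matrices standing in for learned projections, a simplification: the model's multi-head blocks are not shown in this commit, and `ct_tokens` is a hypothetical stand-in for flattened CT features.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys_values, Wq, Wk, Wv):
    """Single-head scaled dot-product attention (illustrative, no masking)."""
    q, k, v = queries @ Wq, keys_values @ Wk, keys_values @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_queries, n_keys)
    return softmax(scores, axis=-1) @ v      # (n_queries, d)

rng = np.random.default_rng(0)
d = 8
ct_tokens = rng.normal(size=(64, d))  # hypothetical flattened CT feature voxels
z_T = rng.normal(size=(1, d))         # transducer embedding as ONE token
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

# Direction 1: CT voxels attend to z_T. Scores have shape (64, 1), so the
# softmax over the single key is exactly 1 for every voxel.
weights = softmax(ct_tokens @ Wq @ (z_T @ Wk).T / np.sqrt(d))
out = attention(ct_tokens, z_T, Wq, Wk, Wv)
print(np.unique(weights))          # [1.]
print(np.allclose(out, z_T @ Wv))  # True: the same vector at every voxel
```

Adding heads does not change the conclusion: each head still sees one key, and the output projection is applied to the same concatenated vector at every voxel, so the block reduces to a broadcast of a learned projection of z_T.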