eriksmb committed on
Commit 8511da2 · verified · 1 Parent(s): 4c6ec07

Update README.md

Files changed (1):
  1. README.md +10 -13

README.md CHANGED
@@ -4,7 +4,7 @@ base_model:
 - Qwen/Qwen3-VL-30B-A3B-Thinking
 ---
 
-This encoder is a pure vision backbone designed for medical imaging foundation models, with a focus on radiology modalities such as CT, MRI, and X‑ray. It implements efficient 3D patch embedding, rotary position encodings, scalable Transformer blocks, multi‑scale deep feature extraction, and two self‑supervised objectives tailored for medical imagery: masked image modeling (MIM) and joint embedding predictive architecture (JEPA).
+This encoder is a pure vision backbone designed for medical imaging foundation models. It implements efficient 3D patch embedding, rotary position encodings, scalable Transformer blocks, multi‑scale deep feature extraction, and two self‑supervised objectives tailored for medical imagery: masked image modeling (MIM) and joint embedding predictive architecture (JEPA).
 
 
 ## Architecture Overview
@@ -51,9 +51,9 @@ Key components and how they map to the code:
 
 ## Radiology‑centric Design Notes
 
-- **Modalities**: CT/MRI volumes (slice stacks) and X‑ray images are supported via patch tokenization
+- **Modalities**: CT volumes (slice stacks) are supported via patch tokenization
 - **Through‑plane handling**: `temporal_patch_size` acts as slice depth for 3D patching over the Z/through‑plane axis
-- **Grayscale emphasis**: Use `in_channels=1` for CT/MRI/X‑ray to align MIM reconstruction shapes
+- **Grayscale emphasis**: Use `in_channels=1` for CT to align MIM reconstruction shapes
 - **Scalability**: Attention backends support SDPA and FlashAttention‑2 for large studies and high‑res inputs
 - **Multi‑scale features**: `deepstack_visual_indexes` provide hooks for detection/segmentation heads
 
@@ -131,26 +131,23 @@ print(encoded_patches.shape)
 
 ## Recommended Radiology Settings
 - **CT chest/abdomen**: `patch_size=16`, `temporal_patch_size=16`, `in_channels=1`
-- **MRI brain**: `patch_size=16`, `temporal_patch_size=16` (or per‑sequence 2D with `temporal_patch_size=1`)
-- **X‑ray**: `patch_size=16`, `temporal_patch_size=1`, `in_channels=1`
+<!-- - **MRI brain**: `patch_size=16`, `temporal_patch_size=16` (or per‑sequence 2D with `temporal_patch_size=1`)
+- **X‑ray**: `patch_size=16`, `temporal_patch_size=1`, `in_channels=1` -->
 
 
 ## Notes
 - FlashAttention‑2 can be enabled via the attention implementation setting in the vision config
 - Ensure volume dimensions are divisible by `patch_size` and `temporal_patch_size` (or center‑crop/pad before patchify)
-- For multi‑sequence MRI or 4‑channel inputs, set `in_channels=4` and adapt reconstruction paths accordingly
+<!-- - For multi‑sequence MRI or 4‑channel inputs, set `in_channels=4` and adapt reconstruction paths accordingly -->
 
 
 ## Training Data
-CTs for lung and abdomen from TCIA.
-TODO: Details about CTs.
-TODO: Details about CXRs.
-TODO: Details about MRIs.
+Proprietary dataset of 1M CT studies.
 
 
-## Citation
+<!-- ## Citation
 
-If you use SMB‑RAD-Encoder-v1 in your research, please cite this repository.
+If you use this model in your research, please cite this repository.
 ```
 @software{standardmodel2025smbrad,
 author = {Chen, Zekai and Adam, Irsyad and Laprade, David and Brown, Kevin and others},
@@ -160,4 +157,4 @@ If you use SMB‑RAD-Encoder-v1 in your research, please cite this repository.
 journal = {Standard Model Blog},
 url = {https://huggingface.co/standardmodelbio/SMB-RAD-Encoder-v1/edit/main/README.md}
 }
-```
+``` -->
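
The "center‑crop/pad before patchify" note in the diff above can be sketched as follows. This is a minimal illustration, not part of the model's API: `fit_to_patch_grid` is a hypothetical helper name, and it assumes a single‑channel NumPy volume shaped `(D, H, W)` with `temporal_patch_size` applied to the through‑plane axis.

```python
# Illustrative sketch (not from the model repo): make a (D, H, W) CT volume
# divisible by patch_size / temporal_patch_size before 3D patchify.
import numpy as np

def fit_to_patch_grid(vol, patch_size=16, temporal_patch_size=16, pad=True):
    """Pad (default) or center-crop a (D, H, W) volume so that D is divisible
    by temporal_patch_size and H, W are divisible by patch_size."""
    targets = (temporal_patch_size, patch_size, patch_size)
    if pad:
        # Round each axis up to the next multiple and zero-pad symmetrically.
        pads = []
        for dim, t in zip(vol.shape, targets):
            extra = (-dim) % t
            pads.append((extra // 2, extra - extra // 2))
        return np.pad(vol, pads, mode="constant")
    # Otherwise center-crop each axis down to the previous multiple.
    slices = []
    for dim, t in zip(vol.shape, targets):
        keep = dim - dim % t
        start = (dim - keep) // 2
        slices.append(slice(start, start + keep))
    return vol[tuple(slices)]

vol = np.zeros((70, 500, 500), dtype=np.float32)  # e.g. a CT slice stack
print(fit_to_patch_grid(vol).shape)            # (80, 512, 512)
print(fit_to_patch_grid(vol, pad=False).shape) # (64, 496, 496)
```

Whether to pad or crop is a preprocessing choice: zero-padding keeps all slices at the cost of empty border tokens, while center-cropping discards up to one patch width per axis.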