eriksmb committed on
Commit 8511da2 · verified · 1 Parent(s): 4c6ec07

Update README.md

Files changed (1):
  1. README.md +10 -13

README.md CHANGED
@@ -4,7 +4,7 @@ base_model:
 - Qwen/Qwen3-VL-30B-A3B-Thinking
 ---
 
-This encoder is a pure vision backbone designed for medical imaging foundation models, with a focus on radiology modalities such as CT, MRI, and X‑ray. It implements efficient 3D patch embedding, rotary position encodings, scalable Transformer blocks, multi‑scale deep feature extraction, and two self‑supervised objectives tailored for medical imagery: masked image modeling (MIM) and joint embedding predictive architecture (JEPA).
+This encoder is a pure vision backbone designed for medical imaging foundation models. It implements efficient 3D patch embedding, rotary position encodings, scalable Transformer blocks, multi‑scale deep feature extraction, and two self‑supervised objectives tailored for medical imagery: masked image modeling (MIM) and joint embedding predictive architecture (JEPA).
 
 
 ## Architecture Overview
@@ -51,9 +51,9 @@ Key components and how they map to the code:
 
 ## Radiology‑centric Design Notes
 
-- **Modalities**: CT/MRI volumes (slice stacks) and X‑ray images are supported via patch tokenization
+- **Modalities**: CT volumes (slice stacks) are supported via patch tokenization
 - **Through‑plane handling**: `temporal_patch_size` acts as slice depth for 3D patching over the Z/through‑plane axis
-- **Grayscale emphasis**: Use `in_channels=1` for CT/MRI/X‑ray to align MIM reconstruction shapes
+- **Grayscale emphasis**: Use `in_channels=1` for CT to align MIM reconstruction shapes
 - **Scalability**: Attention backends support SDPA and FlashAttention‑2 for large studies and high‑res inputs
 - **Multi‑scale features**: `deepstack_visual_indexes` provide hooks for detection/segmentation heads
 
@@ -131,26 +131,23 @@ print(encoded_patches.shape)
 
 ## Recommended Radiology Settings
 - **CT chest/abdomen**: `patch_size=16`, `temporal_patch_size=16`, `in_channels=1`
-- **MRI brain**: `patch_size=16`, `temporal_patch_size=16` (or per‑sequence 2D with `temporal_patch_size=1`)
-- **X‑ray**: `patch_size=16`, `temporal_patch_size=1`, `in_channels=1`
+<!-- - **MRI brain**: `patch_size=16`, `temporal_patch_size=16` (or per‑sequence 2D with `temporal_patch_size=1`)
+- **X‑ray**: `patch_size=16`, `temporal_patch_size=1`, `in_channels=1` -->
 
 
 ## Notes
 - FlashAttention‑2 can be enabled via the attention implementation setting in the vision config
 - Ensure volume dimensions are divisible by `patch_size` and `temporal_patch_size` (or center‑crop/pad before patchify)
-- For multi‑sequence MRI or 4‑channel inputs, set `in_channels=4` and adapt reconstruction paths accordingly
+<!-- - For multi‑sequence MRI or 4‑channel inputs, set `in_channels=4` and adapt reconstruction paths accordingly -->
 
 
 ## Training Data
-CTs for lung and abdomen from TCIA.
-TODO: Details about CTs.
-TODO: Details about CXRs.
-TODO: Details about MRIs.
+Proprietary dataset of 1M CT studies.
 
 
-## Citation
+<!-- ## Citation
 
-If you use SMB‑RAD-Encoder-v1 in your research, please cite this repository.
+If you use this model in your research, please cite this repository.
 ```
 @software{standardmodel2025smbrad,
 author = {Chen, Zekai and Adam, Irsyad and Laprade, David and Brown, Kevin and others},
@@ -160,4 +157,4 @@ If you use SMB‑RAD-Encoder-v1 in your research, please cite this repository.
 journal = {Standard Model Blog},
 url = {https://huggingface.co/standardmodelbio/SMB-RAD-Encoder-v1/edit/main/README.md}
 }
-```
+``` -->
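
The "center‑crop/pad before patchify" note in the diff above can be sketched as follows. This is a minimal illustration, not part of the model's API: `fit_to_patch_grid` is a hypothetical helper name, and it assumes a single‑channel NumPy volume shaped `(D, H, W)` with `temporal_patch_size` applied to the through‑plane axis.

```python
# Illustrative sketch (not from the model repo): make a (D, H, W) CT volume
# divisible by patch_size / temporal_patch_size before 3D patchify.
import numpy as np

def fit_to_patch_grid(vol, patch_size=16, temporal_patch_size=16, pad=True):
    """Pad (default) or center-crop a (D, H, W) volume so that D is divisible
    by temporal_patch_size and H, W are divisible by patch_size."""
    targets = (temporal_patch_size, patch_size, patch_size)
    if pad:
        # Round each axis up to the next multiple and zero-pad symmetrically.
        pads = []
        for dim, t in zip(vol.shape, targets):
            extra = (-dim) % t
            pads.append((extra // 2, extra - extra // 2))
        return np.pad(vol, pads, mode="constant")
    # Otherwise center-crop each axis down to the previous multiple.
    slices = []
    for dim, t in zip(vol.shape, targets):
        keep = dim - dim % t
        start = (dim - keep) // 2
        slices.append(slice(start, start + keep))
    return vol[tuple(slices)]

vol = np.zeros((70, 500, 500), dtype=np.float32)  # e.g. a CT slice stack
print(fit_to_patch_grid(vol).shape)            # (80, 512, 512)
print(fit_to_patch_grid(vol, pad=False).shape) # (64, 496, 496)
```

Whether to pad or crop is a preprocessing choice: zero-padding keeps all slices at the cost of empty border tokens, while center-cropping discards up to one patch width per axis.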