---
base_model:
- Qwen/Qwen3-VL-30B-A3B-Thinking
---

This encoder is a pure vision backbone designed for medical imaging foundation models. It implements efficient 3D patch embedding, rotary position encodings, scalable Transformer blocks, multi‑scale deep feature extraction, and two self‑supervised objectives tailored for medical imagery: masked image modeling (MIM) and joint embedding predictive architecture (JEPA).

## Architecture Overview

## Radiology‑centric Design Notes

- **Modalities**: CT volumes (slice stacks) are supported via patch tokenization
- **Through‑plane handling**: `temporal_patch_size` acts as the slice depth for 3D patching along the Z/through‑plane axis
- **Grayscale emphasis**: Use `in_channels=1` for CT to align MIM reconstruction shapes
- **Scalability**: Attention backends support SDPA and FlashAttention‑2 for large studies and high‑resolution inputs
- **Multi‑scale features**: `deepstack_visual_indexes` provides hooks for detection/segmentation heads
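As an illustration of the through‑plane handling above, here is a minimal NumPy sketch of carving a CT volume into flattened 3D patch tokens (the volume dimensions are hypothetical, and the exact patch layout in the model code may differ):

```python
import numpy as np

# Hypothetical CT volume: 64 slices of 256x256, single grayscale channel.
depth, height, width = 64, 256, 256
patch_size, temporal_patch_size, in_channels = 16, 16, 1

volume = np.zeros((in_channels, depth, height, width), dtype=np.float32)

# temporal_patch_size acts as the slice depth along Z;
# patch_size tiles the in-plane H and W axes.
n_patches = (depth // temporal_patch_size) * (height // patch_size) * (width // patch_size)
patch_dim = in_channels * temporal_patch_size * patch_size * patch_size

# Split each axis into (blocks, block_size), then gather one 3D patch per row.
patches = volume.reshape(
    in_channels,
    depth // temporal_patch_size, temporal_patch_size,
    height // patch_size, patch_size,
    width // patch_size, patch_size,
).transpose(1, 3, 5, 0, 2, 4, 6).reshape(n_patches, patch_dim)

print(patches.shape)  # (1024, 4096)
```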

## Recommended Radiology Settings

- **CT chest/abdomen**: `patch_size=16`, `temporal_patch_size=16`, `in_channels=1`
<!-- - **MRI brain**: `patch_size=16`, `temporal_patch_size=16` (or per‑sequence 2D with `temporal_patch_size=1`)
- **X‑ray**: `patch_size=16`, `temporal_patch_size=1`, `in_channels=1` -->
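To make the CT row above concrete, here is a small sketch of the resulting token budget (the 320‑slice 512×512 volume is a hypothetical example, not a model requirement):

```python
# Recommended CT chest/abdomen settings from the list above.
ct = {"patch_size": 16, "temporal_patch_size": 16, "in_channels": 1}

def num_tokens(depth, height, width, patch_size, temporal_patch_size, **_):
    """Patch tokens produced for one volume under a given setting."""
    return (depth // temporal_patch_size) * (height // patch_size) * (width // patch_size)

# Hypothetical 320-slice chest CT at 512x512 in-plane resolution.
print(num_tokens(320, 512, 512, **ct))  # 20480
```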

## Notes

- FlashAttention‑2 can be enabled via the attention implementation setting in the vision config
- Ensure volume dimensions are divisible by `patch_size` and `temporal_patch_size` (or center‑crop/pad before patchifying)
<!-- - For multi‑sequence MRI or 4‑channel inputs, set `in_channels=4` and adapt the reconstruction paths accordingly -->
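The divisibility requirement above can be met with a small pad‑to‑multiple step before patchifying; a NumPy sketch (the helper name is ours, not part of this repo):

```python
import numpy as np

def pad_to_multiple(volume, patch_size=16, temporal_patch_size=16):
    """Zero-pad a (C, D, H, W) volume so D is divisible by
    temporal_patch_size and H, W are divisible by patch_size."""
    c, d, h, w = volume.shape
    pad = lambda n, m: (m - n % m) % m  # amount needed to reach next multiple
    return np.pad(volume, (
        (0, 0),                          # channels: untouched
        (0, pad(d, temporal_patch_size)),  # slices (Z)
        (0, pad(h, patch_size)),           # height
        (0, pad(w, patch_size)),           # width
    ))

# Hypothetical off-grid volume: 60 slices of 250x250.
vol = np.zeros((1, 60, 250, 250), dtype=np.float32)
print(pad_to_multiple(vol).shape)  # (1, 64, 256, 256)
```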

## Training Data

A proprietary dataset of 1M CT studies.

<!-- ## Citation

If you use this model in your research, please cite this repository.

```
@software{standardmodel2025smbrad,
  author = {Chen, Zekai and Adam, Irsyad and Laprade, David and Brown, Kevin and others},
  journal = {Standard Model Blog},
  url = {https://huggingface.co/standardmodelbio/SMB-RAD-Encoder-v1/edit/main/README.md}
}
```
-->