Update README.md

Changed textual description of license to CC-BY-NC-4.0
---
license: cc-by-nc-4.0
---

# TAP-CT: 3D Task-Agnostic Pretraining of CT Foundation Models

TAP-CT is a suite of foundation models for computed tomography (CT) imaging, pretrained in a task-agnostic manner through an adaptation of DINOv2 for volumetric data. These models learn robust 3D representations from CT scans without requiring task-specific annotations.

This repository provides TAP-CT-B-3D, a Vision Transformer (ViT-Base) architecture pretrained on volumetric inputs with a spatial resolution of (12, 224, 224) and a patch size of (4, 8, 8). For inference on full-resolution CT volumes, a sliding-window approach can be employed to extract features across the entire scan. Additional TAP-CT model variants, as well as the image processor, will be released in future updates.
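
As a concrete illustration of the sliding-window idea, the sketch below enumerates overlapping 12-slice crops along the axial dimension. This is an assumption about how one might tile a scan, not part of the released code; the `z_windows` helper and the 50% overlap are hypothetical choices.

```python
import numpy as np

def z_windows(depth, window=12, stride=6):
    """Return (start, stop) slice bounds covering the axial dimension.

    The last window is shifted back so it ends exactly at `depth`,
    giving full coverage even when depth - window is not a multiple
    of stride.
    """
    starts = list(range(0, max(depth - window, 0) + 1, stride))
    if starts[-1] + window < depth:
        starts.append(depth - window)
    return [(s, s + window) for s in starts]

# Split a 90-slice volume into overlapping (12, 224, 224) crops,
# each of which can be fed to the model independently.
volume = np.zeros((90, 224, 224), dtype=np.float32)
crops = [volume[s:e] for s, e in z_windows(volume.shape[0])]
```

Per-crop features (e.g. the pooled `[CLS]` token of each crop) can then be aggregated, for instance by averaging, to obtain a scan-level representation.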

## Preprocessing

A dedicated image processor will be released in a future update; until then, optimal feature extraction requires the following preprocessing pipeline:

1. **Orientation**: Convert the volume to LPS (Left-Posterior-Superior) orientation. While the model is likely orientation-invariant, all evaluations were conducted using LPS orientation.
2. **Spatial Resizing**: Resize the volume to a spatial resolution of (z, 224, 224) or (z, 512, 512), where z is the number of slices along the axial dimension.
3. **Axial Padding**: Apply zero-padding along the z-axis to ensure divisibility by 4, accommodating the model's patch size of (4, 8, 8).
4. **Intensity Clipping**: Clip voxel intensities to the range [-1008.0, 822.0] HU (Hounsfield units).
5. **Normalization**: Apply z-score normalization using mean = -86.8086 and std = 322.6347.
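
Steps 3-5 above can be sketched in NumPy as follows. This is a minimal illustration, not the released processor: the `preprocess` helper is hypothetical, orientation and resizing (steps 1-2) are assumed done upstream, and padding after normalization (so padded voxels sit at the normalized mean) is an assumed design choice.

```python
import numpy as np

# Constants taken from the preprocessing steps above
HU_MIN, HU_MAX = -1008.0, 822.0
MEAN, STD = -86.8086, 322.6347

def preprocess(volume: np.ndarray) -> np.ndarray:
    """Clip, z-score normalize, and zero-pad a (z, 224, 224) volume
    so the axial dimension is divisible by the patch depth of 4."""
    v = np.clip(volume.astype(np.float32), HU_MIN, HU_MAX)
    v = (v - MEAN) / STD
    pad = (-v.shape[0]) % 4          # slices needed to reach a multiple of 4
    if pad:
        v = np.pad(v, ((0, pad), (0, 0), (0, 0)))
    return v

# 10 slices -> zero-padded to 12 along the axial dimension
vol = preprocess(np.full((10, 224, 224), -1200.0, dtype=np.float32))
```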

## Usage

```python
import torch
from transformers import AutoModel

# Load the model (trust_remote_code is required for the custom 3D architecture)
model = AutoModel.from_pretrained('nki-radiology/tap-ct-b-3d', trust_remote_code=True)
model.eval()

# Prepare input (batch_size, channels, depth, height, width)
x = torch.randn((16, 1, 12, 224, 224))

# Forward pass
with torch.no_grad():
    output = model(x)
```

The model returns a `BaseModelOutputWithPooling` object from the transformers library. The `output.pooler_output` field contains the pooled `[CLS]` token representation, while `output.last_hidden_state` contains the spatial patch token embeddings. To extract features from all intermediate transformer layers, pass `output_hidden_states=True` to the forward call.
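
For reference, the expected sequence length of `output.last_hidden_state` follows directly from the patch size. The arithmetic below assumes a standard ViT tokenization with a single `[CLS]` token and no register tokens, which is an assumption about this implementation:

```python
# Token grid for a (12, 224, 224) input with patch size (4, 8, 8):
# (12/4) x (224/8) x (224/8) = 3 x 28 x 28 = 2352 patch tokens, plus [CLS]
input_shape = (12, 224, 224)
patch_size = (4, 8, 8)

n_patches = 1
for dim, p in zip(input_shape, patch_size):
    n_patches *= dim // p

seq_len = n_patches + 1  # one [CLS] token prepended
print(n_patches, seq_len)  # 2352 2353
```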

## Model Details

- **Model Type**: 3D CT Vision Foundation Model
- **Input Shape**: `(batch_size, 1, depth, height, width)`
- **Example Input**: `(16, 1, 12, 224, 224)` - batch of 16 CT crops with 12 slices at 224×224 resolution
- **License**: CC-BY-NC-4.0