kbwgrootlipman committed · Commit 2ca26e3 · verified · 1 Parent(s): d75d2fc

Update README.md

changed textual description of license to CC-BY-NC-4.0

Files changed (1): README.md +44 -44

---
license: cc-by-nc-4.0
---

# TAP-CT: 3D Task-Agnostic Pretraining of CT Foundation Models

TAP-CT is a suite of foundation models for computed tomography (CT) imaging, pretrained in a task-agnostic manner through an adaptation of DINOv2 for volumetric data. These models learn robust 3D representations from CT scans without requiring task-specific annotations.

This repository provides TAP-CT-B-3D, a Vision Transformer (ViT-Base) architecture pretrained on volumetric inputs with a spatial resolution of (12, 224, 224) and a patch size of (4, 8, 8). For inference on full-resolution CT volumes, a sliding window approach can be employed to extract features across the entire scan (see the windowing sketch below). Additional TAP-CT model variants, as well as the image processor, will be released in future updates.
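
As a minimal sketch of that windowing (not part of the released code), the axial dimension of a preprocessed volume can be split into 12-slice windows matching the pretraining depth; the non-overlapping stride and the `iter_windows` helper below are illustrative choices:

```python
import torch

def iter_windows(volume: torch.Tensor, depth: int = 12, stride: int = 12):
    """Yield (start_slice, window) pairs from a (1, 1, z, H, W) volume.

    A non-overlapping stride is assumed here; overlapping windows with
    feature averaging are an equally reasonable choice.
    """
    z = volume.shape[2]
    for start in range(0, z - depth + 1, stride):
        yield start, volume[:, :, start:start + depth]

# Example: a preprocessed 96-slice scan splits into eight 12-slice crops,
# each of which can be embedded as shown in the Usage section below.
scan = torch.randn(1, 1, 96, 224, 224)
print(len(list(iter_windows(scan))))  # 8
```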

## Preprocessing

While a dedicated image processor will be released in future updates, optimal feature extraction requires the following preprocessing pipeline (a sketch of steps 3-5 follows the list):

1. **Orientation**: Convert the volume to LPS (Left-Posterior-Superior) orientation. While the model is likely orientation-invariant, all evaluations were conducted using LPS orientation.
2. **Spatial Resizing**: Resize the volume to a spatial resolution of (z, 224, 224) or (z, 512, 512), where z is the number of slices along the axial dimension.
3. **Axial Padding**: Apply zero-padding along the z-axis to ensure divisibility by 4, accommodating the model's patch size of (4, 8, 8).
4. **Intensity Clipping**: Clip voxel intensities to the range [-1008.0, 822.0] HU (Hounsfield units).
5. **Normalization**: Apply z-score normalization using mean = -86.8086 and std = 322.6347.
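
A minimal NumPy sketch of steps 3-5, assuming steps 1-2 (orientation and resizing) have already been handled by an image I/O library such as SimpleITK or nibabel; the `preprocess_ct` name is illustrative:

```python
import numpy as np

def preprocess_ct(volume: np.ndarray) -> np.ndarray:
    """Pad, clip, and normalize an LPS-oriented volume of shape (z, 224, 224)."""
    # Step 3: zero-pad the axial dimension to the next multiple of 4
    pad = (-volume.shape[0]) % 4
    volume = np.pad(volume, ((0, pad), (0, 0), (0, 0)))

    # Step 4: clip intensities to the stated HU range
    volume = np.clip(volume, -1008.0, 822.0)

    # Step 5: z-score normalization with the stated statistics
    return ((volume - (-86.8086)) / 322.6347).astype(np.float32)
```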

## Usage

```python
import torch
from transformers import AutoModel

# Load the model (trust_remote_code is required for the custom architecture)
model = AutoModel.from_pretrained('nki-radiology/tap-ct-b-3d', trust_remote_code=True)
model.eval()

# Prepare input (batch_size, channels, depth, height, width)
x = torch.randn((16, 1, 12, 224, 224))

# Forward pass; call the model directly rather than model.forward
with torch.no_grad():
    output = model(x)
```

The model returns a `BaseModelOutputWithPooling` object from the transformers library. The `output.pooler_output` contains the pooled `[CLS]` token representation, while `output.last_hidden_state` contains the spatial patch token embeddings. To extract features from all intermediate transformer layers, pass `output_hidden_states=True` to the forward call.
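
Continuing the snippet above, a brief sketch of accessing these outputs. The shapes are assumptions based on the stated ViT-Base architecture (hidden size 768) and the (4, 8, 8) patch size; verify them against the model's actual output:

```python
# Pooled [CLS] representation: one feature vector per input crop
cls_features = output.pooler_output        # expected shape: (16, 768)

# Patch token embeddings; a (12, 224, 224) input with (4, 8, 8) patches
# gives 3 * 28 * 28 = 2352 spatial tokens (plus any prepended [CLS] token)
patch_tokens = output.last_hidden_state    # expected shape: (16, seq_len, 768)

# Hidden states from the embedding layer and every transformer block
with torch.no_grad():
    outputs_all = model(x, output_hidden_states=True)
all_layers = outputs_all.hidden_states     # tuple of (16, seq_len, 768) tensors
```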

## Model Details

- **Model Type**: 3D CT Vision Foundation Model
- **Input Shape**: `(batch_size, 1, depth, height, width)`
- **Example Input**: `(16, 1, 12, 224, 224)` - batch of 16 CT crops with 12 slices at 224×224 resolution
- **License**: CC-BY-NC-4.0