---
license: mit
language:
- en
tags:
- medical-imaging
- self-supervised-learning
- mae
- swin-transformer
- 3d-vision
- pytorch
- ct
- opscc
---
# 3D Swin Transformer MAE for OPSCC CT Pretraining
Self-supervised masked autoencoder (MAE) with a **3D Swin Transformer** backbone, pretrained on cropped neck CT volumes of oropharyngeal squamous cell carcinoma (OPSCC).
Includes asymmetry-aware loss weighting (airway and soft-tissue features) and overfitting monitoring via cosine similarity between augmented pairs.
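The augmented-pair cosine-similarity monitor can be illustrated roughly as follows (a sketch, not the repo's implementation; the embedding dimension of 768 and the toy data are assumptions):

```python
import numpy as np

def pairwise_cosine(a, b, eps=1e-8):
    """Cosine similarity between corresponding rows of two embedding batches."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return (a * b).sum(axis=1)

# Stand-ins for encoder embeddings of two differently augmented copies of
# the same volumes (random values here, purely for illustration).
z1 = np.random.default_rng(0).normal(size=(4, 768))
z2 = z1 + 0.1 * np.random.default_rng(1).normal(size=(4, 768))
sims = pairwise_cosine(z1, z2)
print(sims.round(3))  # values near 1.0 => the two views map to similar latents
```

Tracking this statistic over training gives a cheap signal: if similarity between augmented views drifts, the encoder may be memorizing augmentation-specific detail rather than anatomy.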
## Model Details
- **Architecture**: 3D Swin Transformer encoder + lightweight asymmetric decoder + auxiliary asymmetry prediction heads
- **Input shape**: 1×60×128×128 (single-channel CT volumes, intensities normalized to [0,1])
- **Pretraining objective**: Masked reconstruction (75% masking ratio) + auxiliary asymmetry regression
- **Drop path rate**: linear schedule up to 0.1
- **Training**: AdamW, lr=1e-4, batch size 2 (adjustable), early stopping + cosine sim monitoring
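To make the 75% masking ratio concrete, here is a minimal sketch of random patch masking on a 60×128×128 volume (the 4×8×8 patch size is an assumption, not the repo's configuration):

```python
import numpy as np

def random_patch_mask(depth=60, height=128, width=128,
                      patch=(4, 8, 8), mask_ratio=0.75, seed=0):
    """Return a boolean voxel mask where True = masked (hidden from the encoder)."""
    pd, ph, pw = patch
    grid = (depth // pd, height // ph, width // pw)  # 15 x 16 x 16 patches
    n_patches = grid[0] * grid[1] * grid[2]
    n_masked = int(n_patches * mask_ratio)

    # Pick a random 75% of patches to hide
    rng = np.random.default_rng(seed)
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.permutation(n_patches)[:n_masked]] = True

    # Expand patch-level decisions back to voxel resolution
    return np.repeat(np.repeat(np.repeat(
        flat.reshape(grid), pd, axis=0), ph, axis=1), pw, axis=2)

mask = random_patch_mask()
print(mask.shape, mask.mean())  # (60, 128, 128) with 75% of voxels masked
```

The reconstruction loss is then computed only on the masked voxels, which is what forces the encoder to infer hidden anatomy from visible context.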
## Intended Use & Limitations
**Primary use**: Pretraining foundation for downstream OPSCC tasks (staging, segmentation, outcome prediction).

**Not intended for**: Direct clinical diagnosis without fine-tuning and validation.

**Limitations**:
- Trained on limited cohort (TCIA-derived OPSCC cases)
- Assumes cropped, skull-base-to-thoracic-inlet volumes
- Asymmetry heuristics are rule-based → may miss subtle cases
- No multi-modal / contrast-enhanced support yet
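Inputs must match the pretraining preprocessing: cropped volumes with intensities normalized to [0, 1]. A minimal sketch of one common way to do this via HU windowing (the window bounds here are an assumption, not the repo's values):

```python
import numpy as np

def normalize_ct(volume_hu, lo=-200.0, hi=300.0):
    """Clip a CT volume to an HU window and rescale to [0, 1]."""
    v = np.clip(volume_hu.astype(np.float32), lo, hi)
    return (v - lo) / (hi - lo)

# Toy 2x2 "volume" in Hounsfield units: air, window floor, soft tissue, bone
vol = np.array([[-1000.0, -200.0], [50.0, 1200.0]])
print(normalize_ct(vol))  # [[0.  0. ], [0.5 1. ]]
```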
## How to Use
```bash
# 1. Clone repo
git clone https://huggingface.co/jdmayfield/opscc-ct-mae-swin-pretrain
cd opscc-ct-mae-swin-pretrain
# 2. Install deps
pip install -r requirements.txt
# 3. Train (or resume from checkpoint)
python train_mae_swin3d.py \
    --data-dir /path/to/your/cropped_volumes \
    --output-dir ./checkpoints \
    --epochs 100 \
    --batch-size 2 \
    --lr 1e-4
```
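For downstream use, a typical next step is to keep only the encoder weights and discard the decoder and auxiliary heads. A hypothetical sketch of filtering a checkpoint's state dict (the `encoder.` key prefix and the toy checkpoint layout are assumptions; adapt them to the keys in your actual checkpoint):

```python
def extract_encoder_state(state_dict, prefix="encoder."):
    """Keep only encoder weights and strip the prefix, dropping decoder/heads."""
    return {k[len(prefix):]: v for k, v in state_dict.items()
            if k.startswith(prefix)}

# Toy dict standing in for a loaded checkpoint's state dict:
ckpt = {"encoder.layers.0.weight": 1, "decoder.proj.weight": 2,
        "asym_head.weight": 3}
enc = extract_encoder_state(ckpt)
print(enc)  # {'layers.0.weight': 1}
```

The filtered dict can then be loaded into a downstream model's encoder before fine-tuning on the target task.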