---
license: mit
language:
- en
tags:
- medical-imaging
- self-supervised-learning
- mae
- swin-transformer
- 3d-vision
- pytorch
- ct
- opscc
---

# 3D Swin Transformer MAE for OPSCC CT Pretraining

Self-supervised masked autoencoder (MAE) with a **3D Swin Transformer** backbone, trained on cropped OPSCC neck CT volumes. Includes asymmetry-aware loss weighting (airway + soft-tissue features) and overfitting monitoring via augmented-pair cosine similarity.

## Model Details

- **Architecture**: 3D Swin Transformer encoder + lightweight asymmetric decoder + auxiliary asymmetry prediction heads
- **Input shape**: 1×60×128×128 (single-channel CT volumes, intensities normalized to [0, 1])
- **Pretraining objective**: masked reconstruction (75% masking ratio) + auxiliary asymmetry regression
- **Drop path rate**: linear schedule up to 0.1
- **Training**: AdamW, lr = 1e-4, batch size 2 (adjustable), early stopping + cosine-similarity monitoring

## Intended Use & Limitations

**Primary use**: pretraining foundation for downstream OPSCC tasks (staging, segmentation, outcome prediction).

**Not intended for**: direct clinical diagnosis without fine-tuning and validation.

**Limitations**:

- Trained on a limited cohort (TCIA-derived OPSCC cases)
- Assumes cropped, skull-base-to-thoracic-inlet volumes
- Asymmetry heuristics are rule-based and may miss subtle cases
- No multi-modal / contrast-enhanced support yet

## How to Use

```bash
# 1. Clone the repo
git clone https://huggingface.co/jdmayfield/opscc-ct-mae-swin-pretrain
cd opscc-ct-mae-swin-pretrain

# 2. Install dependencies
pip install -r requirements.txt

# 3. Train (or resume from a checkpoint)
python train_mae_swin3d.py \
    --data-dir /path/to/your/cropped_volumes \
    --output-dir ./checkpoints \
    --epochs 100 \
    --batch-size 2 \
    --lr 1e-4
```
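## Example: Patch Masking (Illustrative)

The 75% masking ratio from Model Details can be sketched in NumPy. This is a minimal illustration, not the repo's code: the patch size (4×16×16) and the function name are assumptions chosen so the 60×128×128 volume divides evenly into patches.

```python
import numpy as np

def random_patch_mask(shape=(60, 128, 128), patch=(4, 16, 16),
                      mask_ratio=0.75, seed=0):
    """Randomly split the patch grid into masked and visible patches.

    Illustrative sketch: patch size and signature are assumptions,
    not the repo's actual API.
    """
    rng = np.random.default_rng(seed)
    # Number of non-overlapping patches along each axis: (15, 8, 8) here
    grid = tuple(s // p for s, p in zip(shape, patch))
    n_patches = int(np.prod(grid))                # 960 patches total
    n_masked = int(round(mask_ratio * n_patches)) # 720 hidden at 75%
    perm = rng.permutation(n_patches)
    masked_idx = perm[:n_masked]   # patches the decoder must reconstruct
    visible_idx = perm[n_masked:]  # patches the encoder actually sees
    return grid, masked_idx, visible_idx

grid, masked_idx, visible_idx = random_patch_mask()
print(grid, len(masked_idx), len(visible_idx))  # (15, 8, 8) 720 240
```

With these assumed numbers, the encoder processes only 240 of 960 patches per volume, which is the main source of MAE's pretraining efficiency.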
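## Example: Asymmetry-Aware Loss Weighting (Illustrative)

The asymmetry-aware loss weighting can be sketched as a weighted masked-reconstruction MSE. This is a hedged sketch under stated assumptions: the function name, the weight map, and the weighting scheme (upweighting voxels flagged by the airway/soft-tissue heuristics) are illustrative, not the repo's implementation.

```python
import numpy as np

def weighted_masked_mse(pred, target, mask, feature_map, feature_weight=2.0):
    """MSE over masked voxels, upweighted where heuristics flag features.

    mask:        1.0 where voxels were hidden from the encoder
                 (MAE computes reconstruction loss only there)
    feature_map: 1.0 where rule-based airway/soft-tissue heuristics fire
                 (an assumption for illustration)
    """
    weights = 1.0 + (feature_weight - 1.0) * feature_map
    err = weights * mask * (pred - target) ** 2
    # Average over masked voxels; guard against an empty mask
    return float(err.sum() / np.maximum(mask.sum(), 1.0))

# Dummy data shaped like one model input volume
rng = np.random.default_rng(0)
vol = rng.random((60, 128, 128)).astype(np.float32)      # CT in [0, 1]
pred = vol + 0.01                                        # toy reconstruction
mask = (rng.random(vol.shape) < 0.75).astype(np.float32) # 75% masked
feat = (rng.random(vol.shape) < 0.10).astype(np.float32) # flagged regions
loss = weighted_masked_mse(pred, vol, mask, feat)
```

Restricting the loss to masked voxels matches the standard MAE objective; the feature weighting simply biases reconstruction quality toward anatomically asymmetry-relevant regions.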