---
license: mit
tags:
- medical
- cancer
- ct-scan
- risk-prediction
- healthcare
- pytorch
- vision
datasets:
- NLST
metrics:
- auc
- c-index
language:
- en
library_name: transformers
pipeline_tag: image-classification
---

# Sybil - Lung Cancer Risk Prediction

## Model Description

Sybil is a deep learning model that predicts future lung cancer risk from a single low-dose chest CT (LDCT) scan. It was published in the Journal of Clinical Oncology (2023) and estimates cancer risk for each of years 1 through 6 after the scan.

### Key Features
- **Single Scan Analysis**: Requires only one LDCT scan
- **Multi-Year Prediction**: Provides risk scores for years 1-6
- **Validated Performance**: Evaluated on test sets from NLST, MGH, and CGMH (Taiwan)
- **Ensemble Approach**: Uses 5 models for robust predictions
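
The ensemble bullet can be illustrated with a minimal sketch. It assumes each of the 5 models returns six per-year risk scores and that the ensemble combines them with a plain year-by-year mean. This card does not state how the predictions are actually combined, so `average_ensemble` is a hypothetical helper and the numbers are made up.

```python
from statistics import fmean


def average_ensemble(per_model_scores):
    """Average per-year risk scores across ensemble members.

    per_model_scores: list of equal-length lists, one list per model.
    Assumption: a plain mean per year; the model card does not confirm this.
    """
    if not per_model_scores:
        raise ValueError("need scores from at least one model")
    n_years = len(per_model_scores[0])
    if any(len(scores) != n_years for scores in per_model_scores):
        raise ValueError("every model must return the same number of years")
    # zip(*...) groups the scores that belong to the same year across models
    return [fmean(year_scores) for year_scores in zip(*per_model_scores)]


# Made-up scores for 5 models x 6 years, for illustration only
made_up_scores = [
    [0.010, 0.020, 0.030, 0.040, 0.050, 0.060],
    [0.012, 0.022, 0.031, 0.043, 0.052, 0.061],
    [0.009, 0.018, 0.029, 0.038, 0.049, 0.058],
    [0.011, 0.021, 0.032, 0.041, 0.051, 0.062],
    [0.010, 0.019, 0.030, 0.039, 0.050, 0.059],
]
ensemble_scores = average_ensemble(made_up_scores)

for year, score in enumerate(ensemble_scores, 1):
    print(f"Year {year}: {score:.2%}")
```

Averaging tends to smooth out the variance of individual models, which is the usual motivation for an ensemble of this kind.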

## Model Details

- **Developed by**: MIT CSAIL & Mass General Cancer Center (Original)
- **Adapted by**: Lab-Rasool (Hugging Face version)
- **Model type**: 3D Convolutional Neural Network
- **Architecture**: 3D ResNet-18 with multi-attention pooling
- **Input**: LDCT scans (200 slices × 256×256 pixels)
- **Output**: 6 risk scores (years 1-6)
- **License**: MIT
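
As a rough guide to the input size listed above, the sketch below works out the memory footprint of one scan volume. It assumes a single channel stored as 32-bit floats; the actual dtype and channel count used by the model are not stated in this card, and `volume_size_mib` is a hypothetical helper.

```python
def volume_size_mib(slices=200, height=256, width=256, bytes_per_voxel=4):
    """Approximate in-memory size of one scan volume in MiB.

    Defaults follow the input shape in this card (200 x 256 x 256).
    Assumption: one channel, float32 (4 bytes per voxel).
    """
    n_voxels = slices * height * width  # 13,107,200 voxels at the defaults
    return n_voxels * bytes_per_voxel / 2**20


print(volume_size_mib())  # 50.0
```

Activations inside a 3D CNN are typically much larger than the input itself, so this figure is a lower bound on per-scan memory rather than an estimate of what inference needs.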

## Performance Metrics

| Dataset | 1-Year AUC | 6-Year AUC |
|---------|------------|------------|
| NLST Test | 0.94 | 0.86 |
| MGH | 0.86 | 0.75 |
| CGMH Taiwan | 0.94 | 0.80 |
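
For readers less familiar with the metric, the sketch below shows one standard way to compute an AUC: the fraction of (positive, negative) pairs in which the positive case gets the higher score, with ties counted as half. The labels and scores are made up, `roc_auc` is a hypothetical helper, and the sketch ignores censoring, so it is not necessarily how the numbers in the table were produced.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of positive and negative cases.

    scores: predicted risks; labels: 1 if cancer was diagnosed within the
    horizon of interest (e.g. 1 year), else 0. Ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 2 positives, 2 negatives; 3 of the 4 pairs are ranked correctly
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the table.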

## Usage

```python
from huggingface_sybil import SybilHFWrapper, SybilConfig

# Load model
config = SybilConfig()
model = SybilHFWrapper.from_pretrained("Lab-Rasool/sybil")

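# Paths to every DICOM slice of one LDCT series
# (the trailing "..." is a placeholder; replace it with the remaining slice paths)
dicom_paths = ["scan1.dcm", "scan2.dcm", ...]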

# Get predictions
output = model(dicom_paths=dicom_paths)
risk_scores = output.risk_scores

# Display results
for year, score in enumerate(risk_scores, 1):
    print(f"Year {year}: {score:.1%} risk")
```
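
Listing every slice by hand is impractical for a full series, so the sketch below builds `dicom_paths` from a folder. It assumes all slices of one series sit directly in a single directory with a `.dcm` extension; `collect_dicom_paths` is a hypothetical helper and not part of the wrapper. Sorting by filename does not guarantee anatomical slice order, and this card does not say whether the wrapper reorders slices from DICOM metadata.

```python
from pathlib import Path


def collect_dicom_paths(series_dir):
    """Return the sorted paths of all .dcm files directly inside series_dir."""
    series_dir = Path(series_dir)
    if not series_dir.is_dir():
        raise NotADirectoryError(f"not a directory: {series_dir}")
    paths = sorted(
        str(p)
        for p in series_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".dcm"
    )
    if not paths:
        raise FileNotFoundError(f"no .dcm files found in {series_dir}")
    return paths


# Usage (the directory name is a placeholder):
# dicom_paths = collect_dicom_paths("path/to/ldct_series")
# output = model(dicom_paths=dicom_paths)
```

Raising early on an empty or missing folder keeps path mistakes from surfacing later as less readable errors inside the model call.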

## Intended Use

### Primary Use Cases
- Risk stratification in lung cancer screening programs
- Research on lung cancer prediction models
- Clinical decision support (with appropriate oversight)

### Users
- Healthcare providers
- Medical researchers
- Screening program coordinators

### Out of Scope
- Diagnosis of existing cancer
- Use with non-LDCT imaging (X-rays, MRI)
- Sole basis for clinical decisions

## Training Data

Trained on the National Lung Screening Trial (NLST) dataset:
- ~50,000 participants
- Ages 55-74
- Current/former heavy smokers
- 3 annual LDCT scans

## Ethical Considerations

⚠️ **Medical AI Notice**: This model should supplement, not replace, clinical judgment. Always consider:
- Complete patient history
- Other risk factors
- Current screening guidelines
- Need for human oversight

## Limitations

- Optimized for screening-eligible population (55-80 years)
- Requires LDCT scans specifically
- Performance may vary across different CT scanners
- Not validated for non-screening populations

## Citation

**Original Paper:**
```bibtex
@article{mikhael2023sybil,
  title={Sybil: a validated deep learning model to predict future lung cancer risk from a single low-dose chest computed tomography},
  author={Mikhael, Peter G and Wohlwend, Jeremy and Yala, Adam and Karstens, Ludvig and Xiang, Justin and Takigami, Angelo K and Bourgouin, Patrick P and Chan, PuiYee and Mrah, Sofiane and Amayri, Wael and others},
  journal={Journal of Clinical Oncology},
  volume={41},
  number={12},
  pages={2191--2200},
  year={2023},
  publisher={American Society of Clinical Oncology}
}
```

## Acknowledgments

This Hugging Face implementation is based on the original work by Peter G. Mikhael, Jeremy Wohlwend, and the team at MIT CSAIL and Massachusetts General Hospital. The original model and code are available on [GitHub](https://github.com/reginabarzilaygroup/Sybil).

## Model Card Contact

- For questions about this Hugging Face implementation: Lab-Rasool
- For questions about the original model: see the [original repository](https://github.com/reginabarzilaygroup/Sybil)