---
language: en
pipeline_tag: text-classification
tags:
- biogpt
- medical
- radiology
- birads
- classification
license: mit
metrics:
- accuracy
- f1
model-index:
- name: BioGPT BI-RADS Classifier
  results:
  - task:
      type: text-classification
      name: BI-RADS Classification
    metrics:
    - type: accuracy
      value: 0.9736
      name: Accuracy
    - type: f1
      value: 0.9066
      name: F1 (Macro)
---

# BioGPT BI-RADS Classifier

This model is a fine-tuned version of [microsoft/biogpt-large](https://huggingface.co/microsoft/biogpt-large) for BI-RADS classification of radiology reports.

## Model Description

- **Base Model:** microsoft/biogpt-large
- **Task:** Multi-class text classification (BI-RADS categories 0-6)
- **Training Data:** Radiology reports with BI-RADS annotations
- **Accuracy:** 97.36%
- **F1-Score (Macro):** 90.66%

## Performance

### Overall Metrics
- **Accuracy:** 97.36%
- **F1-Score (Macro):** 90.66%
- **F1-Score (Weighted):** 97.34%
- **Precision (Macro):** 92.65%
- **Recall (Macro):** 88.96%

### Per-Class Performance
| BI-RADS | Precision | Recall | F1-Score | Support |
|---------|-----------|--------|----------|---------|
| 0 | 0.9946 | 0.9482 | 0.9708 | 193 |
| 1 | 0.9504 | 0.9664 | 0.9583 | 119 |
| 2 | 0.9740 | 0.9943 | 0.9840 | 527 |
| 3 | 1.0000 | 0.8333 | 0.9091 | 18 |
| 4 | 0.9000 | 0.8182 | 0.8571 | 11 |
| 5 | 0.6667 | 0.6667 | 0.6667 | 3 |
| 6 | 1.0000 | 1.0000 | 1.0000 | 1 |
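
The overall metrics can be recomputed from the per-class rows, which also shows why the macro and weighted F1 scores differ so much: macro averaging gives the three samples of category 5 the same weight as the 527 samples of category 2. The sketch below only uses the values from the table; the variable names are illustrative.

```python
# Per-class rows copied from the table: (precision, recall, f1, support)
per_class = {
    0: (0.9946, 0.9482, 0.9708, 193),
    1: (0.9504, 0.9664, 0.9583, 119),
    2: (0.9740, 0.9943, 0.9840, 527),
    3: (1.0000, 0.8333, 0.9091, 18),
    4: (0.9000, 0.8182, 0.8571, 11),
    5: (0.6667, 0.6667, 0.6667, 3),
    6: (1.0000, 1.0000, 1.0000, 1),
}

rows = list(per_class.values())
n_classes = len(rows)
total_support = sum(s for _, _, _, s in rows)

# Macro averages: every class counts equally, regardless of support.
macro_precision = sum(p for p, _, _, _ in rows) / n_classes
macro_recall = sum(r for _, r, _, _ in rows) / n_classes
macro_f1 = sum(f for _, _, f, _ in rows) / n_classes

# Weighted F1: each class contributes in proportion to its support.
weighted_f1 = sum(f * s for _, _, f, s in rows) / total_support

# Support-weighted recall is the fraction of all samples classified
# correctly, i.e. the overall accuracy.
accuracy = sum(r * s for _, r, _, s in rows) / total_support

print(f"Accuracy:        {accuracy:.4f}")
print(f"F1 (macro):      {macro_f1:.4f}")
print(f"F1 (weighted):   {weighted_f1:.4f}")
print(f"Precision macro: {macro_precision:.4f}")
print(f"Recall macro:    {macro_recall:.4f}")
```

The results agree with the overall metrics listed above to the displayed precision.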

## Usage

```python
from transformers import AutoTokenizer, BioGptForSequenceClassification
import torch

# Load model and tokenizer
model = BioGptForSequenceClassification.from_pretrained("ishro/biogpt-aura")
tokenizer = AutoTokenizer.from_pretrained("ishro/biogpt-aura")

# Prepare input
report_text = "Your radiology report text here..."
inputs = tokenizer(report_text, return_tensors="pt", padding=True, truncation=True, max_length=512)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

# Map to BI-RADS label
birads_label = model.config.id2label[predicted_class]
print(f"Predicted BI-RADS: {birads_label}")
print(f"Confidence: {predictions[0][predicted_class].item():.4f}")
```
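
Because the softmax confidence is available for each prediction, one possible follow-up is to flag low-confidence predictions for manual review. The sketch below is dependency-free and is not part of the released model: the `triage` helper and the 0.90 threshold are hypothetical, and softmax scores are not guaranteed to be calibrated, so any threshold would need to be validated on held-out reports.

```python
import math

# Hypothetical cut-off; the model card does not specify one.
REVIEW_THRESHOLD = 0.90


def softmax(logits):
    """Numerically stable softmax over a plain list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def triage(logits, id2label, threshold=REVIEW_THRESHOLD):
    """Return the top label, its probability, and a manual-review flag."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "label": id2label[best],
        "confidence": probs[best],
        "needs_review": probs[best] < threshold,
    }
```

With the objects from the usage snippet, this could be called as `triage(outputs.logits[0].tolist(), model.config.id2label)`.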

## Training Details

### Training Hyperparameters
- **Learning Rate:** 2e-5
- **Batch Size:** 4 per device (2 GPUs)
- **Gradient Accumulation Steps:** 8
- **Effective Batch Size:** 64
- **Epochs:** 3
- **Optimizer:** AdamW (fused)
- **Mixed Precision:** BF16
- **Hardware:** 2x NVIDIA L40S (46GB each)
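
The effective batch size follows from the other three batch-related values. The sketch below works through that arithmetic and collects the hyperparameters under keyword names that match `transformers.TrainingArguments`. Whether training actually used the `Trainer` API is an assumption; the model card does not say.

```python
# Values taken from the hyperparameter list above.
per_device_batch_size = 4
num_gpus = 2
gradient_accumulation_steps = 8

# Samples contributing to each optimizer step:
# per-device batch x number of devices x accumulation steps.
effective_batch_size = per_device_batch_size * num_gpus * gradient_accumulation_steps
print(effective_batch_size)

# Assumed mapping onto TrainingArguments keyword names (illustrative only).
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": per_device_batch_size,
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "num_train_epochs": 3,
    "optim": "adamw_torch_fused",  # fused AdamW
    "bf16": True,                  # BF16 mixed precision
}
```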

### Training Data
The model was trained on radiology reports with the following features:
- Report observations
- Conclusions
- Recommendations
- Patient metadata (age, hormonal therapy, family history, etc.)
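
The model card does not describe how these features were combined into a single input string. As a purely hypothetical sketch, the fields could be serialized into labelled sections before tokenization. The function name, section labels, and field order below are all assumptions, and inputs at inference time should match whatever format was actually used in training.

```python
def build_input_text(observations, conclusion, recommendations, metadata):
    """Join report fields into one string (hypothetical format)."""
    # Flatten metadata such as {"age": 52} into "age: 52; ..."
    meta = "; ".join(f"{key}: {value}" for key, value in metadata.items())
    sections = [
        f"Patient metadata: {meta}",
        f"Observations: {observations}",
        f"Conclusion: {conclusion}",
        f"Recommendations: {recommendations}",
    ]
    return "\n".join(sections)


example = build_input_text(
    observations="Placeholder observation text.",
    conclusion="Placeholder conclusion text.",
    recommendations="Placeholder recommendation text.",
    metadata={"age": 52, "family_history": "no"},
)
print(example)
```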

## Limitations
- Performance on the rare BI-RADS categories is less reliable, likely because training samples are limited: category 5 has the lowest F1 (0.6667 on 3 evaluation samples), and the perfect score for category 6 rests on a single evaluation sample
- The model was trained on a specific radiology report format
- It may not generalize well to reports from other institutions without further fine-tuning

## Ethical Considerations
- This model is intended for research purposes and should not be used as the sole basis for clinical decisions
- Always consult qualified medical professionals for diagnosis and treatment
- The model may carry biases from the training data distribution

## Citation
If you use this model, please cite:
```bibtex
@misc{biogpt-birads-classifier,
  author = {Your Name},
  title = {BioGPT BI-RADS Classifier},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ishro/biogpt-aura}
}
```

## Model Card Authors
ishro

## Model Card Contact
For questions or issues, please open an issue on the model repository.