Citation Classifier
A DistilBERT-based binary classifier that identifies sentences in biomedical text that require citations.
Model Description
This model takes a sentence from a scientific/biomedical article and predicts whether it should contain a citation (1) or not (0). It is a key component of the pubciter pipeline for automated citation generation.
- Base model: distilbert-base-uncased
- Task: Binary text classification
- Domain: Biomedical / scientific literature
Variants
- coteaching/: Trained with a co-teaching strategy for noise-robust learning
- self_filtering/: Trained with self-filtering to reduce label noise
- last-checkpoint/: Final checkpoint from standard training
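For readers unfamiliar with co-teaching, the sketch below illustrates its core step in general terms: two networks each rank a batch by per-sample loss, and each network is updated only on the samples its peer finds easy (small loss). This is not the training code behind the coteaching/ variant; the function names and the `forget_rate` parameter are hypothetical and chosen for illustration.

```python
def small_loss_indices(losses, forget_rate):
    """Return indices of the lowest-loss samples, dropping a `forget_rate` fraction."""
    keep = max(1, int(round((1.0 - forget_rate) * len(losses))))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:keep])


def coteaching_select(losses_a, losses_b, forget_rate):
    """Pick the samples each network trains on, using its peer's loss ranking.

    losses_a / losses_b are per-sample losses from network A and network B
    on the same batch (illustrative inputs, not taken from this model).
    """
    train_a_on = small_loss_indices(losses_b, forget_rate)  # B selects for A
    train_b_on = small_loss_indices(losses_a, forget_rate)  # A selects for B
    return train_a_on, train_b_on


if __name__ == "__main__":
    a = [0.1, 2.0, 0.3, 0.2]
    b = [0.2, 0.1, 3.0, 0.4]
    print(coteaching_select(a, b, forget_rate=0.25))
```

Samples that look noisy to one network (high loss) are withheld from the other network's update, which is what makes the scheme tolerant of mislabeled sentences.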
Training
- Dataset: vgainullin/xciting_data (PubMed sentences annotated for citation presence)
- Samples: 100k balanced (50k cited, 50k uncited)
- Epochs: 3
- Learning rate: 1e-6
- Batch size: 16 (train), 64 (eval)
- Optimizer: AdamW
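The hyperparameters above can be gathered into a small config object, which also makes the implied number of optimizer steps easy to check. This is a sketch: the `TrainConfig` class is hypothetical, and the step counts assume that all 100k samples are used for training on a single device with no gradient accumulation, which the card does not state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the list above; field names are illustrative.
    base_model: str = "distilbert-base-uncased"
    num_samples: int = 100_000
    epochs: int = 3
    learning_rate: float = 1e-6
    train_batch_size: int = 16
    eval_batch_size: int = 64
    optimizer: str = "AdamW"

    def steps_per_epoch(self) -> int:
        # Assumes one optimizer step per batch (no gradient accumulation).
        return math.ceil(self.num_samples / self.train_batch_size)

    def total_steps(self) -> int:
        return self.steps_per_epoch() * self.epochs


if __name__ == "__main__":
    cfg = TrainConfig()
    print(cfg.steps_per_epoch(), cfg.total_steps())  # 6250 18750
```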
Usage
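A minimal inference sketch with the transformers library might look like the following. The repository id `your-namespace/citation-classifier` is a placeholder, not the real model id, and the code assumes that each variant folder contains both the model weights and the tokenizer files. If the tokenizer is not stored there, loading it from distilbert-base-uncased is a reasonable fallback.

```python
import math

MODEL_REPO = "your-namespace/citation-classifier"  # placeholder, replace with the real repo id
VARIANT = "coteaching"  # or "self_filtering", "last-checkpoint"


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def needs_citation(logits, threshold=0.5):
    """Label 1 means the sentence should carry a citation, label 0 means it should not."""
    return softmax(logits)[1] >= threshold


def classify(sentences, repo=MODEL_REPO, variant=VARIANT):
    """Return one boolean per sentence; requires torch and transformers."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo, subfolder=variant)
    model = AutoModelForSequenceClassification.from_pretrained(repo, subfolder=variant)
    model.eval()

    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return [needs_citation(row) for row in logits.tolist()]
```

Calling `classify(["Statins reduce cardiovascular mortality in high-risk patients."])` would then return a one-element list of booleans once `MODEL_REPO` points at the actual repository.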
Citation
If you use this model, please cite: