---
language: en
license: mit
tags:
- text-classification
- distilbert
- biomedical
- citation-detection
- scientific-text
datasets:
- vgainullin/xciting_data
metrics:
- accuracy
- f1
pipeline_tag: text-classification
---
# Citation Classifier
A DistilBERT-based binary classifier that identifies sentences in biomedical text that require citations.
## Model Description
This model takes a sentence from a scientific/biomedical article and predicts whether it should contain a citation (1) or not (0). It is a key component of the [pubciter](https://github.com/vgainullin/pubciter) pipeline for automated citation generation.
**Base model:** distilbert-base-uncased
**Task:** Binary text classification
**Domain:** Biomedical / scientific literature
## Variants
- **coteaching/**: trained with a co-teaching strategy for noise-robust learning
- **self_filtering/**: trained with self-filtering to reduce label noise
- **last-checkpoint/**: final checkpoint from standard training
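This card does not describe how the co-teaching variant was implemented. As a rough illustration of the general idea only: two networks are trained side by side, and each one selects the samples it finds easiest (smallest loss, and so less likely to be mislabeled) for the other network to learn from. The function names and the `forget_rate` parameter below are hypothetical and are not taken from the pubciter code.

```python
def small_loss_indices(losses, keep_ratio):
    """Return indices of the lowest-loss samples, keeping at least one."""
    n_keep = max(1, int(len(losses) * keep_ratio))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return ranked[:n_keep]


def coteaching_exchange(losses_a, losses_b, forget_rate):
    """Pick the batch subset each network should be updated on.

    losses_a / losses_b: per-sample losses of networks A and B on the same batch.
    forget_rate: fraction of the batch treated as probably noisy and dropped.
    """
    keep_ratio = 1.0 - forget_rate
    # Cross-update: A's small-loss picks train B, and B's picks train A.
    update_b_on = small_loss_indices(losses_a, keep_ratio)
    update_a_on = small_loss_indices(losses_b, keep_ratio)
    return update_a_on, update_b_on
```

In a real training loop the per-sample losses would come from an unreduced cross-entropy over the batch, and the forget rate is usually ramped up over the first epochs.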
## Training
- **Dataset:** [vgainullin/xciting_data](https://huggingface.co/datasets/vgainullin/xciting_data) — PubMed sentences annotated for citation presence
- **Samples:** 100k balanced (50k cited, 50k uncited)
- **Epochs:** 3
- **Learning rate:** 1e-6
- **Batch size:** 16 (train), 64 (eval)
- **Optimizer:** AdamW
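For readers who want to reproduce a similar setup, the values above could be expressed as keyword arguments for `transformers.TrainingArguments`. This is a sketch, not the author's training script: the choice of `optim="adamw_torch"`, the use of all 100k samples for training, a single device, and no gradient accumulation are all assumptions.

```python
import math

# Values copied from the list above; key names follow transformers.TrainingArguments.
HYPERPARAMS = {
    "num_train_epochs": 3,
    "learning_rate": 1e-6,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 64,
    "optim": "adamw_torch",  # assumption: the card only says "AdamW"
}
NUM_SAMPLES = 100_000  # assumption: every sample is used for training


def total_optimizer_steps(num_samples=NUM_SAMPLES, params=HYPERPARAMS):
    """Optimizer steps on one device with no gradient accumulation (both assumed)."""
    steps_per_epoch = math.ceil(num_samples / params["per_device_train_batch_size"])
    return steps_per_epoch * params["num_train_epochs"]


def build_training_args(output_dir):
    """Build TrainingArguments from the values above. Requires transformers."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

Under these assumptions a full run is 6,250 steps per epoch and 18,750 optimizer steps in total.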
## Usage
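A minimal sketch of loading one variant with the `transformers` library and classifying sentences. The repository ID `vgainullin/citation_classifier` is inferred from the author and repository names and may differ. It is also an assumption that each variant folder contains both the model weights and the tokenizer files. The helper names `decode` and `classify` are illustrative.

```python
import math

# Assumed repository ID; adjust if the model is hosted under a different name.
REPO_ID = "vgainullin/citation_classifier"
# Label convention from the model description: 1 = citation needed, 0 = not.
LABELS = {0: "no citation needed", 1: "citation needed"}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Map one row of logits to (label id, label text, probability)."""
    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return label_id, LABELS[label_id], probs[label_id]


def classify(sentences, variant="coteaching"):
    """Load a variant subfolder and classify sentences. Requires torch and transformers."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Assumption: tokenizer and weights both live in the variant subfolder.
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, subfolder=variant)
    model = AutoModelForSequenceClassification.from_pretrained(REPO_ID, subfolder=variant)
    model.eval()
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return [decode(row) for row in logits.tolist()]


# Example call (downloads the model on first use):
# results = classify(["Statins reduce the risk of cardiovascular events."])
```

Each result is a tuple of the predicted label ID, its text, and the model's probability for that label. Pass `variant="self_filtering"` or `variant="last-checkpoint"` to try the other variants.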
## Citation
If you use this model, please cite: