metadata
language: en
license: mit
tags:
  - text-classification
  - distilbert
  - biomedical
  - citation-detection
  - scientific-text
datasets:
  - vgainullin/xciting_data
metrics:
  - accuracy
  - f1
pipeline_tag: text-classification

Citation Classifier

A DistilBERT-based binary classifier that identifies sentences in biomedical text that require citations.

Model Description

This model takes a sentence from a scientific/biomedical article and predicts whether it should contain a citation (1) or not (0). It is a key component of the pubciter pipeline for automated citation generation.

Base model: distilbert-base-uncased
Task: Binary text classification
Domain: Biomedical / scientific literature

Variants

  • coteaching/ — Trained with co-teaching strategy for noise-robust learning
  • self_filtering/ — Trained with self-filtering for label noise reduction
  • last-checkpoint/ — Standard training final checkpoint

Training

  • Dataset: vgainullin/xciting_data — PubMed sentences annotated for citation presence
  • Samples: 100k balanced (50k cited, 50k uncited)
  • Epochs: 3
  • Learning rate: 1e-6
  • Batch size: 16 (train), 64 (eval)
  • Optimizer: AdamW
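
The settings above can be written down as a configuration sketch. The block below is illustrative only: it assumes training used the `transformers` `Trainer` (the card does not say so), that all 100k samples were used for training on a single device, and that no gradient accumulation was applied. The dictionary keys are the matching `TrainingArguments` keyword names; `NUM_TRAIN_SAMPLES` and `optimizer_steps` are hypothetical helpers introduced here.

```python
import math

# Hyperparameters from the list above, keyed by the corresponding
# transformers.TrainingArguments keyword names (use of Trainer is an assumption).
HPARAMS = {
    "num_train_epochs": 3,
    "learning_rate": 1e-6,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 64,
    "optim": "adamw_torch",  # AdamW; the exact implementation is an assumption
}

# Assumption: all 100k balanced samples are used for training.
NUM_TRAIN_SAMPLES = 100_000


def optimizer_steps(num_samples: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps on one device with no gradient accumulation."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs


total_steps = optimizer_steps(
    NUM_TRAIN_SAMPLES,
    HPARAMS["per_device_train_batch_size"],
    HPARAMS["num_train_epochs"],
)
print(total_steps)  # 18750

# With transformers installed, the same dictionary could be passed on as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(output_dir="out", **HPARAMS)
```

Under these assumptions, one epoch is 6,250 steps, so three epochs give 18,750 optimizer steps.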

Usage
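
A minimal inference sketch follows; it is not the author's original snippet. It assumes the repository id is `vgainullin/citation_classifier`, that each variant folder contains both model weights and tokenizer files, and that class 1 is exposed under the default label name `LABEL_1`. The helper names `load_classifier`, `needs_citation`, and `demo` are hypothetical.

```python
from typing import Callable, List

REPO_ID = "vgainullin/citation_classifier"  # assumed repository id
VARIANTS = ("coteaching", "self_filtering", "last-checkpoint")


def load_classifier(variant: str = "coteaching", repo_id: str = REPO_ID):
    """Load one variant as a text-classification pipeline (needs network access)."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    # Imported lazily so needs_citation() can be used without transformers installed.
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        pipeline,
    )

    # Assumption: the variant folder holds both the weights and the tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=variant)
    model = AutoModelForSequenceClassification.from_pretrained(
        repo_id, subfolder=variant
    )
    return pipeline("text-classification", model=model, tokenizer=tokenizer)


def needs_citation(
    sentences: List[str],
    classifier: Callable,
    positive_label: str = "LABEL_1",  # assumed name of class 1
) -> List[bool]:
    """Return True for each sentence the classifier assigns to class 1."""
    # A text-classification pipeline returns one {"label", "score"} dict per input.
    return [pred["label"] == positive_label for pred in classifier(sentences)]


def demo() -> None:
    """Download the co-teaching variant and classify two example sentences."""
    clf = load_classifier("coteaching")
    sentences = [
        "Statins reduce the risk of cardiovascular events in high-risk patients.",
        "In this study, we describe our experimental setup.",
    ]
    for sentence, flag in zip(sentences, needs_citation(sentences, clf)):
        print(flag, sentence)
```

Calling `demo()` downloads the weights on first use. If a variant folder turns out not to include tokenizer files, loading the tokenizer from the base model `distilbert-base-uncased` is a reasonable fallback. If the model config defines custom label names, pass the appropriate one as `positive_label`.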

Citation

If you use this model, please cite: