Citation Classifier

A DistilBERT-based binary classifier that identifies sentences in biomedical text that require citations.

Model Description

This model takes a sentence from a scientific/biomedical article and predicts whether it should contain a citation (1) or not (0). It is a key component of the pubciter pipeline for automated citation generation.

Base model: distilbert-base-uncased
Task: Binary text classification
Domain: Biomedical / scientific literature
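
The card does not describe the classification head. The sketch below shows one way to picture the binary decision, assuming the standard Hugging Face `DistilBertForSequenceClassification` head. That head applies a hidden layer with ReLU to the first-token ([CLS]) representation, then a two-way linear layer, and dropout is inactive at inference. The symbols are ours; 768 is the hidden size of distilbert-base-uncased.

```latex
\begin{aligned}
\mathbf{h} &= \mathrm{DistilBERT}(x)_{[\mathrm{CLS}]} \in \mathbb{R}^{768} \\
\mathbf{z} &= W_2\,\mathrm{ReLU}(W_1 \mathbf{h} + \mathbf{b}_1) + \mathbf{b}_2 \in \mathbb{R}^{2} \\
p(y = 1 \mid x) &= \frac{e^{z_1}}{e^{z_0} + e^{z_1}}, \qquad \hat{y} = \mathbb{1}\!\left[\, p(y = 1 \mid x) > 0.5 \,\right]
\end{aligned}
```

Under this reading, $\hat{y} = 1$ means the sentence is predicted to need a citation. Thresholding at 0.5 is equivalent to taking the argmax of the two logits.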

Variants

  • coteaching/: trained with the co-teaching strategy for robustness to noisy labels
  • self_filtering/: trained with self-filtering to reduce the impact of label noise
  • last-checkpoint/: final checkpoint from standard training
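
The card does not include the training code for these variants. The sketch below illustrates the general co-teaching idea with one step of the generic rule from Han et al. (2018):

- Two networks each rank a mini-batch by per-sample loss.
- Each network keeps its smallest-loss fraction as presumably clean.
- Each network is then updated on the samples its peer selected.

This is not necessarily how the `coteaching/` checkpoint was trained. The function name `coteaching_losses` and the `forget_rate` parameter are illustrative. In the original method the forget rate is ramped up over the early epochs toward an estimate of the label-noise rate. Self-filtering is not sketched here.

```python
import torch
import torch.nn.functional as F


def coteaching_losses(logits_a, logits_b, labels, forget_rate):
    """One generic co-teaching step (hypothetical helper, not from this repo).

    Each network selects its small-loss samples; the *other* network is
    trained on that selection. Returns the two losses to backpropagate and
    the index sets each network selected.
    """
    # Per-sample cross-entropy for each network
    loss_a = F.cross_entropy(logits_a, labels, reduction="none")
    loss_b = F.cross_entropy(logits_b, labels, reduction="none")

    # Keep the (1 - forget_rate) fraction with the smallest loss
    num_keep = max(1, int(round((1.0 - forget_rate) * labels.size(0))))
    keep_a = torch.argsort(loss_a.detach())[:num_keep]
    keep_b = torch.argsort(loss_b.detach())[:num_keep]

    # Cross update: A learns from B's selection, B learns from A's
    update_a = loss_a[keep_b].mean()
    update_b = loss_b[keep_a].mean()
    return update_a, update_b, keep_a, keep_b
```

Exchanging selections between the two networks is meant to stop a single model from reinforcing its own mistakes on mislabeled sentences.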

Training

  • Dataset: vgainullin/xciting_data (PubMed sentences annotated for citation presence)
  • Samples: 100k balanced (50k cited, 50k uncited)
  • Epochs: 3
  • Learning rate: 1e-6
  • Batch size: 16 (train), 64 (eval)
  • Optimizer: AdamW
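
The hyperparameters above map roughly onto Hugging Face `TrainingArguments` fields. Several parts of this mapping are our assumptions rather than statements from the card:

- the field mapping itself;
- a single device, so that `per_device_*` equals the listed batch size;
- `"adamw_torch"` as the AdamW choice.

The card also does not say whether the 100k samples include an evaluation split. If all 100k are used for training, that gives 100,000 / 16 = 6,250 updates per epoch and 18,750 over three epochs.

```python
import math

# Card hyperparameters keyed by `TrainingArguments` field names
# (the mapping and the single-device assumption are ours).
training_kwargs = {
    "num_train_epochs": 3,
    "learning_rate": 1e-6,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 64,
    "optim": "adamw_torch",  # PyTorch AdamW
}


def optimizer_steps(num_samples, batch_size, epochs):
    """Total optimizer updates without gradient accumulation,
    counting a final partial batch in each epoch."""
    return math.ceil(num_samples / batch_size) * epochs


# Assumes all 100k balanced samples are used for training
total = optimizer_steps(
    100_000,
    training_kwargs["per_device_train_batch_size"],
    training_kwargs["num_train_epochs"],
)
print(total)  # 18750
```

The dictionary can be unpacked into `TrainingArguments(output_dir="citation_classifier_out", **training_kwargs)`, where the output directory name is hypothetical.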

Usage
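
Below is a minimal inference sketch using the `transformers` Auto classes. It assumes that `vgainullin/citation_classifier` stores a standard sequence-classification checkpoint and tokenizer, either at the root or inside each variant folder. `subfolder` is a real `from_pretrained` argument, but the folder layout is assumed.

The repository is gated, so you must accept its conditions on the Hub and authenticate (for example with `huggingface-cli login`) before downloading.

The following are our own choices, not details from the card:

- the helper names `load_classifier` and `predict_citation_needed`;
- the `max_length=128` truncation;
- the assumption that output index 1 means "needs a citation".

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hub repository id; the repo is gated, so accept its conditions and log in first
REPO_ID = "vgainullin/citation_classifier"


def load_classifier(subfolder=None):
    """Load tokenizer and model (hypothetical helper).

    `subfolder` selects a variant such as "coteaching", "self_filtering"
    or "last-checkpoint"; that each folder holds full model and tokenizer
    files is an assumption.
    """
    kwargs = {"subfolder": subfolder} if subfolder else {}
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, **kwargs)
    model = AutoModelForSequenceClassification.from_pretrained(REPO_ID, **kwargs)
    model.eval()  # disable dropout for inference
    return tokenizer, model


@torch.no_grad()
def predict_citation_needed(sentences, tokenizer, model, batch_size=64, max_length=128):
    """Return one (label, probability_of_label_1) pair per sentence.

    Assumes output index 1 corresponds to "should contain a citation".
    """
    results = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        enc = tokenizer(batch, padding=True, truncation=True,
                        max_length=max_length, return_tensors="pt")
        logits = model(**enc).logits
        probs = torch.softmax(logits, dim=-1)[:, 1]  # P(citation needed)
        labels = logits.argmax(dim=-1)               # 1 = cite, 0 = no citation
        results.extend(zip(labels.tolist(), probs.tolist()))
    return results
```

For example, `tokenizer, model = load_classifier("coteaching")` followed by `predict_citation_needed(["Aspirin irreversibly inhibits COX-1."], tokenizer, model)` returns one `(label, probability)` pair per input sentence.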

Citation

If you use this model, please cite:
