vgainullin/citation_classifier

	---
	language: en
	license: mit
	tags:
	- text-classification
	- distilbert
	- biomedical
	- citation-detection
	- scientific-text
	datasets:
	- vgainullin/xciting_data
	metrics:
	- accuracy
	- f1
	pipeline_tag: text-classification
	---

	# Citation Classifier

	A DistilBERT-based binary classifier that identifies sentences in biomedical text that require citations.

	## Model Description

	This model takes a sentence from a scientific/biomedical article and predicts whether it should contain a citation (1) or not (0). It is a key component of the [pubciter](https://github.com/vgainullin/pubciter) pipeline for automated citation generation.

	- Base model: distilbert-base-uncased
	- Task: Binary text classification
	- Domain: Biomedical / scientific literature

	## Variants

	- coteaching/ — Trained with co-teaching strategy for noise-robust learning
	- self_filtering/ — Trained with self-filtering for label noise reduction
	- last-checkpoint/ — Standard training final checkpoint
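
For readers unfamiliar with co-teaching, the sketch below illustrates its core sample-exchange step in plain Python: two networks each keep the samples they find easiest (smallest loss), and each network is then updated on the samples its peer selected. This is a generic illustration of the technique, not the training code used for the `coteaching/` variant; the function names and the `forget_rate` value are assumptions made for the example.

```python
def small_loss_indices(losses, forget_rate):
    """Indices of the (1 - forget_rate) fraction of samples with the smallest loss."""
    keep = max(1, int(round((1.0 - forget_rate) * len(losses))))
    return sorted(range(len(losses)), key=lambda i: losses[i])[:keep]


def coteaching_exchange(losses_a, losses_b, forget_rate):
    """One co-teaching selection step for a mini-batch.

    losses_a / losses_b are per-sample losses from network A and network B
    on the same mini-batch. Each network picks its small-loss samples, and
    the peer network is trained on that selection.
    """
    picked_by_a = small_loss_indices(losses_a, forget_rate)
    picked_by_b = small_loss_indices(losses_b, forget_rate)
    # Cross-update: A learns from B's picks, B learns from A's picks.
    return {"train_a_on": picked_by_b, "train_b_on": picked_by_a}


# Hypothetical per-sample losses for a batch of four sentences.
batch = coteaching_exchange(
    losses_a=[0.1, 2.0, 0.3, 0.2],
    losses_b=[0.2, 0.1, 3.0, 0.4],
    forget_rate=0.25,
)
```

Samples with a large loss under one network are treated as likely mislabeled and withheld from the peer's update, which is what makes the scheme robust to label noise.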

	## Training

	- Dataset: [vgainullin/xciting_data](https://huggingface.co/datasets/vgainullin/xciting_data) — PubMed sentences annotated for citation presence
	- Samples: 100k balanced (50k cited, 50k uncited)
	- Epochs: 3
	- Learning rate: 1e-6
	- Batch size: 16 (train), 64 (eval)
	- Optimizer: AdamW
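
As a rough guide, the hyperparameters above could be expressed as Hugging Face `TrainingArguments`. The sketch below is an assumption about how such a configuration might look, not the author's training script: the output directory is hypothetical, `adamw_torch` is assumed as the AdamW variant, and the step count assumes a single device, no gradient accumulation, and all 100k sentences used for training.

```python
import math

# Values taken from the Training list above.
HPARAMS = {
    "num_train_epochs": 3,
    "learning_rate": 1e-6,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 64,
    "optim": "adamw_torch",  # assumed AdamW implementation
}


def optimizer_steps(num_samples, batch_size, epochs):
    """Total optimizer updates, assuming one device and no gradient accumulation."""
    return math.ceil(num_samples / batch_size) * epochs


def build_training_args(output_dir="citation_classifier_out"):  # hypothetical path
    """Build TrainingArguments from HPARAMS (requires the transformers package)."""
    from transformers import TrainingArguments  # imported lazily

    return TrainingArguments(output_dir=output_dir, **HPARAMS)


total_steps = optimizer_steps(num_samples=100_000, batch_size=16, epochs=3)
```

Under those assumptions, training amounts to 6,250 updates per epoch and 18,750 updates in total.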

	## Usage
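
A minimal inference sketch with the `transformers` library follows. It makes several assumptions: that each variant can be loaded from its subfolder of the `vgainullin/citation_classifier` repository via the `subfolder` argument, that the `distilbert-base-uncased` tokenizer of the base model is the right one, and that a 0.5 probability threshold is a sensible default. Label 1 means the sentence needs a citation, as described above.

```python
import math

MODEL_ID = "vgainullin/citation_classifier"
BASE_TOKENIZER = "distilbert-base-uncased"  # base model named in this card


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits, threshold=0.5):
    """Map [logit_0, logit_1] to (label, probability that a citation is needed)."""
    p_cite = softmax(logits)[1]
    return int(p_cite >= threshold), p_cite


def load_classifier(variant="coteaching"):
    """Load tokenizer and model; the subfolder layout is an assumption."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_TOKENIZER)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID, subfolder=variant
    )
    model.eval()
    return tokenizer, model


def needs_citation(sentence, tokenizer, model):
    """Return (label, probability) for a single sentence."""
    import torch

    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return label_from_logits(logits)
```

Calling `tokenizer, model = load_classifier()` and then `needs_citation("Statins reduce cardiovascular mortality.", tokenizer, model)` would return the predicted label together with the model's probability for class 1. The example sentence is illustrative only.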



	## Citation

	If you use this model, please cite: