SciBERT Fine-tuned for SciCite Intent Classification

This model is a fine-tuned version of allenai/scibert_scivocab_uncased on the SciCite dataset.

Model Description

Base Model: AllenAI's SciBERT pre-trained on scientific text with a domain-specific vocabulary.

Task: Citation Intent Classification — predicting why an author is citing another work (background, method, or result).

Labels:

background (0): Citations providing prior work context
method (1): Citations of techniques or methodologies
result (2): Citations comparing or contrasting experimental results

Results

Achieved on the SciCite test set:

Metric      | Score
------------|-------
Accuracy    | 85.60%
Macro F1    | 0.8431
Weighted F1 | 0.8566

Per-class performance:

Class      | Precision | Recall | F1-Score | Support
-----------|-----------|--------|----------|--------
background | 0.88      | 0.87   | 0.88     | 997
method     | 0.88      | 0.81   | 0.84     | 605
result     | 0.74      | 0.89   | 0.81     | 259
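As a quick consistency check, the aggregate scores can be approximated from the per-class table above. The sketch below uses the rounded two-decimal F1 values, so it only reproduces the reported macro and weighted F1 to about three decimal places; the function names are illustrative.

```python
# Per-class F1 and support, copied from the per-class table above.
per_class = {
    "background": {"f1": 0.88, "support": 997},
    "method": {"f1": 0.84, "support": 605},
    "result": {"f1": 0.81, "support": 259},
}


def macro_f1(stats):
    """Unweighted mean of the per-class F1 scores."""
    return sum(c["f1"] for c in stats.values()) / len(stats)


def weighted_f1(stats):
    """Mean of the per-class F1 scores, weighted by class support."""
    total = sum(c["support"] for c in stats.values())
    return sum(c["f1"] * c["support"] for c in stats.values()) / total


print(f"macro F1    ~ {macro_f1(per_class):.3f}")
print(f"weighted F1 ~ {weighted_f1(per_class):.3f}")
```

The weighted score sits above the macro score because the largest class (background) also has the highest F1.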

Intended Uses & Limitations

Intended Use: Automatically classify citation intents in academic papers to improve literature mining, knowledge graph construction, and semantic search applications.

Limitations: Model trained on citation sentences from arXiv scientific papers; it may not generalize to other domains (biomedical, legal, etc.). Performance is strongest on the background and method classes; the result class has lower precision (0.74), likely because it is the smallest class (259 of 1,861 test examples).

Training and Evaluation Data

Dataset: SciCite — 8,243 training examples, 916 validation, 1,861 test (citation contexts from arXiv papers).

Format: Citation sentence + class label. Max length: 256 tokens. Split: roughly 75% train, 8% val, 17% test (8,243 / 916 / 1,861 of 11,020 examples).

How to Use

Installation

pip install transformers torch

Inference

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("lostelf/scibert_scicite_finetuned")
model = AutoModelForSequenceClassification.from_pretrained("lostelf/scibert_scicite_finetuned")

text = "We use the BERT architecture as in Devlin et al."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class = logits.argmax(dim=-1).item()

labels = {0: "background", 1: "method", 2: "result"}
print(f"Predicted: {labels[predicted_class]}")
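If a confidence score is useful alongside the predicted label, the logits can be passed through a softmax. The following is a minimal sketch in plain Python that assumes the logits have already been converted to a list of floats (for example with `outputs.logits[0].tolist()`); the helper names and the example logits are made up for illustration, and the resulting probabilities are not guaranteed to be well calibrated.

```python
import math

LABELS = {0: "background", 1: "method", 2: "result"}


def softmax(logits):
    """Numerically stable softmax over a plain list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up logits for illustration; with the model above you would pass
# outputs.logits[0].tolist() instead.
example_logits = [0.3, 2.1, -1.0]
label, prob = label_with_confidence(example_logits)
print(f"Predicted: {label} (p = {prob:.2f})")
```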

Batch Prediction

texts = [
    "We build on the transformer framework introduced by Vaswani et al.",
    "Our implementation follows the optimization procedure in Kingma & Ba.",
    "These results exceed prior work by Devlin et al. (BERT)."
]

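inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=256)

# Disable gradient tracking for inference, as in the single-sentence example
with torch.no_grad():
    outputs = model(**inputs)

# Convert the tensor to plain ints so the dict lookup in `labels` works
predictions = outputs.logits.argmax(dim=-1).tolist()
print([labels[p] for p in predictions])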
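For longer lists of sentences, it may help to split the input into fixed-size batches before tokenizing, so that memory use stays bounded. A minimal sketch of such a helper follows; `batched` and the placeholder sentences are hypothetical, and the batch size of 32 is only an example.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder sentences; in practice these would be real citation contexts.
sentences = [f"Citation sentence {i}" for i in range(70)]
batches = list(batched(sentences, batch_size=32))
print([len(b) for b in batches])  # [32, 32, 6]
```

Each batch could then be passed to the tokenizer and model in the same way as `texts` in the batch prediction snippet.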

Training Hyperparameters

Parameter             | Value
----------------------|---------------------------------
Model                 | allenai/scibert_scivocab_uncased
Epochs                | 8
Batch Size            | 32
Learning Rate         | 1e-05
Warmup Steps          | 25 (~10% of training)
Weight Decay          | 0.01
Optimizer             | AdamW
LR Scheduler          | linear
Gradient Accumulation | 2 steps
FP16                  | Enabled
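Batch size and gradient accumulation together determine how many examples contribute to each optimizer update. The sketch below works through that arithmetic under two assumptions that this card does not state: that the listed batch size is per device, and that the number of devices is a free parameter. Training frameworks round partial batches slightly differently, so the step counts are approximate.

```python
import math

TRAIN_EXAMPLES = 8243        # SciCite training split size from this card
PER_DEVICE_BATCH_SIZE = 32   # "Batch Size" above (assumed to be per device)
GRAD_ACCUM_STEPS = 2         # "Gradient Accumulation" above


def effective_batch_size(num_devices=1):
    """Examples consumed per optimizer update."""
    return PER_DEVICE_BATCH_SIZE * GRAD_ACCUM_STEPS * num_devices


def optimizer_steps_per_epoch(num_devices=1):
    """Approximate number of optimizer updates in one pass over the data."""
    return math.ceil(TRAIN_EXAMPLES / effective_batch_size(num_devices))


for devices in (1, 2):
    print(
        f"devices={devices}: "
        f"effective batch={effective_batch_size(devices)}, "
        f"steps/epoch~{optimizer_steps_per_epoch(devices)}"
    )
```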

Training Results

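Best checkpoint: Epoch 8, with test macro F1 = 0.8431 (the validation macro F1 in the table below peaks at 0.8351 in epoch 7 and is 0.8275 at epoch 8). Early stopping patience = 4 epochs.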

Training curve (eval epochs):

Epoch | Train Loss | Val Loss | Macro F1 | Micro F1
------|------------|----------|----------|---------
1     | 0.7104     | 0.4155   | 0.8143   | 0.8395
2     | 0.5514     | 0.4304   | 0.8176   | 0.8428
3     | 0.4605     | 0.4256   | 0.8288   | 0.8504
4     | 0.3829     | 0.4514   | 0.8310   | 0.8515
5     | 0.3176     | 0.4908   | 0.8311   | 0.8537
6     | 0.2678     | 0.5162   | 0.8334   | 0.8548
7     | 0.2288     | 0.5507   | 0.8351   | 0.8548
8     | N/A        | 0.5648   | 0.8275   | 0.8493
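To illustrate how patience-based early stopping interacts with the curve above, here is a minimal sketch. It assumes the monitored metric is the Macro F1 column of the table, which this card does not state explicitly; `select_best_epoch` is a hypothetical helper, not the training code used for this model.

```python
def select_best_epoch(scores, patience):
    """Return (best_epoch, last_epoch_run), both 1-indexed.

    Training stops once `patience` consecutive epochs fail to improve
    on the best score seen so far; higher scores are better.
    """
    best_epoch, best_score, stale = 0, float("-inf"), 0
    last_epoch = 0
    for epoch, score in enumerate(scores, start=1):
        last_epoch = epoch
        if score > best_score:
            best_epoch, best_score, stale = epoch, score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, last_epoch


# Macro F1 column from the training curve table above.
val_macro_f1 = [0.8143, 0.8176, 0.8288, 0.8310, 0.8311, 0.8334, 0.8351, 0.8275]
best, last = select_best_epoch(val_macro_f1, patience=4)
print(f"best epoch: {best}, last epoch run: {last}")
```

Under this assumption the score improves in every epoch through epoch 7, so a patience of 4 never triggers and training runs for all 8 epochs.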

Framework Versions

Python: 3.11
PyTorch: 2.0+
Transformers: 4.38+
Datasets: 2.14+
Scikit-learn: 1.3+

Generated: 2026-04-14 22:24:37
Training GPU: 2
