SciBERT Fine-tuned for SciCite Intent Classification
This model is a fine-tuned version of allenai/scibert_scivocab_uncased on the SciCite dataset.
Model Description
Base Model: AllenAI's SciBERT pre-trained on scientific text with a domain-specific vocabulary.
Task: Citation intent classification, i.e. predicting why an author cites another work (background, method, or result).
Labels:
- background (0): Citations providing prior work context
- method (1): Citations of techniques or methodologies
- result (2): Citations comparing or contrasting experimental results
Results
Achieved on the SciCite test set:
| Metric | Score |
|---|---|
| Accuracy | 85.60% |
| Macro F1 | 0.8431 |
| Weighted F1 | 0.8566 |
Per-class performance:
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| background | 0.88 | 0.87 | 0.88 | 997 |
| method | 0.88 | 0.81 | 0.84 | 605 |
| result | 0.74 | 0.89 | 0.81 | 259 |
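
The aggregate scores can be related to the per-class table with a few lines of arithmetic. The sketch below uses only the rounded F1 and support values from the table above, so it reproduces the reported macro and weighted F1 only approximately (to within rounding error).

```python
# Per-class F1 and support copied from the per-class table (rounded to 2 decimals).
per_class = {
    "background": {"f1": 0.88, "support": 997},
    "method": {"f1": 0.84, "support": 605},
    "result": {"f1": 0.81, "support": 259},
}

total_support = sum(c["support"] for c in per_class.values())

# Macro F1: unweighted mean over classes, so the small "result" class counts equally.
macro_f1 = sum(c["f1"] for c in per_class.values()) / len(per_class)

# Weighted F1: mean over classes, weighted by each class's share of the test set.
weighted_f1 = sum(c["f1"] * c["support"] for c in per_class.values()) / total_support

print(f"total support: {total_support}")
print(f"macro F1 ~ {macro_f1:.3f}")
print(f"weighted F1 ~ {weighted_f1:.3f}")
```

Because the result class is the smallest (259 of 1,861 examples) and has the lowest F1, it pulls the macro average below the weighted average.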
Intended Uses & Limitations
Intended Use: Automatically classify citation intents in academic papers to improve literature mining, knowledge graph construction, and semantic search applications.
Limitations: The model was trained on citation contexts from arXiv papers and may not generalize to other domains (biomedical, legal, etc.). It performs best on the background and method classes; the result class has lower precision (0.74), which is attributed to class imbalance.
Training and Evaluation Data
Dataset: SciCite, with 8,243 training examples, 916 validation examples, and 1,861 test examples (citation contexts from arXiv papers).
Format: Citation sentence + class label. Max length: 256 tokens. Split: roughly 75% train, 8% validation, 17% test, based on the example counts above.
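
As a quick check, the share of each split can be computed from the example counts listed above. This is a small sketch that uses only those three counts.

```python
# Example counts per split, as listed in the dataset description.
splits = {"train": 8243, "validation": 916, "test": 1861}

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

for name, frac in fractions.items():
    print(f"{name}: {splits[name]} examples ({frac:.1%})")
print(f"total: {total}")
```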
How to Use
Installation
pip install transformers torch
Inference
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned tokenizer and classifier from the Hub
tokenizer = AutoTokenizer.from_pretrained("lostelf/scibert_scicite_finetuned")
model = AutoModelForSequenceClassification.from_pretrained("lostelf/scibert_scicite_finetuned")

text = "We use the BERT architecture as in Devlin et al."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Forward pass without gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
predicted_class = logits.argmax(dim=-1).item()
labels = {0: "background", 1: "method", 2: "result"}
print(f"Predicted: {labels[predicted_class]}")
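
If a confidence score is wanted alongside the predicted label, the logits can be passed through a softmax. The sketch below is dependency-free and uses made-up logits purely for illustration; they are not real outputs of this model.

```python
import math


def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


labels = {0: "background", 1: "method", 2: "result"}

# Hypothetical logits for one citation sentence (not real model output).
example_logits = [0.3, 2.1, -1.0]

probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)

for idx, p in enumerate(probs):
    print(f"{labels[idx]}: {p:.3f}")
print(f"Predicted: {labels[best]}")
```

With a PyTorch tensor of logits, `torch.softmax(logits, dim=-1)` performs the same computation.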
Batch Prediction
texts = [
    "We build on the transformer framework introduced by Vaswani et al.",
    "Our implementation follows the optimization procedure in Kingma & Ba.",
    "These results exceed prior work by Devlin et al. (BERT)."
]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=256)

# Forward pass without gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Convert the tensor of class indices to plain ints before the dict lookup
predictions = outputs.logits.argmax(dim=-1).tolist()
print([labels[p] for p in predictions])
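
For longer lists of sentences, it may help to feed the model in fixed-size chunks rather than one large batch. The helper below is a hypothetical utility (`chunked` is not part of Transformers), shown here on placeholder strings.

```python
def chunked(items, batch_size):
    """Yield successive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder sentences standing in for real citation contexts.
texts = [f"Citation sentence {i}" for i in range(5)]

batches = list(chunked(texts, batch_size=2))
print([len(b) for b in batches])
```

Each chunk can then be tokenized and passed to the model exactly as in the batch prediction example, with the per-chunk predictions concatenated at the end.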
Training Hyperparameters
| Parameter | Value |
|---|---|
| Model | allenai/scibert_scivocab_uncased |
| Epochs | 8 |
| Batch Size | 32 |
| Learning Rate | 1e-05 |
| Warmup Steps | 25 (~10% of training) |
| Weight Decay | 0.01 |
| Optimizer | AdamW |
| LR Scheduler | linear |
| Gradient Accumulation | 2 steps |
| FP16 | Enabled |
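
To illustrate how the learning rate, warmup steps, and linear scheduler interact, here is a minimal sketch of a linear warmup followed by linear decay. This is the shape that `get_linear_schedule_with_warmup` in Transformers produces. The base learning rate and warmup steps come from the table; `total_steps=1000` is an assumed placeholder, since the total number of optimizer steps is not reported.

```python
def linear_schedule_lr(step, base_lr=1e-5, warmup_steps=25, total_steps=1000):
    """Learning rate at optimizer step `step` under linear warmup, then linear decay.

    base_lr and warmup_steps are taken from the hyperparameter table;
    total_steps is an assumption made only for this illustration.
    """
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 10, 25, 500, 1000):
    print(f"step {step:>4}: lr = {linear_schedule_lr(step):.2e}")
```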
Training Results
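Best checkpoint: validation macro F1 peaks at epoch 7 (0.8351) in the curve below; the macro F1 of 0.8431 is the test-set score reported under Results. Early stopping patience = 4 epochs.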
Training curve (eval epochs):
| Epoch | Train Loss | Val Loss | Macro F1 | Micro F1 |
|---|---|---|---|---|
| 1 | 0.7104 | 0.4155 | 0.8143 | 0.8395 |
| 2 | 0.5514 | 0.4304 | 0.8176 | 0.8428 |
| 3 | 0.4605 | 0.4256 | 0.8288 | 0.8504 |
| 4 | 0.3829 | 0.4514 | 0.8310 | 0.8515 |
| 5 | 0.3176 | 0.4908 | 0.8311 | 0.8537 |
| 6 | 0.2678 | 0.5162 | 0.8334 | 0.8548 |
| 7 | 0.2288 | 0.5507 | 0.8351 | 0.8548 |
| 8 | N/A | 0.5648 | 0.8275 | 0.8493 |
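
The interaction between checkpoint selection and early stopping can be sketched on the validation macro F1 values from the table. The function below is a simplified illustration, assuming the metric is maximised and that patience counts consecutive epochs without improvement. It is not the author's training code.

```python
# (epoch, validation macro F1) pairs copied from the training curve table.
val_macro_f1 = [
    (1, 0.8143), (2, 0.8176), (3, 0.8288), (4, 0.8310),
    (5, 0.8311), (6, 0.8334), (7, 0.8351), (8, 0.8275),
]


def best_epoch_with_patience(history, patience):
    """Return (best_epoch, best_score, stopped_at) for a metric that is maximised."""
    best_epoch, best_score, bad_epochs = None, float("-inf"), 0
    stopped_at = None
    for epoch, score in history:
        if score > best_score:
            best_epoch, best_score, bad_epochs = epoch, score, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                stopped_at = epoch  # early stopping would trigger here
                break
    return best_epoch, best_score, stopped_at


best_epoch, best_score, stopped_at = best_epoch_with_patience(val_macro_f1, patience=4)
print(f"best epoch: {best_epoch}, macro F1: {best_score}, stopped at: {stopped_at}")
```

On these numbers the validation macro F1 improves every epoch through epoch 7, so a patience of 4 is never exhausted within the 8 epochs.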
Framework Versions
- Python: 3.11
- PyTorch: 2.0+
- Transformers: 4.38+
- Datasets: 2.14+
- Scikit-learn: 1.3+
Generated: 2026-04-14 22:24:37
Training GPU: 2