---
language: en
library_name: transformers
pipeline_tag: text-classification
license: mit
tags:
- sentiment-analysis
- distilbert
- sequence-classification
- academic-peer-review
- openreview
datasets:
- nhop/OpenReview
base_model:
- distilbert/distilbert-base-uncased
---

# Academic Sentiment Classifier (DistilBERT)

DistilBERT-based sequence classification model that predicts the sentiment polarity of academic peer-review text (binary: negative vs positive). It supports research on evaluating the sentiment of scholarly reviews and AI-generated critique, enabling large-scale, reproducible measurements for academic-style content.

## Model details

- Architecture: DistilBERT for Sequence Classification (2 labels)
- Max input length used during training: 512 tokens
- Labels:
  - `LABEL_0` -> negative
  - `LABEL_1` -> positive
- Format: `safetensors`
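
To make the label convention above concrete, here is a minimal sketch of how a pair of raw logits turns into a friendly label and a probability. It uses only the standard library; the `softmax` and `decode` helpers and the example logits are illustrative assumptions, not output from this model.

```python
import math

# Label mapping as documented in this card.
ID2NAME = {0: "negative", 1: "positive"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (friendly_label, probability) for one pair of logits."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2NAME[idx], probs[idx]


# Hypothetical logits for illustration only, not real model output.
label, prob = decode([-1.2, 2.3])
print(label, round(prob, 3))
```

The index of the larger logit selects the label, so index 1 maps to positive and index 0 to negative.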

## Intended uses & limitations

Intended uses:

- Analyze sentiment of peer-review snippets, full reviews, or similar scholarly discourse.

Limitations:

- Binary polarity only (no neutral class); confidence scores should be interpreted with care.
- Domain-specific: optimized for academic review-style English text; may underperform on general-domain data.
- Not a replacement for human judgement or editorial decision-making.

Ethical considerations and bias:

- Scholarly reviews can contain technical jargon, hedging, and nuanced tone; polarity is an imperfect proxy for quality or fairness.
- The model may reproduce biases present in the underlying corpus.
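
Because the model has no neutral class, one possible way to act on the "interpret with care" caveat is to treat low-confidence predictions as undecided. The sketch below assumes pipeline-style result dicts; the `label_with_abstain` helper and the 0.75 threshold are hypothetical and have not been calibrated against this model.

```python
def label_with_abstain(result, threshold=0.75):
    """Map a pipeline-style result to negative / positive / uncertain.

    `result` is a dict such as {"label": "LABEL_1", "score": 0.97}.
    `threshold` is a hypothetical cut-off, not a calibrated value.
    """
    id2name = {"LABEL_0": "negative", "LABEL_1": "positive"}
    if result["score"] < threshold:
        return "uncertain"
    return id2name.get(result["label"], result["label"])


# Made-up results for illustration only.
examples = [
    {"label": "LABEL_1", "score": 0.97},
    {"label": "LABEL_0", "score": 0.58},
]
labels = [label_with_abstain(r) for r in examples]
print(labels)
```

A threshold chosen on held-out, human-labelled reviews would be more defensible than a fixed constant.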

## Training data

The model was fine-tuned on academic peer-review text curated from OpenReview (see the `nhop/OpenReview` dataset listed in this card's metadata). The task is binary sentiment classification over review text spans.

Note: If you plan to use or extend the underlying data, please review the terms of use for OpenReview and any relevant dataset licenses.

## Training procedure (high level)

- Base model: DistilBERT (transformers)
- Objective: single-label binary classification
- Tokenization: standard DistilBERT tokenizer, truncation to 512 tokens
- Optimizer/scheduler: standard Trainer defaults (AdamW with linear schedule)

Exact hyperparameters may vary across runs.
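
For readers unfamiliar with the linear schedule mentioned above, the following standard-library sketch shows how the learning rate changes per step: optional linear warmup, then linear decay to zero. The base rate of 5e-5 and zero warmup are assumptions taken from the usual `Trainer` defaults, and the step count is made up; this card does not state the values actually used.

```python
def linear_lr(step, total_steps, base_lr=5e-5, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    base_lr and warmup_steps are assumed values, not taken from this card.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total = 1000  # hypothetical number of optimizer steps
for step in (0, 250, 500, 1000):
    print(step, linear_lr(step, total))
```

With no warmup, the rate starts at its peak and falls in a straight line, reaching zero on the final step.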

## How to use

Basic pipeline usage:

```python
from transformers import pipeline

# By default the pipeline returns only the top-scoring label for each input
clf = pipeline(
    task="text-classification",
    model="EvilScript/academic-sentiment-classifier",
    tokenizer="EvilScript/academic-sentiment-classifier",
)

text = "The paper is clearly written and provides strong empirical support for the claims."
print(clf(text))
# Example output: [{'label': 'LABEL_1', 'score': 0.97}]  # LABEL_1 -> positive
```

If you prefer friendly labels, you can map them:

```python
from transformers import pipeline

id2name = {"LABEL_0": "negative", "LABEL_1": "positive"}
clf = pipeline("text-classification", model="EvilScript/academic-sentiment-classifier")
res = clf("This section lacks clarity and the experiments are inconclusive.")[0]
res["label"] = id2name.get(res["label"], res["label"])  # map to human-friendly label
print(res)
```

Batch inference:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Use the GPU when available; the model and the inputs must be on the same device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tok = AutoTokenizer.from_pretrained("EvilScript/academic-sentiment-classifier")
model = AutoModelForSequenceClassification.from_pretrained("EvilScript/academic-sentiment-classifier")
model.to(device)
model.eval()

texts = [
    "I recommend acceptance; the methodology is solid and results are convincing.",
    "Major concerns remain; the evaluation is incomplete and unclear.",
]

inputs = tok(texts, padding=True, truncation=True, max_length=512, return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)
    pred_ids = probs.argmax(dim=-1)

# Map to friendly labels
id2name = {0: "negative", 1: "positive"}
preds = [id2name[i.item()] for i in pred_ids]
print(list(zip(texts, preds)))
```
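
Truncation to 512 tokens silently drops the tail of long reviews. One possible workaround, sketched here under stated assumptions, is to split the token ids into overlapping windows, score each window, and average the probabilities. The `sliding_windows` and `mean_probs` helpers, the stride of 128, and the 510-token window (512 minus the two special tokens) are illustrative choices, not something this model was trained or evaluated with.

```python
def sliding_windows(token_ids, max_len=510, stride=128):
    """Split token ids into windows of at most `max_len` that overlap by `stride`.

    510 leaves room for the [CLS] and [SEP] tokens within a 512-token limit.
    """
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    step = max_len - stride
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
        start += step
    return windows


def mean_probs(per_window_probs):
    """Average [p_negative, p_positive] pairs across windows."""
    n = len(per_window_probs)
    return [sum(p[i] for p in per_window_probs) / n for i in range(2)]


ids = list(range(1000))  # stand-in for real token ids
chunks = sliding_windows(ids)
print([len(c) for c in chunks])
```

Hugging Face tokenizers can produce similar windows directly via `return_overflowing_tokens=True` together with `stride`. Averaging is only one aggregation choice; whether it suits review-level sentiment would need to be checked empirically.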

## Evaluation

If you compute new metrics on public datasets or benchmarks, consider sharing them via a pull request to this model card.
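
As a starting point for such measurements, here is a minimal standard-library sketch that computes accuracy, precision, recall, and F1 for the positive class. The `binary_metrics` helper and the toy labels are illustrative assumptions; they are not results for this model.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall, and F1 for the `positive` class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy gold labels and predictions (1 = positive, 0 = negative), not real results.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting the dataset split and the label distribution alongside these numbers makes shared results easier to compare.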

## License

The model weights and card are released under the MIT license. Review and comply with any third-party data licenses if reusing the training data.

## Citation

If you use this model, please cite the project:

```bibtex
@misc{federico_torrielli_2025,
  author = { Federico Torrielli and Stefano Locci },
  title = { academic-sentiment-classifier },
  year = 2025,
  url = { https://huggingface.co/EvilScript/academic-sentiment-classifier },
  doi = { 10.57967/hf/6535 },
  publisher = { Hugging Face }
}
```