---
library_name: transformers
pipeline_tag: text-classification
base_model: roberta-base
tags:
- text-classification
- media-bias
- roberta
datasets:
- mediabiasgroup/BABE
language:
- en
---
# roberta-babe-baseline
Best-fold checkpoint from a 5-fold cross-validation reproduction of sentence-level media bias classification on BABE, using RoBERTa-base.
- Training code: [https://github.com/vulonviing/babe-roberta-baseline](https://github.com/vulonviing/babe-roberta-baseline)
- Source dataset: [https://huggingface.co/datasets/mediabiasgroup/BABE](https://huggingface.co/datasets/mediabiasgroup/BABE)
- Released checkpoint: `models/fold_0/checkpoint-532`
- Selected checkpoint: `fold_0` with macro-F1 `0.876`
- Training setup: 5-fold cross-validation on 80% of BABE; mean macro-F1 across folds: `0.857 ± 0.012`
## Model details
| Item | Value |
|---|---|
| Base model | `roberta-base` |
| Task | Sentence-level media bias classification |
| Labels | `non-biased`, `biased` |
| Max sequence length | `128` |
| Epochs | `4` |
| Learning rate | `2e-05` |
| Batch size | `16` train / `32` eval |
| Weight decay | `0.01` |
| Warmup ratio | `0.1` |
| Random seed | `42` |
## Cross-validation summary
| Metric | Mean ± Std |
|---|---|
| Macro-F1 | 0.857 ± 0.012 |
| Accuracy | 0.858 ± 0.012 |
| Precision (macro) | 0.856 ± 0.011 |
| Recall (macro) | 0.859 ± 0.012 |
| Biased F1 | 0.869 ± 0.011 |
Per-fold macro-F1 values in the repo: `0.876, 0.854, 0.845, 0.852, 0.856`.
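The reported mean and standard deviation can be reproduced from these per-fold scores (the std is the sample standard deviation, ddof=1):

```python
import statistics

# Per-fold macro-F1 values from the repo
fold_f1 = [0.876, 0.854, 0.845, 0.852, 0.856]

mean_f1 = statistics.mean(fold_f1)   # 0.8566 -> rounds to 0.857
std_f1 = statistics.stdev(fold_f1)   # sample std, n-1 denominator -> 0.012
print(f"{mean_f1:.3f} +/- {std_f1:.3f}")
```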
## Held-out quick-run reference
| Metric | Score |
|---|---|
| Macro-F1 | 0.870 |
| Accuracy | 0.872 |
| Precision (macro) | 0.870 |
| Recall (macro) | 0.872 |
| Biased F1 | 0.884 |
Confusion matrix from the held-out quick run (`n=468`):
| | Pred non-biased | Pred biased |
|---|---|---|
| True non-biased (207) | 180 | 27 |
| True biased (261) | 33 | 228 |
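The held-out scores above follow directly from these counts; a quick sanity check in plain Python (treating `biased` as the positive class):

```python
# Confusion-matrix counts from the held-out quick run (n=468)
tn, fp = 180, 27   # true non-biased: correct / misclassified as biased
fn, tp = 33, 228   # true biased: misclassified as non-biased / correct

accuracy = (tn + tp) / (tn + fp + fn + tp)          # 408/468 -> 0.872

# Per-class F1, then the macro average
f1_biased = 2 * tp / (2 * tp + fp + fn)             # -> 0.884
f1_nonbiased = 2 * tn / (2 * tn + fn + fp)          # -> 0.857
macro_f1 = (f1_biased + f1_nonbiased) / 2           # -> 0.870
```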
## Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = 'vulonviing/roberta-babe-baseline'
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# Classify a single sentence (labels: non-biased / biased)
inputs = tokenizer('Example sentence to classify.', return_tensors='pt')
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```