---
library_name: transformers
pipeline_tag: text-classification
base_model: roberta-base
tags:
- text-classification
- media-bias
- roberta
datasets:
- mediabiasgroup/BABE
language:
- en
---

# roberta-babe-baseline

Best-fold checkpoint from a 5-fold cross-validation reproduction of sentence-level media bias classification on BABE, using RoBERTa-base.

- Training code: [https://github.com/vulonviing/babe-roberta-baseline](https://github.com/vulonviing/babe-roberta-baseline)
- Source dataset: [https://huggingface.co/datasets/mediabiasgroup/BABE](https://huggingface.co/datasets/mediabiasgroup/BABE)
- Released checkpoint: `models/fold_0/checkpoint-532`
- Selected checkpoint: `fold_0`, with macro-F1 `0.876`
- Summary: each fold trains on 80% of BABE; 5-fold CV mean macro-F1 is `0.857 ± 0.012`

## Model details

| Item | Value |
|---|---|
| Base model | `roberta-base` |
| Task | Sentence-level media bias classification |
| Labels | `non-biased`, `biased` |
| Max sequence length | `128` |
| Epochs | `4` |
| Learning rate | `2e-05` |
| Batch size | `16` train / `32` eval |
| Weight decay | `0.01` |
| Warmup ratio | `0.1` |
| Random seed | `42` |

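The hyperparameters in the table can be collected into a `transformers` `TrainingArguments` sketch. This is a reconstruction from the reported values, not the actual training script; `output_dir` and any setting not listed in the table are assumptions:

```python
from transformers import TrainingArguments

# Values mirror the table above; output_dir is illustrative,
# not necessarily what the training script used.
args = TrainingArguments(
    output_dir="models/fold_0",
    num_train_epochs=4,
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    weight_decay=0.01,
    warmup_ratio=0.1,
    seed=42,
)
```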
## Cross-validation summary

| Metric | Mean ± Std |
|---|---|
| Macro-F1 | 0.857 ± 0.012 |
| Accuracy | 0.858 ± 0.012 |
| Precision (macro) | 0.856 ± 0.011 |
| Recall (macro) | 0.859 ± 0.012 |
| Biased F1 | 0.869 ± 0.011 |

Per-fold macro-F1 values in the repo: `0.876, 0.854, 0.845, 0.852, 0.856`.

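The summary row can be re-derived from the per-fold values; the reported spread matches the sample standard deviation (n − 1 denominator):

```python
import statistics

fold_f1 = [0.876, 0.854, 0.845, 0.852, 0.856]
mean = statistics.mean(fold_f1)
std = statistics.stdev(fold_f1)  # sample std (n - 1), matching the table
print(f"{mean:.3f} +- {std:.3f}")  # 0.857 +- 0.012
```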
## Held-out quick-run reference

| Metric | Score |
|---|---|
| Macro-F1 | 0.870 |
| Accuracy | 0.872 |
| Precision (macro) | 0.870 |
| Recall (macro) | 0.872 |
| Biased F1 | 0.884 |

Confusion matrix from the held-out quick run (`n=468`):

| | Pred non-biased | Pred biased |
|---|---|---|
| True non-biased (207) | 180 | 27 |
| True biased (261) | 33 | 228 |

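The held-out scores above follow directly from this confusion matrix; a quick consistency check in plain Python (class-wise F1, then macro average):

```python
# Counts from the confusion matrix above.
tn, fp = 180, 27   # true non-biased row
fn, tp = 33, 228   # true biased row
n = tn + fp + fn + tp  # 468

accuracy = (tn + tp) / n

def f1(p, r):
    return 2 * p * r / (p + r)

f1_biased = f1(tp / (tp + fp), tp / (tp + fn))
f1_nonbiased = f1(tn / (tn + fn), tn / (tn + fp))
macro_f1 = (f1_biased + f1_nonbiased) / 2

print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f} biased_f1={f1_biased:.3f}")
# accuracy=0.872 macro_f1=0.870 biased_f1=0.884
```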
## Usage

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = 'vulonviing/roberta-babe-baseline'
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# Classify a sentence; id2label maps indices to the labels listed above.
inputs = tokenizer('An example sentence.', return_tensors='pt', truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```