---
license: cc-by-nc-4.0
pipeline_tag: text-classification
library_name: transformers
language: [en]
tags:
- media-bias
- lexical-bias
- babe
- paper:2209.14557
datasets:
- mediabiasgroup/BABE
base_model: roberta-base
---

# RoBERTa – BABE – HA-FT

This repository provides a **RoBERTa-base** model fine-tuned on the **BABE (Bias Annotations By Experts)** dataset for **sentence-level lexical/loaded-language bias** detection in English news text. BABE was introduced in the paper [*Neural Media Bias Detection Using Distant Supervision With BABE – Bias Annotations By Experts*](https://arxiv.org/abs/2209.14557).

**Labels**
- `0`: neutral / non-lexical-bias
- `1`: lexical-bias

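The label ids above can be turned into readable predictions with a small helper. The following is a minimal sketch in plain Python; the names `ID2LABEL`, `softmax`, and `decode` are illustrative assumptions and are not part of this repository, and the label strings simply mirror the list above.

```python
import math
from typing import List, Tuple

# Assumed display names for the two class ids listed above.
ID2LABEL = {0: "neutral", 1: "lexical_bias"}


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: List[float]) -> Tuple[str, float]:
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]
```

For example, `decode([-1.0, 2.0])` selects `lexical_bias`, since the second logit is larger.
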
## Intended use & limitations
- **Intended use:** research and benchmarking of **lexical bias** at the sentence level on news-like English text.
- **Out-of-scope:** detection of informational/selection bias, stance, political leaning, or factuality; production deployments without human oversight.

## How to use
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

m = "mediabiasgroup/roberta-babe-ft"
tok = AutoTokenizer.from_pretrained(m)
model = AutoModelForSequenceClassification.from_pretrained(m)
model.eval()  # inference mode (disables dropout)

text = "Democrats shamelessly rammed the bill through Congress."
with torch.no_grad():  # gradients are not needed for inference
    probs = model(**tok(text, return_tensors="pt")).logits.softmax(-1).tolist()[0]
print({"neutral": probs[0], "lexical_bias": probs[1]})
```

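For more than a handful of sentences, it is usually more efficient to score them in padded batches. The sketch below is one possible way to do that, not an official API of this repository: the helper names `chunked` and `score_sentences` and the default `batch_size=16` are assumptions chosen for illustration. It uses only the standard tokenizer options `padding=True` and `truncation=True`.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def score_sentences(
    texts: Sequence[str],
    model_name: str = "mediabiasgroup/roberta-babe-ft",
    batch_size: int = 16,  # hypothetical default; adjust to available memory
) -> List[float]:
    """Return P(lexical bias) per sentence; downloads the model on first call."""
    # Imported lazily so `chunked` stays usable without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    scores: List[float] = []
    with torch.no_grad():
        for batch in chunked(texts, batch_size):
            # Pad to the longest sentence in the batch; truncate overlong inputs.
            enc = tok(batch, return_tensors="pt", padding=True, truncation=True)
            probs = model(**enc).logits.softmax(-1)
            scores.extend(probs[:, 1].tolist())  # column 1 = lexical-bias class
    return scores
```

Calling `score_sentences(["First sentence.", "Second sentence."])` returns one probability per input sentence, in input order.
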
## Training data & setup
- **Data:** BABE (expert-annotated, sentence-level lexical bias).
- **Backbone:** `roberta-base` with a standard sequence-classification head.
- **Training:** single-run fine-tuning; the exact hyperparameters are not documented in this card.

## Safety, bias & ethics
Media-bias perception is subjective and context-dependent. This model may **over-flag** emotionally charged wording. Keep a **human in the loop** and avoid punitive or outlet-level decisions without careful validation.

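One way to act on this caution is to treat model scores as suggestions and send uncertain cases to a reviewer. The sketch below illustrates the idea; the function name `triage` and both thresholds are hypothetical placeholders, not validated values, and would need tuning on data relevant to the intended use.

```python
def triage(p_bias: float, high: float = 0.9, low: float = 0.5) -> str:
    """Map P(lexical bias) to a suggestion for a human reviewer.

    `high` and `low` are illustrative thresholds, not validated values.
    """
    if not 0.0 <= p_bias <= 1.0:
        raise ValueError("p_bias must be a probability in [0, 1]")
    if not low <= high:
        raise ValueError("low must not exceed high")
    if p_bias >= high:
        return "likely_biased"   # still a suggestion, not a verdict
    if p_bias >= low:
        return "needs_review"    # uncertain band: route to a human
    return "likely_neutral"
```

Raising `high` makes the model flag fewer sentences on its own, which is one simple lever against over-flagging.
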
## Citation
If you use this model or the dataset, please cite:

```bibtex
@article{spinde2022neural,
  title   = {Neural Media Bias Detection Using Distant Supervision With BABE -- Bias Annotations By Experts},
  author  = {Spinde, Timo and Plank, Manuel and Krieger, Jan-David and Ruas, Terry and Gipp, Bela and Aizawa, Akiko},
  journal = {arXiv preprint arXiv:2209.14557},
  year    = {2022},
  url     = {https://arxiv.org/abs/2209.14557}
}
```