---
license: cc-by-nc-4.0
pipeline_tag: text-classification
library_name: transformers
language: [en]
tags:
- media-bias
- lexical-bias
- babe
- arxiv:2209.14557
datasets:
- mediabiasgroup/BABE
base_model: roberta-base
---

# RoBERTa — BABE — HA-FT

This repository provides a **RoBERTa-base** model fine-tuned on the **BABE (Bias Annotations By Experts)** dataset for **sentence-level lexical/loaded-language bias** detection in English news text. BABE was introduced in the paper [*Neural Media Bias Detection Using Distant Supervision With BABE – Bias Annotations By Experts*](https://arxiv.org/abs/2209.14557).

**Labels**
- `0` → neutral / non-lexical-bias  
- `1` → lexical-bias
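
If the hosted config only exposes the generic `LABEL_0`/`LABEL_1` identifiers, you can attach readable names at load time. The snippet below is a sketch; the `id2label`/`label2id` values simply mirror the mapping above.

```python
from transformers import AutoModelForSequenceClassification

# Sketch: attach human-readable label names at load time, in case the
# hosted config only carries the generic LABEL_0 / LABEL_1 identifiers.
model = AutoModelForSequenceClassification.from_pretrained(
    "mediabiasgroup/roberta-babe-ft",
    id2label={0: "neutral", 1: "lexical_bias"},
    label2id={"neutral": 0, "lexical_bias": 1},
)
```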

## Intended use & limitations
- **Intended use:** research and benchmarking of **lexical bias** at the sentence level on news-like English text.
- **Out-of-scope:** detection of informational/selection bias, stance, political leaning, or factuality; production deployments without human oversight.

## How to use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

m = "mediabiasgroup/roberta-babe-ft"
tok = AutoTokenizer.from_pretrained(m)
model = AutoModelForSequenceClassification.from_pretrained(m)

# Score a single sentence; softmax turns the logits into class probabilities.
text = "Democrats shamelessly rammed the bill through Congress."
with torch.no_grad():
    logits = model(**tok(text, return_tensors="pt")).logits
probs = logits.softmax(-1).tolist()[0]
print({"neutral": probs[0], "lexical_bias": probs[1]})
```
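
For quick experiments, the same checkpoint also works with the `pipeline` API. Note that the label strings it returns depend on the `id2label` mapping stored in the hosted config.

```python
from transformers import pipeline

# Batch scoring via the high-level pipeline API.
clf = pipeline("text-classification", model="mediabiasgroup/roberta-babe-ft")
print(clf([
    "The senator proposed a new bill.",
    "Democrats shamelessly rammed the bill through Congress.",
]))
```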

## Training data & setup
- **Data:** BABE (expert-annotated, sentence-level lexical bias).  
- **Backbone:** `roberta-base` with a standard sequence-classification head.  
- **Training:** single-run fine-tuning with standard sequence-classification hyperparameters; the exact configuration is not documented in this card (a comparable setup is sketched below).
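
For reference, a comparable fine-tuning run might look like the sketch below. The hyperparameters are common `roberta-base` defaults, **not** the authors' reported configuration, and it assumes the BABE dataset exposes `text` and `label` columns.

```python
# Sketch of a comparable fine-tuning run. Hyperparameters are common
# roberta-base defaults, NOT the authors' reported configuration.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumes the dataset provides `text` and `label` columns.
ds = load_dataset("mediabiasgroup/BABE")
tok = AutoTokenizer.from_pretrained("roberta-base")
ds = ds.map(lambda batch: tok(batch["text"], truncation=True), batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2
)
args = TrainingArguments(
    output_dir="roberta-babe-ft",
    learning_rate=2e-5,              # placeholder values, tune as needed
    per_device_train_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)
Trainer(model=model, args=args, train_dataset=ds["train"], tokenizer=tok).train()
```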

## Safety, bias & ethics
Media-bias perception is subjective and context-dependent. This model may **over-flag** emotionally charged wording. Keep a **human in the loop** and avoid punitive or outlet-level decisions without careful validation.
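
One lightweight way to keep a human in the loop is to auto-label only confident predictions and route borderline scores to manual review. The helper below is a hypothetical sketch; the band boundaries are arbitrary placeholders to be tuned on validation data.

```python
# Hypothetical triage helper: auto-label confident predictions, send
# borderline ones to human review. The 0.4-0.6 band is a placeholder.
def triage(bias_prob: float, review_band: tuple = (0.4, 0.6)) -> str:
    lo, hi = review_band
    if lo <= bias_prob <= hi:
        return "human_review"
    return "lexical_bias" if bias_prob > hi else "neutral"

print(triage(0.10))  # -> "neutral"
print(triage(0.55))  # -> "human_review"
```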

## Citation
If you use this model or the dataset, please cite:

```bibtex
@article{spinde2022neural,
  title   = {Neural Media Bias Detection Using Distant Supervision With BABE -- Bias Annotations By Experts},
  author  = {Spinde, Timo and Plank, Manuel and Krieger, Jan-David and Ruas, Terry and Gipp, Bela and Aizawa, Akiko},
  journal = {arXiv preprint arXiv:2209.14557},
  year    = {2022},
  url     = {https://arxiv.org/abs/2209.14557}
}
```