---
license: cc-by-4.0
language:
- de
base_model:
- dbmdz/bert-base-german-cased
pipeline_tag: text-classification
---

# SIP-BERT

**SIP-BERT** is a transformer-based model that detects **social inequality** in German texts.
It was fine-tuned on **German Bundestag debates** (sourced from [OpenDiscourse](https://doi.org/10.7910/DVN/FIKIBO)), where each training instance is a 3-sentence segment.

---
## Model Description

- **Architecture**: `bert-base-german-cased` (from [dbmdz](https://huggingface.co/dbmdz/bert-base-german-cased))
- **Task**: Binary classification – detecting social inequality in German texts
- **Labels**:
  - `0` = no social inequality
  - `1` = social inequality
- **Language**: German
- **Training Data**: 1,950 annotated text passages from Bundestag debates (via OpenDiscourse)
- **Segmenting**: Data split into 3-sentence units
- **Evaluation**: Accuracy 0.97; F1 Score 0.95
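
Because the model was trained on 3-sentence units, new text is probably best segmented the same way before classification. The following is only a rough sketch of such a step: the regex sentence splitter, the non-overlapping windows, and the helper names `split_sentences` and `three_sentence_segments` are assumptions for illustration, not the preprocessing actually used to build the training data.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def three_sentence_segments(text, size=3):
    """Group consecutive sentences into non-overlapping units of `size` sentences.

    The last unit may be shorter if the sentence count is not a multiple of `size`.
    """
    sentences = split_sentences(text)
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


# Hypothetical usage: each returned string can be passed to the tokenizer as one input.
# segments = three_sentence_segments(speech_text)
```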

---

## Intended Use

- **Primary use case**: Analysis of parliamentary discourse on social inequality
- **Research contexts**: Political science, computational social science, discourse analysis

---
## Limitations

- The model was trained on Bundestag debates (1949–2021), but is **specialized for texts from 1990 onwards**.
- It may be less reliable for earlier parliamentary language (1949–1989) and for **non-parliamentary speech**.
- It was designed primarily to detect **economic inequality** and may not be applicable to other types of inequality.

---
## Usage

You can load the model with the Hugging Face `transformers` library:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("miriamex/SIP-BERT")
model = AutoModelForSequenceClassification.from_pretrained("miriamex/SIP-BERT")

# Tokenize a German passage and run a forward pass; outputs.logits holds the raw scores
inputs = tokenizer("Hier ein Beispieltext über soziale Ungleichheit.", return_tensors="pt")
outputs = model(**inputs)
```
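
The forward pass above returns raw logits rather than a label. A minimal sketch of mapping them to the two labels listed under Model Description follows. It assumes the logit order matches the label ids `0` and `1`. The helper name `logits_to_prediction` and the `LABELS` dictionary are hypothetical and not part of the model repository.

```python
import math

# Label ids as documented in the model card
LABELS = {0: "no social inequality", 1: "social inequality"}


def logits_to_prediction(logits):
    """Apply a numerically stable softmax and return (label_id, label_name, probability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return label_id, LABELS[label_id], probs[label_id]


# With the `outputs` object from the snippet above (single input, batch index 0):
# label_id, label, prob = logits_to_prediction(outputs.logits[0].tolist())
```

Working on a plain list of floats keeps the helper independent of the tensor framework; `outputs.logits[0].tolist()` converts the first row of the logits tensor into that form.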