miriamex

metadata

license: cc-by-4.0
language:
  - de
base_model:
  - dbmdz/bert-base-german-cased
pipeline_tag: text-classification

SIP-BERT

SIP-BERT is a transformer-based model designed to detect social inequality in German texts.
It was fine-tuned on German Bundestag debates (sourced from OpenDiscourse); each training instance is a 3-sentence segment.

Model Description

Architecture: bert-base-german-cased (from dbmdz)
Task: Binary classification – detecting social inequality in German texts
Labels:
- 0 = no social inequality
- 1 = social inequality
Language: German
Training Data: 1,950 annotated text passages from Bundestag debates (via OpenDiscourse)
Segmentation: Debate text split into 3-sentence units
Evaluation: Accuracy 0.97; F1 Score 0.95
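
Since the model was trained on 3-sentence units, new text probably works best when segmented the same way before classification. The card does not say how sentences were split or whether segments overlap, so the following is only a minimal sketch under two assumptions: a naive punctuation-based sentence splitter and non-overlapping windows. The function names `split_sentences` and `three_sentence_segments` are hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def three_sentence_segments(text, size=3):
    """Group sentences into consecutive, non-overlapping segments of `size` sentences.

    Non-overlapping windows are an assumption; the last segment may be shorter.
    """
    sentences = split_sentences(text)
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]
```

A regex splitter like this will break on German abbreviations such as "z. B." or on ordinal dates, so a dedicated sentence tokenizer is likely a better choice for real debate transcripts.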

Intended Use

Primary use case: Analysis of parliamentary discourse on social inequality
Research contexts: Political science, computational social science, discourse analysis

Limitations

The model is trained on Bundestag debates (1949–2021), but is specialized for texts from 1990 onwards.
It may be less reliable for earlier parliamentary language (1949–1989) and for non-parliamentary speech.
It was designed primarily to detect economic inequality and may not generalize to other types of inequality.

Usage

You can load the model with the Hugging Face transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("miriamex/SIP-BERT")
model = AutoModelForSequenceClassification.from_pretrained("miriamex/SIP-BERT")

inputs = tokenizer("Hier ein Beispieltext über soziale Ungleichheit.", return_tensors="pt")
outputs = model(**inputs)
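
The snippet above stops at the raw model output. To get a prediction, the logits still need to be mapped to one of the two labels listed under Model Description. The sketch below shows one way to do that in plain Python. It assumes that logit index 0 corresponds to label 0 and index 1 to label 1, which is the usual convention but is not stated explicitly on this card. `LABELS`, `softmax`, and `interpret_logits` are hypothetical helper names.

```python
import math

# Label ids as listed in the model description; the index-to-logit
# correspondence is an assumption.
LABELS = {0: "no social inequality", 1: "social inequality"}


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret_logits(logits):
    """Return the most probable label id, its name, and all class probabilities."""
    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return {"label_id": label_id, "label": LABELS[label_id], "probabilities": probs}
```

With the `outputs` object from the loading example, this could be called as `interpret_logits(outputs.logits[0].detach().tolist())`. For inputs longer than the model's maximum sequence length, passing `truncation=True` to the tokenizer is probably needed as well.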