
	---
	license: cc-by-4.0
	language:
	- de
	base_model:
	- dbmdz/bert-base-german-cased
	pipeline_tag: text-classification
	---

	# SIP-BERT

	SIP-BERT is a transformer-based model designed to detect social inequality in German texts.
	It was fine-tuned on German Bundestag debates (sourced from [OpenDiscourse](https://doi.org/10.7910/DVN/FIKIBO)); each training instance is a three-sentence segment.

	---

	## Model Description

	- Architecture: `bert-base-german-cased` (from [dbmdz](https://huggingface.co/dbmdz/bert-base-german-cased))
	- Task: Binary classification – detecting social inequality in German texts
	- Labels:
	  - `0` = no social inequality
	  - `1` = social inequality
	- Language: German
	- Training Data: 1,950 annotated text passages from Bundestag debates (via OpenDiscourse)
	- Segmentation: Debates were split into 3-sentence units
	- Evaluation: Accuracy 0.97; F1 Score 0.95
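
Because the model was trained on 3-sentence units, longer texts presumably need to be segmented the same way before classification. The following is a minimal sketch of one way to do that. The card does not say which sentence splitter was used or whether the windows overlap, so the regex splitter, the non-overlapping windows, and the helper names `split_sentences` and `three_sentence_segments` are all assumptions made for illustration.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def three_sentence_segments(text, size=3):
    """Group consecutive sentences into non-overlapping segments of `size` sentences."""
    sentences = split_sentences(text)
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


# Hypothetical example speech, not taken from the training data.
speech = (
    "Die Mieten steigen. Viele Familien können sich das nicht leisten. "
    "Wir müssen handeln. Der Haushalt ist beschlossen."
)
segments = three_sentence_segments(speech)
for segment in segments:
    print(segment)
```

The regex will mis-split German abbreviations such as "z. B." or "Dr.", so a dedicated sentence tokenizer is likely a better choice for real debate transcripts.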

	---

	## Intended Use

	- Primary use case: Analysis of parliamentary discourse on social inequality
	- Research contexts: Political science, computational social science, discourse analysis

	---

	## Limitations

	- The model was trained on Bundestag debates from 1949–2021 but is specialized for texts from 1990 onwards.
	- It may be less reliable for earlier parliamentary language (1949–1989) and for non-parliamentary speech.
	- It was designed primarily to detect economic inequality and may not transfer to other types of inequality.

	---

	## Usage

	You can load the model with the Hugging Face `transformers` library:

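	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
	tokenizer = AutoTokenizer.from_pretrained("miriamex/SIP-BERT")
	model = AutoModelForSequenceClassification.from_pretrained("miriamex/SIP-BERT")

	# Tokenize one German text segment as PyTorch tensors.
	inputs = tokenizer("Hier ein Beispieltext über soziale Ungleichheit.", return_tensors="pt")

	# Run inference without tracking gradients, then take the highest-scoring class index.
	with torch.no_grad():
	    outputs = model(**inputs)
	predicted_label = outputs.logits.argmax(dim=-1).item()
	```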
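
The model returns raw logits rather than a label. As a rough sketch, the helper below converts the two logits for one segment into a label and a probability using a softmax. The function name `interpret_logits` is hypothetical, and the sketch assumes that output index 0 and 1 correspond to the labels listed under Model Description; check the model's configuration if in doubt.

```python
import math

# Label meanings as stated in the model card.
LABELS = {0: "no social inequality", 1: "social inequality"}


def interpret_logits(logits):
    """Map the raw scores for one segment to (label_id, label_text, probability)."""
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    label_id = probs.index(max(probs))
    return label_id, LABELS[label_id], probs[label_id]


# Made-up logits for illustration only, not real model output.
label_id, label_text, prob = interpret_logits([-1.2, 2.3])
print(label_id, label_text, round(prob, 3))
```

With the snippet above, the input would be `outputs.logits[0].detach().tolist()`, which turns the first row of the logits tensor into a plain Python list.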