# CausalBERT
A multi-task model to extract causal attributions from German text.
Multi-Task Architecture: Extends EuroBERT-610m with a token-classification head and a relation-classification head, trained with a joint loss function:
| Task | Output Type | Labels / Classes |
|---|---|---|
| 1. Token Classification | Sequence Labeling (BIO) | 5 Span Labels (O, B-INDICATOR, I-INDICATOR, B-ENTITY, I-ENTITY) |
| 2. Relation Classification | Sentence-Pair Classification | 14 Relation Labels (e.g., MONO_POS_CAUSE, DIST_NEG_EFFECT) |
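The joint objective behind the two heads above can be sketched as a weighted sum of two cross-entropy losses. This is a minimal illustration only: the weighting scheme, head shapes, and `alpha` are assumptions for this sketch, not the model's actual configuration (see the repository for the real training setup).

```python
import torch
import torch.nn as nn

# Hypothetical joint loss; the actual weighting lives in the training code.
token_loss_fn = nn.CrossEntropyLoss(ignore_index=-100)  # -100 masks padding/subword positions
relation_loss_fn = nn.CrossEntropyLoss()

def joint_loss(token_logits, token_labels, relation_logits, relation_labels, alpha=0.5):
    """Weighted sum of the token (BIO) loss and the relation classification loss."""
    # token_logits: (batch, seq_len, 5) -> flatten to (batch*seq_len, 5) for CE
    l_token = token_loss_fn(
        token_logits.view(-1, token_logits.size(-1)), token_labels.view(-1)
    )
    # relation_logits: (batch, 14) -> one of the 14 relation labels per sentence pair
    l_rel = relation_loss_fn(relation_logits, relation_labels)
    return alpha * l_token + (1 - alpha) * l_rel
```

Backpropagating through this combined scalar updates the shared encoder with signal from both tasks at once.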
Dataset: 4,540 manually annotated relations (see the *Bundestag Causal Attribution* dataset excerpt).
Find the implementation library here.
```python
from causalbert.infer import load_model, sentence_analysis

model, tokenizer, config, device = load_model("pdjohn/C-EBERT")

# "Autoverkehr verursacht Bienensterben." = "Car traffic causes bee mortality."
sentences = ["Autoverkehr verursacht Bienensterben."]
analysis = sentence_analysis(model, tokenizer, config, sentences, batch_size=8)
print(analysis[0]['derived_relations'])
# Output: [(['Autoverkehr', 'verursacht'], ['Bienensterben']), {'label': 'MONO_POS_CAUSE', 'confidence': 0.954}]
```
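The spans in `derived_relations` come from the BIO token labels listed in the architecture table. A minimal, hypothetical decoder (not part of the `causalbert` package) shows how BIO tags group tokens into `ENTITY` and `INDICATOR` spans:

```python
def bio_to_spans(tokens, tags):
    """Group tokens into (label, [tokens]) spans from BIO tags."""
    spans, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span, closing any open one.
            if current:
                spans.append(current)
            current = (tag[2:], [tok])
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            # An I- tag extends the open span of the same label.
            current[1].append(tok)
        else:
            # "O" (or a stray I- tag) closes any open span.
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return spans

tokens = ["Autoverkehr", "verursacht", "Bienensterben", "."]
tags = ["B-ENTITY", "B-INDICATOR", "B-ENTITY", "O"]
# -> [('ENTITY', ['Autoverkehr']), ('INDICATOR', ['verursacht']), ('ENTITY', ['Bienensterben'])]
```

The relation head then assigns one of the 14 relation labels (e.g. `MONO_POS_CAUSE`) to pairs of decoded spans.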
## Evaluation & Performance
Evaluated on a stratified held-out test set of 1,135 relations from environmental discourse data. Relation classification is handled as sentence-pair classification using a `<|parallel_sep|>` separator token; see `train.py` for the full configuration details.
| Task | Accuracy | F1 (Macro/Micro) |
|---|---|---|
| Token Classification (BIO) | 0.879 | 0.783 (Micro) |
| Relation Classification | 0.732 | 0.425 (Macro) |
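The gap between relation accuracy (0.732) and macro F1 (0.425) is typical of imbalanced label sets: macro F1 averages per-class F1 with equal weight, so rare relation classes that are predicted poorly drag it down. A toy illustration with two of the label names from above (invented predictions, purely for illustration):

```python
def f1_scores(y_true, y_pred, labels):
    """Per-class F1 plus macro (unweighted mean over classes) and micro averages."""
    per_class = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(labels)
    # In single-label multi-class settings, micro F1 equals accuracy.
    micro = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return per_class, macro, micro

# Frequent class predicted well, rare class always missed:
y_true = ["MONO_POS_CAUSE"] * 8 + ["DIST_NEG_EFFECT"] * 2
y_pred = ["MONO_POS_CAUSE"] * 10
per_class, macro, micro = f1_scores(y_true, y_pred,
                                    ["MONO_POS_CAUSE", "DIST_NEG_EFFECT"])
# macro ~0.44 while micro/accuracy is 0.80
```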
Base model: EuroBERT/EuroBERT-610m