---
datasets:
- CausalNewsCorpus
language: en
library_name: transformers
license: mit
metrics:
- f1
- precision
- recall
tags:
- token-classification
- roberta
- causal-narrative
- cause-effect-extraction
- span-extraction
- ner
---

# RoBERTa Causal Span Extractor

This model is a fine-tuned version of `roberta-base` for **causal span extraction**
(token classification). It identifies **cause** and **effect** text spans in sentences.

## Model Description

- **Base Model**: roberta-base
- **Task**: Token classification (BIO tagging)
- **Labels**: O, B-CAUSE, I-CAUSE, B-EFFECT, I-EFFECT
- **Training Data**: CausalNewsCorpus V2 (sentences with exactly 1 causal pair)
- **Training Samples**: 1105
- **Dev Samples**: 133

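As a rough illustration of the BIO scheme listed above, the sketch below converts word-level cause and effect spans into the five labels. The function name `spans_to_bio`, the span format (start inclusive, end exclusive, in word indices), and the example annotation are assumptions made for illustration; this card does not describe how the CausalNewsCorpus annotations were actually converted.

```python
def spans_to_bio(num_words, spans):
    """Convert labelled word spans into BIO tags.

    `spans` maps a span type ("CAUSE" or "EFFECT") to a (start, end) pair of
    word indices, with `end` exclusive. This input format is hypothetical.
    """
    tags = ["O"] * num_words
    for span_type, (start, end) in spans.items():
        for i in range(start, end):
            # The first word of a span gets B-, every following word gets I-.
            prefix = "B" if i == start else "I"
            tags[i] = f"{prefix}-{span_type}"
    return tags


words = "The heavy rain caused flooding in the city.".split()
# Hand-written example annotation, not model output.
example_spans = {"CAUSE": (0, 3), "EFFECT": (4, 8)}
tags = spans_to_bio(len(words), example_spans)
for word, tag in zip(words, tags):
    print(f"{word:20s} {tag}")
```

Words outside both spans, such as the connective "caused" in this example, keep the `O` label.
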
## Training Results

See the training notebook for detailed metrics.

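This card lists precision, recall, and F1 as metrics but does not say how they are computed. One common convention for span extraction is exact-match scoring over `(label, start, end)` triples. The sketch below assumes that convention; `span_prf` and the example spans are hypothetical and are not taken from the training notebook.

```python
def span_prf(gold, pred):
    """Exact-match precision, recall and F1 over (label, start, end) triples."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # spans that match in label and both boundaries
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical spans: the cause matches exactly, the effect boundary is off by one.
gold_spans = [("CAUSE", 0, 3), ("EFFECT", 4, 8)]
pred_spans = [("CAUSE", 0, 3), ("EFFECT", 4, 7)]
precision, recall, f1 = span_prf(gold_spans, pred_spans)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under exact matching, a span with a single wrong boundary counts as both a false positive and a false negative, so token-level scores for the same predictions would typically be higher.
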
## Usage

```python
import torch
from transformers import RobertaForTokenClassification, RobertaTokenizerFast

model_name = "causal-narrative/roberta-causal-span-extractor"
# add_prefix_space=True is needed when passing pre-split words to a RoBERTa tokenizer.
tokenizer = RobertaTokenizerFast.from_pretrained(model_name, add_prefix_space=True)
model = RobertaForTokenClassification.from_pretrained(model_name)

text = "The heavy rain caused flooding in the city."
words = text.split()
inputs = tokenizer(words, is_split_into_words=True, return_tensors="pt",
                   truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
preds = torch.argmax(outputs.logits, dim=2)[0]

id2label = model.config.id2label
# Maps each sub-token to the index of the word it came from (None for special tokens).
word_ids = inputs.word_ids(batch_index=0)

# Print one label per word, taken from the word's first sub-token.
prev = None
for token_idx, wid in enumerate(word_ids):
    if wid is not None and wid != prev:
        print(f"{words[wid]:20s} {id2label[preds[token_idx].item()]}")
    prev = wid
```

## Labels

| Label | Description |
|-------|-------------|
| O | Non-causal token |
| B-CAUSE | Beginning of cause span |
| I-CAUSE | Inside cause span |
| B-EFFECT | Beginning of effect span |
| I-EFFECT | Inside effect span |

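To turn word-level labels like those in the table above into readable cause and effect spans, the tags can be grouped back into contiguous runs. The following is a minimal sketch; `bio_to_spans` is a hypothetical helper, the example tags are hand-written rather than real model output, and treating a stray `I-` tag as the start of a new span is a lenient design choice, not something this card specifies.

```python
def bio_to_spans(words, tags):
    """Group word-level BIO tags into (span_type, text) pairs."""
    spans = []
    current_type, current_words = None, []
    for word, tag in zip(words, tags):
        prefix, _, span_type = tag.partition("-")
        # Continue the open span only on a matching I- tag.
        if prefix == "I" and span_type == current_type:
            current_words.append(word)
            continue
        # Any other tag closes the open span, if there is one.
        if current_type is not None:
            spans.append((current_type, " ".join(current_words)))
        if prefix in ("B", "I"):
            # A stray I- tag is treated as the start of a new span.
            current_type, current_words = span_type, [word]
        else:
            current_type, current_words = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans


words = "The heavy rain caused flooding in the city.".split()
# Hand-written tags for illustration, not actual model predictions.
tags = ["B-CAUSE", "I-CAUSE", "I-CAUSE", "O",
        "B-EFFECT", "I-EFFECT", "I-EFFECT", "I-EFFECT"]
spans = bio_to_spans(words, tags)
for span_type, span_text in spans:
    print(f"{span_type}: {span_text}")
```
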