This is the scene segmenter model that is being used in [LLpro](https://github.com/cophi-wue/LLpro).

Broadly speaking, the model is used in a token classification setup. A sequence of multiple sentences is represented by interspersing the respective tokenizations with the special `[SEP]` token.
On these `[SEP]` tokens, a linear classification layer predicts one of the four classes above.

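The input layout described above can be sketched without loading the model (the German example sentences here are placeholders, not from the dataset):

```python
# Sketch of the input construction: sentences are joined with the
# special [SEP] marker; each resulting [SEP] token is the position
# on which the classification head predicts a class.
sentences = ['Erster Satz.', 'Zweiter Satz.', 'Dritter Satz.']
joint_input = ' [SEP] '.join(sentences)
print(joint_input)
# >>> Erster Satz. [SEP] Zweiter Satz. [SEP] Dritter Satz.
```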
The model is trained on the dataset corresponding to the [KONVENS 2021 Shared Task on Scene Segmentation](http://lsx-events.informatik.uni-wuerzburg.de/stss-2021/task.html) ([Zehe et al., 2021](http://ceur-ws.org/Vol-3001/#paper1)) by fine-tuning the domain-adapted [lkonle/fiction-gbert-large](https://huggingface.co/lkonle/fiction-gbert-large). ([Training code](https://github.com/cophi-wue/LLpro/blob/main/contrib/train_scene_segmenter.py))
F1-Score:
- **40.22** on Track 1 (in-domain dime novels)

The respective test datasets are only available to the task organizers; the task

**Demo Usage**:
```python
import torch
from transformers import BertTokenizer, BertForTokenClassification

tokenizer = BertTokenizer.from_pretrained('aehrm/stss-scene-segmenter')
model = BertForTokenClassification.from_pretrained('aehrm/stss-scene-segmenter', sep_token_id=tokenizer.sep_token_id).eval()

sentences = ['Und so begann unser kleines Abenteuer auf Hoher See...', 'Es war früh am Morgen, als wir in See stechen wollten.', 'Das Wasser war still.']
inputs = tokenizer(' [SEP] '.join(sentences), return_tensors='pt')

# inference on the model
with torch.no_grad():
    logits = model(**inputs).logits

# concentrate on the logits corresponding to the [SEP] tokens
relevant_logits = logits[inputs.input_ids == tokenizer.sep_token_id]

predicted_ids = relevant_logits.argmax(dim=-1).numpy()
predicted_labels = [model.config.id2label[x] for x in predicted_ids]

# print the associated prediction for each sentence / [SEP] token
for label, sent in zip(predicted_labels, sentences):
    print(label, sent)
# >>> Scene Und so begann unser kleines Abenteuer auf Hoher See...
# >>> Scene-B Es war früh am Morgen, als wir in See stechen wollten. (This sentence begins a new scene.)
# >>> Scene Das Wasser war still.

# alternatively, decode the respective bridge type
prev = None
for label, sent in zip(predicted_labels, sentences):
    if prev == 'Scene' and label == 'Scene-B':
        bridge = 'SCENE-TO-SCENE'
    elif prev == 'Scene' and label == 'Nonscene-B':
        bridge = 'SCENE-TO-NONSCENE'
    elif prev == 'Nonscene' and label == 'Scene-B':
        bridge = 'NONSCENE-TO-SCENE'
    else:
        bridge = 'NOBORDER'

    if prev is not None:
        print(bridge)
    print(sent)
    prev = label
# >>> Und so begann unser kleines Abenteuer auf Hoher See...
# >>> SCENE-TO-SCENE
# >>> Es war früh am Morgen, als wir in See stechen wollten.
# >>> NOBORDER
# >>> Das Wasser war still.
```
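Beyond per-sentence labels, the predictions can be grouped into contiguous scene/non-scene segments. The helper below is a hypothetical sketch, not part of the model or the `transformers` API; it only assumes, as in the demo above, that labels ending in `-B` (`Scene-B`, `Nonscene-B`) mark the first sentence of a new segment:

```python
def group_into_segments(labels, sentences):
    """Group sentences into segments; a new segment starts whenever the
    predicted label ends in '-B'. Hypothetical helper, requires Python 3.9+."""
    segments = []
    for label, sent in zip(labels, sentences):
        if label.endswith('-B') or not segments:
            # start a new segment; store its type without the '-B' suffix
            segments.append((label.removesuffix('-B'), [sent]))
        else:
            # same segment as the previous sentence
            segments[-1][1].append(sent)
    return segments

labels = ['Scene', 'Scene-B', 'Scene']
sents = ['Satz eins.', 'Satz zwei.', 'Satz drei.']
print(group_into_segments(labels, sents))
# >>> [('Scene', ['Satz eins.']), ('Scene', ['Satz zwei.', 'Satz drei.'])]
```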