---
language:
- de
---
# Scene Segmenter for the Shared Task on Scene Segmentation

This is the scene segmenter model used in [LLpro](https://github.com/cophi-wue/LLpro). On borders between sentences, it predicts one of the following labels:
- `B-Scene`: the preceding sentence began a new *Scene*.
- `B-Nonscene`: the preceding sentence began a new *Non-Scene*.
- `Scene`: the preceding sentence belongs to a *Scene*, but does not begin a new one – i.e., the scene continues.
- `Nonscene`: the preceding sentence belongs to a *Non-Scene*, but does not begin a new one – i.e., the non-scene continues.
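
As an illustration of this label scheme (the helper below is not part of the released pipeline, and its name is made up for this sketch), one label per sentence can be folded back into contiguous scene and non-scene spans:

```python
def labels_to_segments(labels):
    """Group one label per sentence into (kind, start, end) spans.

    A label with the ``B-`` prefix opens a new segment; ``Scene`` and
    ``Nonscene`` continue the current one. Indices are sentence indices,
    and ``end`` is exclusive.
    """
    segments = []
    for i, label in enumerate(labels):
        kind = label[2:] if label.startswith("B-") else label
        if label.startswith("B-") or not segments:
            segments.append([kind, i, i + 1])  # a new segment opens here
        else:
            segments[-1][2] = i + 1            # the current segment grows
    return [tuple(s) for s in segments]

print(labels_to_segments(
    ["B-Scene", "Scene", "Scene", "B-Nonscene", "Nonscene", "B-Scene"]
))
# [('Scene', 0, 3), ('Nonscene', 3, 5), ('Scene', 5, 6)]
```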

Broadly speaking, the model is used in a token classification setup. A sequence of multiple sentences is represented by interspersing the respective tokenizations with the special `[SEP]` token. On these `[SEP]` tokens, the linear classification layer predicts one of the four above classes.
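
At the string level, this interleaving can be sketched as follows. This is only an illustration: the actual pipeline intersperses the sentences' *tokenizations* with `[SEP]`, and whether a trailing separator is appended is an implementation detail not shown here.

```python
def build_input(sentences, sep_token="[SEP]"):
    """Join sentences with the separator token; the classifier then
    predicts one of the four labels at each [SEP] position, i.e. at
    each sentence boundary."""
    return f" {sep_token} ".join(sentences)

print(build_input(["Es war Nacht.", "Der Morgen kam.", "Sie ging fort."]))
# Es war Nacht. [SEP] Der Morgen kam. [SEP] Sie ging fort.
```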

The model is trained on the dataset corresponding to the [KONVENS 2021 Shared Task on Scene Segmentation](http://lsx-events.informatik.uni-wuerzburg.de/stss-2021/task.html) ([Zehe et al., 2021](http://ceur-ws.org/Vol-3001/#paper1)), fine-tuning the domain-adapted [lkonle/fiction-gbert-large](https://huggingface.co/lkonle/fiction-gbert-large). ([Training code](https://github.com/cophi-wue/LLpro/blob/main/contrib/train_scene_segmenter.py))

F1-Score:
- **40.22** on Track 1 (in-domain dime novels)
- **35.09** on Track 2 (out-of-domain highbrow novels)

The respective test datasets are only available to the task organizers, who evaluated this model on their private test set and report the above scores. See the [KONVENS paper](http://ceur-ws.org/Vol-3001/#paper1) for a description of their metric.

---

**Demo Usage**:

```python
TODO
```

**Cite**:

Please cite the following paper when using this model.

```
@inproceedings{ehrmanntraut-et-al-llpro-2023,
  location = {Ingolstadt, Germany},
  title = {{LLpro}: A Literary Language Processing Pipeline for {German} Narrative Text},
  booktitle = {Proceedings of the 19th Conference on Natural Language Processing ({KONVENS} 2023)},
  publisher = {{KONVENS} 2023 Organizers},
  author = {Ehrmanntraut, Anton and Konle, Leonard and Jannidis, Fotis},
  date = {2023},
}
```