---
language:
- en
tags:
- sentence-similarity
- text-classification
datasets:
- wiki-paragraphs
metrics:
- f1
license: mit
---

# BERT-Wiki-Paragraphs

Authors: Satya Almasian\*, Dennis Aumiller\*, Lucienne-Sophie Marmé, Michael Gertz

Contact us at `<lastname>@informatik.uni-heidelberg.de`

Details of the training method can be found in our work [Structural Text Segmentation of Legal Documents](https://arxiv.org/abs/2012.03619).
The training procedure follows the same setup, but this model substitutes Wikipedia articles for the legal documents used there.
Find the associated training data here: [wiki-paragraphs](https://huggingface.co/datasets/dennlinger/wiki-paragraphs)

Training is performed in a weakly supervised fashion to determine whether two paragraphs topically belong together or not.
We utilize automatically generated samples from Wikipedia for training, where paragraphs from within the same section are assumed to be topically coherent.