---
language:
- en
tags:
- sentence-similarity
- text-classification
datasets:
- wiki-paragraphs
metrics:
- f1
license: mit
---

# BERT-Wiki-Paragraphs

Authors: Satya Almasian\*, Dennis Aumiller\*, Lucienne-Sophie Marmé, Michael Gertz

Contact us at `<lastname>@informatik.uni-heidelberg.de`

Details of the training method can be found in our work [Structural Text Segmentation of Legal Documents](https://arxiv.org/abs/2012.03619).
The training procedure follows the same setup, but this model substitutes Wikipedia articles for the legal documents used there.
Find the associated training data here: [wiki-paragraphs](https://huggingface.co/datasets/dennlinger/wiki-paragraphs)

Training is performed in a weakly supervised fashion to determine whether two paragraphs topically belong together or not.
We utilize automatically generated samples from Wikipedia for training, where paragraphs from within the same section are assumed to be topically coherent.