SAP/miCSE

Sentence Similarity · feature-extraction · text-embeddings-inference


TJKlein committed on Aug 16, 2023

Commit 14068ab · Parent: 9975451

Update README.md

Files changed (1)

README.md +5 -0

@@ -19,11 +19,16 @@ The model intended to be used for encoding sentences or short paragraphs. Given
 # Training data
 The model was trained on a random collection of **English** sentences from Wikipedia: [Training data file](https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse/resolve/main/wiki1m_for_simcse.txt)
+Training data consists of data splits of different sizes (from 10% to 0.0064%) of the SimCSE training corpus. Each split size comprises 5 files, each created with a different seed.
+Data can be downloaded [here](https://huggingface.co/datasets/sap-ai-research/datasets-for-micse).
 # Model Training
 <mark>In order to make use of the **few-shot** capability of **miCSE**, the model needs to be trained on your data. The source code and instructions to do so will be provided shortly. Stay tuned :). </mark>
+## Training Data
 # Model Usage
 ### Example 1) - Sentence Similarity
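The added README lines describe splits of the SimCSE corpus ranging from 10% down to 0.0064%, with 5 files per split size, each drawn with a different seed. The sketch below shows one way such seeded splits could be produced, assuming each split is a uniform random sample without replacement. It is not the authors' script: the function name `make_splits`, the seed values, and the synthetic stand-in corpus are hypothetical, and only the endpoint fractions (10% and 0.0064%) come from the README.

```python
import random
from typing import Dict, List, Tuple


def make_splits(
    sentences: List[str],
    fractions: List[float],
    seeds: List[int],
) -> Dict[Tuple[float, int], List[str]]:
    """Draw one random subset per (fraction, seed) pair.

    Assumption (not confirmed by the README): each split is a uniform
    sample without replacement, sized to the nearest whole sentence
    (at least one sentence).
    """
    splits: Dict[Tuple[float, int], List[str]] = {}
    for fraction in fractions:
        k = max(1, round(len(sentences) * fraction))
        for seed in seeds:
            rng = random.Random(seed)  # independent, reproducible RNG per seed
            splits[(fraction, seed)] = rng.sample(sentences, k)
    return splits


if __name__ == "__main__":
    # Hypothetical stand-in corpus, assuming ~1M sentences as the
    # file name wiki1m_for_simcse.txt suggests.
    corpus = [f"sentence {i}" for i in range(1_000_000)]
    # Only the endpoints (10% and 0.0064%) are stated in the README;
    # the five seeds are hypothetical.
    splits = make_splits(corpus, fractions=[0.10, 0.000064], seeds=[0, 1, 2, 3, 4])
    for (fraction, seed), subset in sorted(splits.items()):
        print(f"{fraction:.4%} seed={seed}: {len(subset)} sentences")
```

With a 1M-sentence corpus, each 10% split holds 100,000 sentences and each 0.0064% split holds 64. Seeding a separate `random.Random` per file keeps every split reproducible on its own, independent of the order in which the splits are generated.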