Update README.md
README.md
CHANGED
@@ -9,7 +9,7 @@ tags:
 - MSMARCO
 ---
 # Description
-We use MS Marco Encoder msmarco-MiniLM-L-6-v3 to encode the text from dataset [abokbot/wikipedia-first-paragraph](https://huggingface.co/datasets/abokbot/wikipedia-first-paragraph).
+We use MS Marco Encoder msmarco-MiniLM-L-6-v3 from the sentence-transformers library to encode the text from dataset [abokbot/wikipedia-first-paragraph](https://huggingface.co/datasets/abokbot/wikipedia-first-paragraph).
 
 The dataset contains the first paragraphs of the English "20220301.en" version of the [Wikipedia dataset](https://huggingface.co/datasets/wikipedia).
 
@@ -28,4 +28,7 @@ bi_encoder.max_seq_length = 256
 wikipedia_embedding = bi_encoder.encode(dataset["text"], convert_to_tensor=True, show_progress_bar=True)
 
 ```
-This operation took 35min on a Google Colab notebook with GPU.
+This operation took 35min on a Google Colab notebook with GPU.
+
+# Reference
+More information of MS Marco encoders here https://www.sbert.net/docs/pretrained-models/ce-msmarco.html
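The README does not say how the stored embeddings are meant to be consumed. A likely use, assumed here and not confirmed by the commit, is nearest-neighbour retrieval: encode a query with the same bi-encoder and rank the paragraph embeddings by cosine similarity. The sketch below shows that ranking step in plain Python with toy 3-dimensional vectors standing in for real embeddings. The helper names `cosine_similarity` and `top_k` are hypothetical and are not part of sentence-transformers.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, corpus_vecs, k=3):
    """Return the k best (corpus_index, score) pairs, highest score first."""
    scored = [(cosine_similarity(query_vec, vec), idx)
              for idx, vec in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


# Toy stand-ins (assumption): in practice these would be rows of
# wikipedia_embedding and an encoded query from the same bi-encoder.
corpus_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_vec = [1.0, 0.1, 0.0]

hits = top_k(query_vec, corpus_vecs, k=2)
for idx, score in hits:
    print(idx, round(score, 3))
```

For a corpus the size of Wikipedia, a pure-Python loop like this would be far too slow; a tensor-based routine such as `sentence_transformers.util.semantic_search` would be the more practical choice, with the same ranking logic.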