Sentence Similarity
Safetensors
roberta
zdanGL committed 26f429f (verified) · 1 parent: 176c68b

Update README.md

Files changed (1): README.md +12 -1
README.md CHANGED
@@ -102,8 +102,19 @@ for i, definition in enumerate(definitions):
 ```
 
 ### Training Data
-- **Dataset:** 3 million pairs of Wikipedia anchor text links and Wikipedia page abstracts, derived from [this dataset](https://huggingface.co/datasets/wikimedia/structured-wikipedia)
+- **Dataset:** 3 million pairs of Wikipedia anchor text links, shown in context and marked by the special `[ENT]` tokens, and Wikipedia page abstracts, derived from [this dataset](https://huggingface.co/datasets/wikimedia/structured-wikipedia)
 - **Special Token:** `[ENT]` token added to the vocabulary to mark entity mentions
+- To illustrate the training data format, consider the following example:
+
+* **Input (Context with Special Token):**
+```
+is a commune in the Hérault department in the Occitanie [ENT] region [ENT] in
+```
+* **Target (Abstract):**
+```
+France is divided into eighteen administrative regions, of which thirteen are located in metropolitan France, while the other five are overseas regions...
+```
+
 
 ### Training Details
 - **Hardware:** Single 80GB H100 GPU
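The updated README describes each training example as an anchor mention shown in its surrounding context and wrapped in `[ENT]` tokens, paired with a Wikipedia page abstract. The sketch below shows one way such a pair could be assembled from a sentence and a character span. The names `mark_mention`, `make_pair` and `TrainingPair`, the symmetric word window, and the example sentence are all hypothetical illustration choices, not the authors' preprocessing code; the README's own example keeps more words before the mention than after it, so the real windowing likely differs.

```python
from dataclasses import dataclass

ENT = "[ENT]"  # special token named in the README


@dataclass
class TrainingPair:
    context: str   # anchor mention in context, wrapped in [ENT] tokens
    abstract: str  # abstract of the Wikipedia page the anchor points to


def mark_mention(text: str, start: int, end: int, window: int = 8) -> str:
    """Wrap text[start:end] in [ENT] tokens, keeping up to `window` words per side.

    Hypothetical: a symmetric word-based window is assumed for illustration.
    """
    if not (0 <= start < end <= len(text)):
        raise ValueError("mention span out of range")
    # Guard window <= 0: a slice like [-0:] would otherwise keep every word.
    left_words = text[:start].split()[-window:] if window > 0 else []
    right_words = text[end:].split()[:window] if window > 0 else []
    mention = text[start:end].strip()
    return " ".join(left_words + [ENT, mention, ENT] + right_words)


def make_pair(text: str, start: int, end: int, abstract: str, window: int = 8) -> TrainingPair:
    """Build one (context, abstract) pair; hypothetical helper."""
    return TrainingPair(context=mark_mention(text, start, end, window), abstract=abstract)


if __name__ == "__main__":
    # Hypothetical sentence modeled on the README example.
    text = "It is a commune in the Hérault department in the Occitanie region in southern France."
    start = text.index("region")
    pair = make_pair(
        text,
        start,
        start + len("region"),
        abstract=(
            "France is divided into eighteen administrative regions, of which thirteen "
            "are located in metropolitan France, while the other five are overseas regions..."
        ),
    )
    print(pair.context)
    # → commune in the Hérault department in the Occitanie [ENT] region [ENT] in southern France.
```

Marking the mention inline, rather than passing the span separately, lets a standard text encoder see the entity boundaries directly in its input, which is consistent with adding `[ENT]` to the vocabulary.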