We simply discard the system prompts.

**To put it all together, the text before tokenization looks like this:**

```python
general_instruction_response_text = "<|begin_of_text|>{question} {response}<|end_of_text|>"

instruction_augmented_text = "<|begin_of_text|>{instruction augmented text}<|end_of_text|>"
```

Then, for tokenization, you don't need to add the BOS and EOS token ids yourself. The tokenization code looks like this:

```python
text_ids = tokenizer(text, add_special_tokens=False, **kwargs).input_ids
```
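As a minimal sketch of how the two pieces fit together (the helper name and the toy question/response pair below are illustrative, not from this repo), you can fill the template by embedding the markers directly in the string; since the BOS/EOS markers are then already part of the text, the tokenizer must not add them a second time, which is why `add_special_tokens=False` is passed above:

```python
# Illustrative sketch: build the pre-tokenization text by embedding the
# BOS/EOS markers directly in the string, as in the templates above.
BOS, EOS = "<|begin_of_text|>", "<|end_of_text|>"

def build_pretraining_text(question: str, response: str) -> str:
    # The markers are part of the text itself, so the tokenizer is later
    # called with add_special_tokens=False to avoid duplicating them.
    return f"{BOS}{question} {response}{EOS}"

text = build_pretraining_text("What is 2 + 2?", "The answer is 4.")
print(text)
# → <|begin_of_text|>What is 2 + 2? The answer is 4.<|end_of_text|>
```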
## Citation

If you find our work helpful, please cite us: