Initial upload of fine‑tuned Gemma + custom tokenizer
README.md CHANGED
@@ -35,14 +35,14 @@ print(formatted_prompt) # start_generation adds the <start_of_turn> token to con
 ```
 Output:
 ```
-
-
-<start_of_turn>
-
 <start_of_turn>
 ```
 The data for the model to emulate / generate is wrapped in `<start_of_turn>` / `<end_of_turn>` tokens.
-Description and input is not wrapped in anything. Thus, do not expect the
 Messages are separated by newlines.
 
 In training, loss is ONLY calculated on the output tokens and the `<end_of_turn>` token. Thus, the model is only designed to generate / predict probabilities after `<start_of_turn>` and until `<end_of_turn>` - everything else is out of distribution for the model and not recommended.
@@ -119,7 +119,7 @@ You can also specify just the description:
 Input:
 ```
 messages = [
-{"role": "description", "content": "
 ]
 
 formatted_prompt = tokenizer.messages_to_text(messages, start_generation=True)
@@ -131,6 +131,13 @@ for i in range(n_gens):
 print(tokenizer.decode(outputs[i][inputs["input_ids"][i].shape[0]:], skip_special_tokens=True))
 print()
 ```
 
 Finally, let's look at a synthetic data generation task. For example, maybe we want to generate situations to do social reasoning over, along with whether or not they are awkward. When there are multiple variables to condition on or generate, the model uses JSON format.
 
@@ -172,5 +179,8 @@ Output:
 {"situation": "Your friend comes over and asks you to help them move some furniture, but you have other plans.", "is_awkward": true}
 ```
 
-
-
 ```
 Output:
 ```
+Capitals
+France
+<start_of_turn>Paris<end_of_turn>
+Japan
 <start_of_turn>
 ```
 The data for the model to emulate / generate is wrapped in `<start_of_turn>` / `<end_of_turn>` tokens.
+The description and input are not wrapped in anything. Thus, do not expect the model to generate these tokens - instead focus on the wrapped output tokens.
 Messages are separated by newlines.
 
 In training, loss is ONLY calculated on the output tokens and the `<end_of_turn>` token. Thus, the model is only designed to generate / predict probabilities after `<start_of_turn>` and until `<end_of_turn>` - everything else is out of distribution for the model and not recommended.
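The prompt format described in this section can be sketched as a small helper. This is a hypothetical re-implementation for illustration only, not the actual `tokenizer.messages_to_text` code, and the `"input"` / `"output"` role names are assumptions inferred from the example (the README only shows `"description"` explicitly):

```python
def format_messages(messages, start_generation=False):
    """Hypothetical sketch of the prompt format: output turns are wrapped
    in <start_of_turn>/<end_of_turn>, everything else is left bare, and
    messages are joined with newlines."""
    lines = []
    for m in messages:
        if m["role"] == "output":
            # Only the data to emulate / generate is wrapped in turn tokens.
            lines.append(f"<start_of_turn>{m['content']}<end_of_turn>")
        else:
            # "description" and "input" messages are not wrapped in anything.
            lines.append(m["content"])
    text = "\n".join(lines)
    if start_generation:
        # Cue the model to begin generating a new output turn.
        text += "\n<start_of_turn>"
    return text

messages = [
    {"role": "description", "content": "Capitals"},
    {"role": "input", "content": "France"},
    {"role": "output", "content": "Paris"},
    {"role": "input", "content": "Japan"},
]
print(format_messages(messages, start_generation=True))
# Capitals
# France
# <start_of_turn>Paris<end_of_turn>
# Japan
# <start_of_turn>
```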
 Input:
 ```
 messages = [
+{"role": "description", "content": "Descriptive colors"},
 ]
 
 formatted_prompt = tokenizer.messages_to_text(messages, start_generation=True)
 print(tokenizer.decode(outputs[i][inputs["input_ids"][i].shape[0]:], skip_special_tokens=True))
 print()
 ```
+Output:
+```
+Navy Blue
+White
+light green
+yellow
+```
 
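The loop above slices off the prompt and decodes with `skip_special_tokens=True`. If you instead decode with special tokens kept, a generation can be truncated manually at `<end_of_turn>`, since everything after that token is out of distribution. A minimal sketch (`extract_output` is a hypothetical helper, not part of the tokenizer):

```python
def extract_output(generated_text):
    """Cut a decoded continuation at the first <end_of_turn> token.

    Assumes `generated_text` is the text generated after the prompt's
    trailing <start_of_turn>, decoded with special tokens kept.
    """
    end = generated_text.find("<end_of_turn>")
    return generated_text if end == -1 else generated_text[:end]

print(extract_output("Navy Blue<end_of_turn>out-of-distribution tail"))
# Navy Blue
```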
 Finally, let's look at a synthetic data generation task. For example, maybe we want to generate situations to do social reasoning over, along with whether or not they are awkward. When there are multiple variables to condition on or generate, the model uses JSON format.
 
 {"situation": "Your friend comes over and asks you to help them move some furniture, but you have other plans.", "is_awkward": true}
 ```
 
+A few tips and tricks:
+- Do not expect the model to handle multi-turn chats. It is designed to be stateless and to treat each data point as "exchangeable" (roughly i.i.d.).
+- If all you want is one reasonable answer, a chat model is likely a better fit. If you want to generate many reasonable answers / diverse examples, this model is a better fit.
+- The model is quite good at perspective taking / steering if you provide many examples.
+- The model is reasonably good at expressing epistemic uncertainty: when an output is unsure, sampling several times surfaces the spread of plausible answers.
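One simple way to turn repeated samples into an uncertainty estimate is to parse each sampled JSON generation and aggregate one field. The sketch below uses made-up sample strings; in practice they would come from repeated generation calls, and `aggregate_field` is a hypothetical helper, not part of this repository:

```python
import json
from collections import Counter

def aggregate_field(samples, field):
    """Estimate a distribution over `field` from repeated JSON samples."""
    counts = Counter()
    for s in samples:
        try:
            counts[json.loads(s)[field]] += 1
        except (json.JSONDecodeError, KeyError):
            continue  # skip malformed generations rather than crashing
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

# Made-up samples standing in for repeated model generations.
samples = [
    '{"situation": "...", "is_awkward": true}',
    '{"situation": "...", "is_awkward": true}',
    '{"situation": "...", "is_awkward": false}',
    '{"situation": "...", "is_awkward": true}',
]
print(aggregate_field(samples, "is_awkward"))
# {True: 0.75, False: 0.25}
```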
|