tsor13 committed on Commit f338434 · verified · 1 Parent(s): 65fe647

Initial upload of fine‑tuned Gemma + custom tokenizer

Files changed (1):
  1. README.md +18 -8
README.md CHANGED
@@ -35,14 +35,14 @@ print(formatted_prompt) # start_generation adds the <start_of_turn> token to con
  ```
  Output:
  ```
- This is a test task
- What is 2+2?
- <start_of_turn>4<end_of_turn>
- What is 3+3?
  <start_of_turn>
  ```
  The data for the model to emulate / generate is wrapped in `<start_of_turn>` / `<end_of_turn>` tokens.
- Description and input is not wrapped in anything. Thus, do not expect the m
  Messages are separated by newlines.

  In training, loss is ONLY calculated on the output tokens and the `<end_of_turn>` token. Thus, the model is only designed to generate / predict probabilities after `<start_of_turn>` and until `<end_of_turn>` - everything else is out of distribution for the model and not recommended.
@@ -119,7 +119,7 @@ You can also specify just the description:
  Input:
  ```
  messages = [
- {"role": "description", "content": "Metaphors about life."},
  ]

  formatted_prompt = tokenizer.messages_to_text(messages, start_generation=True)
@@ -131,6 +131,13 @@ for i in range(n_gens):
  print(tokenizer.decode(outputs[i][inputs["input_ids"][i].shape[0]:], skip_special_tokens=True))
  print()
  ```

  Finally, let's look at a synthetic data generation task. For example, maybe we want to generate situations to do social reasoning over, along with whether or not they are awkward. When there are multiple variables to condition on or generate, the model is accustomed to JSON format.

@@ -172,5 +179,8 @@ Output:
  {"situation": "Your friend comes over and asks you to help them move some furniture, but you have other plans.", "is_awkward": true}
  ```

-
-
 
  ```
  Output:
  ```
+ Capitals
+ France
+ <start_of_turn>Paris<end_of_turn>
+ Japan
  <start_of_turn>
  ```
  The data for the model to emulate / generate is wrapped in `<start_of_turn>` / `<end_of_turn>` tokens.
+ Description and input are not wrapped in anything. Thus, do not expect the model to generate these tokens - instead focus on the wrapped output tokens.
  Messages are separated by newlines.

  In training, loss is ONLY calculated on the output tokens and the `<end_of_turn>` token. Thus, the model is only designed to generate / predict probabilities after `<start_of_turn>` and until `<end_of_turn>` - everything else is out of distribution for the model and not recommended.
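The custom tokenizer's `messages_to_text` implementation is not shown in this diff, but the format rules above pin down its behavior. The following is an illustrative mock, not the actual tokenizer: the role names `"input"` and `"output"` are assumptions, since only `"description"` appears in the README's examples.

```python
# Illustrative mock of the documented prompt format -- NOT the real tokenizer.
# Roles "input" and "output" are assumed; only "description" is shown upstream.
def mock_messages_to_text(messages, start_generation=False):
    parts = []
    for m in messages:
        if m["role"] == "output":
            # Only data the model should emulate is wrapped in turn tokens.
            parts.append(f"<start_of_turn>{m['content']}<end_of_turn>")
        else:
            # Descriptions and inputs are left unwrapped.
            parts.append(m["content"])
    text = "\n".join(parts)  # messages are separated by newlines
    if start_generation:
        # Appends <start_of_turn> so the model continues from inside a turn,
        # the only position it is trained to generate from.
        text += "\n<start_of_turn>"
    return text

messages = [
    {"role": "description", "content": "Capitals"},
    {"role": "input", "content": "France"},
    {"role": "output", "content": "Paris"},
    {"role": "input", "content": "Japan"},
]
print(mock_messages_to_text(messages, start_generation=True))
# Capitals
# France
# <start_of_turn>Paris<end_of_turn>
# Japan
# <start_of_turn>
```

The printed string matches the example output shown above.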
 
  Input:
  ```
  messages = [
+ {"role": "description", "content": "Descriptive colors"},
  ]

  formatted_prompt = tokenizer.messages_to_text(messages, start_generation=True)
 
  print(tokenizer.decode(outputs[i][inputs["input_ids"][i].shape[0]:], skip_special_tokens=True))
  print()
  ```
+ Output:
+ ```
+ Navy Blue
+ White
+ light green
+ yellow
+ ```

  Finally, let's look at a synthetic data generation task. For example, maybe we want to generate situations to do social reasoning over, along with whether or not they are awkward. When there are multiple variables to condition on or generate, the model is accustomed to JSON format.

  {"situation": "Your friend comes over and asks you to help them move some furniture, but you have other plans.", "is_awkward": true}
  ```

+ A few tips and tricks:
+ - Do not expect the model to do multi-turn chats. It is designed to be stateless and to treat each data point as "exchangeable" (roughly iid).
+ - If all you want is one reasonable answer, then a chat model is likely a better fit. However, if you want to generate many reasonable answers / diverse examples, this model is a better fit.
+ - The model is quite good at perspective taking / steering if you provide many examples.
+ - The model is reasonably good at expressing epistemic uncertainty over unsure outputs by sampling several times.
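The last tip can be made concrete for the awkwardness task above: sample the same prompt several times, parse each JSON line, and treat the empirical rate of `is_awkward` as a rough uncertainty estimate. A sketch, where the sampled strings are hypothetical (in practice they come from decoding several `model.generate` outputs):

```python
import json

# Hypothetical strings sampled from the model for the same prompt; in
# practice these come from decoding multiple sampled generations.
samples = [
    '{"situation": "Your friend comes over and asks you to help them move some furniture, but you have other plans.", "is_awkward": true}',
    '{"situation": "Your friend comes over and asks you to help them move some furniture, but you have other plans.", "is_awkward": true}',
    '{"situation": "Your friend comes over and asks you to help them move some furniture, but you have other plans.", "is_awkward": false}',
    '{"situation": "Your friend comes over and asks you to help them move some furniture, but you have other plans.", "is_awkward": true}',
]

# Fraction of samples flagging the situation as awkward ~= model confidence.
flags = [json.loads(s)["is_awkward"] for s in samples]
p_awkward = sum(flags) / len(flags)
print(p_awkward)  # 0.75
```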