Initial upload of fine‑tuned Gemma + custom tokenizer
README.md CHANGED

@@ -5,6 +5,7 @@ The following is a model trained by [...suspense...] that is meant to:

- be a really good, approximately Bayesian in-context learner;
- fit a data-generating process
- be calibrated over distributions of possible outputs with respect to a population or epistemic uncertainty
- also act as a chat model, hopefully with more diverse outputs!

**Description:** From gemma-3-12b-it; keeps full chat format.
**Pros:** drop-in for chat template · works on original logs

@@ -56,7 +57,6 @@ There are three variants of the model for now:

| **Example w/o inputs** | ```text\nDESCRIPTION\n<start_of_turn>OUTPUT1<end_of_turn>\n<start_of_turn>OUTPUT2<end_of_turn>``` | ```text\n<start_of_turn>description\nDESCRIPTION<end_of_turn>\n<start_of_turn>output\nOUTPUT1<end_of_turn>\n<start_of_turn>output\nOUTPUT2<end_of_turn>``` | ```text\n<start_of_turn>user\nGenerate …\nDescription: DESCRIPTION\n\nGenerate.<end_of_turn>\n<start_of_turn>model\nOUTPUT1<end_of_turn>\n<start_of_turn>user\nGenerate.<end_of_turn>\n<start_of_turn>model\nOUTPUT2<end_of_turn>``` |

This model/repo is a work in progress - expect updates.

Loading model example:

@@ -281,7 +281,85 @@ Output:

A few tips and tricks:

- If all you want is one reasonable answer, then a chat model is likely a better fit. However, if you want to generate many reasonable answers / diverse examples, this model is a better fit.
- The model is quite good at perspective taking / steering if you provide many examples.
- The model is reasonably good at expressing epistemic uncertainty over uncertain outputs when you sample it several times.
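
The "Example w/o inputs" template shown in the table above can be assembled programmatically. A minimal sketch (the helper name and the example description are mine, not part of the repo):

```python
# Build the base-variant prompt without inputs, following the template
# above: DESCRIPTION, then each prior output wrapped in turn markers.
def build_prompt_without_inputs(description, outputs):
    parts = [description]
    for out in outputs:
        parts.append(f"<start_of_turn>{out}<end_of_turn>")
    return "\n".join(parts)

prompt = build_prompt_without_inputs(
    "Names parents give to baby girls in the US",  # hypothetical description
    ["Olivia", "Emma"],                            # outputs sampled so far
)
print(prompt)
```

Each newly sampled output can be appended in the same turn-marker format, so later samples condition on earlier ones as roughly exchangeable data points.
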
### Chat-specific

Additionally, the model can also be used directly as a chat model, with some initial evidence that it behaves similarly to the original chat model but with slightly more diverse outputs. For example, here are two prompts, along with next-token probabilities for tsor13/chat12b vs. google/gemma-3-12b-it:


User message: `Let's play rock paper scissors! I'll play at the same time — try to beat me. Return just rock, paper, or scissors`

Top 10 probabilities for google/gemma-3-12b-it:

```
1. 'paper' -> 0.8609
2. 'scissors' -> 0.1098
3. 'Scissors' -> 0.0164
4. 'Paper' -> 0.0129
5. 'Rock' -> 0.0000
6. 'rock' -> 0.0000
7. ' scissors' -> 0.0000
8. ' paper' -> 0.0000
9. '纸' -> 0.0000
10. '纸' -> 0.0000
```

Top 10 probabilities for the first tsor13/chat12b token:

```
1. 'scissors' -> 0.6375
2. 'rock' -> 0.2188
3. 'paper' -> 0.1354
4. 'scissor' -> 0.0017
5. 'Scissors' -> 0.0017
6. 'Rock' -> 0.0015
7. 'Paper' -> 0.0005
8. 'sc' -> 0.0003
9. 'stone' -> 0.0002
10. 'I' -> 0.0001
```

It's not perfect, but as you can see, the chat12b model puts at least 13% probability on each of rock, paper, and scissors, while the original model always chooses scissors or paper.

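
Rankings like these can be reproduced by softmaxing the logits at the first generated position and sorting. A minimal numpy sketch of the ranking step (toy logits stand in for a real forward pass; with transformers you would use something like `model(input_ids).logits[0, -1]` instead):

```python
import numpy as np

def top_k_probs(logits, k=10):
    """Return the k most likely token ids with their softmax probabilities."""
    z = logits - logits.max()        # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()  # softmax over the vocabulary
    top = np.argsort(p)[::-1][:k]    # indices by descending probability
    return [(int(i), float(p[i])) for i in top]

# Toy logits over a 5-token "vocabulary"
toy = np.array([2.0, 1.0, 0.0, -1.0, -2.0])
for tok_id, prob in top_k_probs(toy, k=3):
    print(tok_id, round(prob, 4))
```
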

User message: `What should I name my baby? Return just the name`

Top 10 probabilities for google/gemma-3-12b-it:

```
1. 'Ele' -> 0.5388
2. 'Hazel' -> 0.1768
3. 'Aurora' -> 0.1122
4. 'El' -> 0.0687
5. 'Olivia' -> 0.0380
6. 'The' -> 0.0148
7. 'E' -> 0.0123
8. 'Am' -> 0.0109
9. 'Willow' -> 0.0082
10. 'Leo' -> 0.0033
```

Top 10 probabilities for the first tsor13/chat12b token:

```
1. 'Leo' -> 0.0477
2. 'Olivia' -> 0.0411
3. 'Liam' -> 0.0347
4. 'Oliver' -> 0.0280
5. 'E' -> 0.0257
6. 'James' -> 0.0239
7. 'Alice' -> 0.0221
8. 'A' -> 0.0214
9. 'Henry' -> 0.0214
10. 'Luna' -> 0.0206
```

Again, not perfect, but the chat model spreads out probability mass over many more names (unlike the original instruct model, which puts 50% chance on a name starting with "Ele").

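
One way to quantify "spreads out probability mass" is the entropy of each top-10 list, renormalized so each list sums to 1 (the two top-10s cover very different amounts of total mass). A quick check using the numbers above:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a renormalized probability list."""
    s = sum(probs)
    return -sum(p / s * math.log(p / s) for p in probs)

gemma_it = [0.5388, 0.1768, 0.1122, 0.0687, 0.0380,
            0.0148, 0.0123, 0.0109, 0.0082, 0.0033]
chat12b = [0.0477, 0.0411, 0.0347, 0.0280, 0.0257,
           0.0239, 0.0221, 0.0214, 0.0214, 0.0206]

print(entropy(gemma_it))  # roughly 1.4 nats: concentrated on a few names
print(entropy(chat12b))   # roughly 2.3 nats: near-uniform (the max for 10 items is ln 10, about 2.30)
```
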

Finally, the chat model's tokenizer also has a function to convert from the description/input/output format to the system/user/assistant format, which can be used to chat directly with the model. For example:

```python
messages = [
    {"role": "description", "content": "You are a helpful assistant who outputs the requested content."},
    {"role": "input", "content": "A poem about a shark"},
]
tokenizer.messages_to_chat_messages(messages)
```
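
The role mapping that `messages_to_chat_messages` performs is, per the description above, description → system, input → user, output → assistant. A rough standalone illustration of that mapping (my sketch, not the tokenizer's actual implementation):

```python
# Hypothetical re-implementation of the described role mapping; the real
# conversion is the custom tokenizer's messages_to_chat_messages method.
ROLE_MAP = {"description": "system", "input": "user", "output": "assistant"}

def to_chat_messages(messages):
    # Rewrite each role via the mapping, leaving unknown roles untouched.
    return [{"role": ROLE_MAP.get(m["role"], m["role"]), "content": m["content"]}
            for m in messages]

messages = [
    {"role": "description", "content": "You are a helpful assistant who outputs the requested content."},
    {"role": "input", "content": "A poem about a shark"},
]
print(to_chat_messages(messages))
```
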