tsor13
/

chat12b

Safetensors

gemma3

Model card Files Files and versions

xet

Community

tsor13 commited on Jul 14, 2025

Commit

1e3c7c6

verified ·

1 Parent(s): 2318493

Initial upload of fine‑tuned Gemma + custom tokenizer

Browse files

Files changed (1) hide show

README.md +53 -2

README.md CHANGED Viewed

@@ -1,10 +1,61 @@
-### tsor13/Special12b
 The following is a a model trained by [...suspense...] that is meant to:
 - follow instructions better than pretrained models and be more diverse / less mode-collapsed than instruct models;
 - be a really good, approximately bayesian in-context learner;
 - fit an data generation process
 - be calibrated over distributions of possible outputs wrt a population or epistemic uncertainty
-It is initialized from `google/gemma-3-12b-pt`.
 This model/repo is a work in progress - expect updates.

+#### tsor13/chat12b  ─ [`tsor13/chat12b`](https://huggingface.co/tsor13/chat12b)
 The following is a a model trained by [...suspense...] that is meant to:
 - follow instructions better than pretrained models and be more diverse / less mode-collapsed than instruct models;
 - be a really good, approximately bayesian in-context learner;
 - fit an data generation process
 - be calibrated over distributions of possible outputs wrt a population or epistemic uncertainty
+**Description:** From gemma‑3‑12b‑it; keeps full chat format.
+**Pros:** drop‑in for chat template · works on original logs
+**Cons:** many extra tokens
+<details><summary>Example w/ inputs</summary>
+```text
+<start_of_turn>user
+Generate something that fits this description. Don't generate anything else, just the desired generation output.
+Description: DESCRIPTION
+INPUT1<end_of_turn>
+<start_of_turn>model
+OUTPUT1<end_of_turn>
+<start_of_turn>user
+INPUT2<end_of_turn>
+<start_of_turn>model
+OUTPUT2<end_of_turn>
+```
+</details>
+<details><summary>Example w/o inputs</summary>
+```text
+<start_of_turn>user
+Generate something that fits this description. Don't generate anything else, just the desired generation output.
+Description: DESCRIPTION
+Generate.<end_of_turn>
+<start_of_turn>model
+OUTPUT1<end_of_turn>
+<start_of_turn>user
+Generate.<end_of_turn>
+<start_of_turn>model
+OUTPUT2<end_of_turn>
+```
+</details>
+There are three variants of the model for now:
+| **Field** | **special** | **extra** | **chat** |
+|-----------|-------------|-----------|----------|
+| **Model card** | [`tsor13/special12b`](https://huggingface.co/tsor13/special12b) | [`tsor13/extra12b`](https://huggingface.co/tsor13/extra12b) | [`tsor13/chat12b`](https://huggingface.co/tsor13/chat12b) |
+| **Description** | From `gemma-3-12b-pt`, but with chat‑token embeddings copied over | From `gemma-3-12b-pt`, but with chat‑token embeddings copied over | From `gemma-3-12b-it`, trained to preserve & assume chat format |
+| **Pros** | • Most token‑efficient (only tags around the output) | • Distinguishes description vs first input<br>• Closer to chat format<br>• Best generations (?) | • Drop‑in for Gemma‑chat template<br>• Works on original chat logs, even OOD |
+| **Cons** | • May not tell description from first input<br>• Formatting farther from Gemma chat template | • More tokens than *special* | • Many extra tokens |
+| **Example w/ inputs** | ```text\nDESCRIPTION\nINPUT1\n<start_of_turn>OUTPUT1<end_of_turn>\nINPUT2\n<start_of_turn>OUTPUT2<end_of_turn>``` | ```text\n<start_of_turn>description\nDESCRIPTION<end_of_turn>\n<start_of_turn>input\nINPUT1<end_of_turn>\n<start_of_turn>output\nOUTPUT1<end_of_turn>\n<start_of_turn>input\nINPUT2<end_of_turn>\n<start_of_turn>output\nOUTPUT2<end_of_turn>``` | ```text\n<start_of_turn>user\nGenerate …\nDescription: DESCRIPTION\n\nINPUT1<end_of_turn>\n<start_of_turn>model\nOUTPUT1<end_of_turn>\n<start_of_turn>user\nINPUT2<end_of_turn>\n<start_of_turn>model\nOUTPUT2<end_of_turn>``` |
+| **Example w/o inputs** | ```text\nDESCRIPTION\n<start_of_turn>OUTPUT1<end_of_turn>\n<start_of_turn>OUTPUT2<end_of_turn>``` | ```text\n<start_of_turn>description\nDESCRIPTION<end_of_turn>\n<start_of_turn>output\nOUTPUT1<end_of_turn>\n<start_of_turn>output\nOUTPUT2<end_of_turn>``` | ```text\n<start_of_turn>user\nGenerate …\nDescription: DESCRIPTION\n\nGenerate.<end_of_turn>\n<start_of_turn>model\nOUTPUT1<end_of_turn>\n<start_of_turn>user\nGenerate.<end_of_turn>\n<start_of_turn>model\nOUTPUT2<end_of_turn>``` |
 This model/repo is a work in progress - expect updates.