Initial upload of fine‑tuned Gemma + custom tokenizer
README.md
CHANGED
|
@@ -1,4 +1,4 @@
|
|
| 1 |
-
|
| 2 |
The following is a model trained by [...suspense...] that is meant to:
|
| 3 |
- follow instructions better than pretrained models and be more diverse / less mode-collapsed than instruct models;
|
| 4 |
- be a really good, approximately Bayesian in-context learner;
|
|
@@ -6,6 +6,50 @@ The following is a a model trained by [...suspense...] that is meant to:
|
|
| 6 |
- be calibrated over distributions of possible outputs wrt a population or epistemic uncertainty
|
| 7 |
It is initialized from `google/gemma-3-12b-pt`.
|
| 8 |
|
| 9 |
This model/repo is a work in progress - expect updates.
|
| 10 |
|
| 11 |
Loading model example:
|
|
@@ -37,7 +81,7 @@ print(formatted_prompt) # start_generation adds the <start_of_turn> token to con
|
|
| 37 |
```
|
| 38 |
Output:
|
| 39 |
```
|
| 40 |
-
<start_of_turn>
|
| 41 |
Capitals<end_of_turn>
|
| 42 |
<start_of_turn>input
|
| 43 |
France<end_of_turn>
|
|
|
|
| 1 |
+
#### tsor13/extra12b ─ [`tsor13/extra12b`](https://huggingface.co/tsor13/extra12b)
|
| 2 |
The following is a model trained by [...suspense...] that is meant to:
|
| 3 |
- follow instructions better than pretrained models and be more diverse / less mode-collapsed than instruct models;
|
| 4 |
- be a really good, approximately Bayesian in-context learner;
|
|
|
|
| 6 |
- be calibrated over distributions of possible outputs wrt a population or epistemic uncertainty
|
| 7 |
It is initialized from `google/gemma-3-12b-pt`.
|
| 8 |
|
| 9 |
+
**Description:** Initialized from `gemma-3-12b-pt`, with chat-token embeddings copied over.
|
| 10 |
+
**Pros:** distinguishes description/input · closer to chat · best generations(?)
|
| 11 |
+
**Cons:** more tokens than *special*
|
| 12 |
+
|
| 13 |
+
<details><summary>Example w/ inputs</summary>
|
| 14 |
+
|
| 15 |
+
```text
|
| 16 |
+
<start_of_turn>description
|
| 17 |
+
DESCRIPTION<end_of_turn>
|
| 18 |
+
<start_of_turn>input
|
| 19 |
+
INPUT1<end_of_turn>
|
| 20 |
+
<start_of_turn>output
|
| 21 |
+
OUTPUT1<end_of_turn>
|
| 22 |
+
<start_of_turn>input
|
| 23 |
+
INPUT2<end_of_turn>
|
| 24 |
+
<start_of_turn>output
|
| 25 |
+
OUTPUT2<end_of_turn>
|
| 26 |
+
```
|
| 27 |
+
</details>
|
| 28 |
+
|
| 29 |
+
<details><summary>Example w/o inputs</summary>
|
| 30 |
+
|
| 31 |
+
```text
|
| 32 |
+
<start_of_turn>description
|
| 33 |
+
DESCRIPTION<end_of_turn>
|
| 34 |
+
<start_of_turn>output
|
| 35 |
+
OUTPUT1<end_of_turn>
|
| 36 |
+
<start_of_turn>output
|
| 37 |
+
OUTPUT2<end_of_turn>
|
| 38 |
+
```
|
| 39 |
+
</details>
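
The two layouts above can be assembled programmatically. The following is a minimal sketch, not the repo's own helper: `turn` and `build_extra_prompt` are hypothetical names introduced here for illustration, and joining turns with a single newline (with no trailing newline) is an assumption read off the examples above.

```python
from typing import Optional, Sequence, Tuple

START, END = "<start_of_turn>", "<end_of_turn>"


def turn(role: str, text: str) -> str:
    """Wrap text in one role-tagged turn, mirroring the examples above."""
    return f"{START}{role}\n{text}{END}"


def build_extra_prompt(
    description: str,
    examples: Sequence[Tuple[Optional[str], str]],
) -> str:
    """Assemble a prompt in the `extra` layout (hypothetical helper).

    Each example is an (input, output) pair; pass None as the input
    to get the "w/o inputs" layout, which has output turns only.
    """
    turns = [turn("description", description)]
    for inp, out in examples:
        if inp is not None:
            turns.append(turn("input", inp))
        turns.append(turn("output", out))
    # Assumption: turns are separated by exactly one newline.
    return "\n".join(turns)


print(build_extra_prompt("DESCRIPTION", [("INPUT1", "OUTPUT1"), ("INPUT2", "OUTPUT2")]))
```

Passing `None` for every input, e.g. `build_extra_prompt("DESCRIPTION", [(None, "OUTPUT1"), (None, "OUTPUT2")])`, yields the second layout.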
|
| 40 |
+
|
| 41 |
+
|
| 42 |
+
There are three variants of the model for now:
|
| 43 |
+
| **Field** | **special** | **extra** | **chat** |
|
| 44 |
+
|-----------|-------------|-----------|----------|
|
| 45 |
+
| **Model card** | [`tsor13/special12b`](https://huggingface.co/tsor13/special12b) | [`tsor13/extra12b`](https://huggingface.co/tsor13/extra12b) | [`tsor13/chat12b`](https://huggingface.co/tsor13/chat12b) |
|
| 46 |
+
| **Description** | From `gemma-3-12b-pt`, but with chat‑token embeddings copied over | From `gemma-3-12b-pt`, but with chat‑token embeddings copied over | From `gemma-3-12b-it`, trained to preserve & assume chat format |
|
| 47 |
+
| **Pros** | • Most token‑efficient (only tags around the output) | • Distinguishes description vs first input<br>• Closer to chat format<br>• Best generations (?) | • Drop‑in for Gemma‑chat template<br>• Works on original chat logs, even OOD |
|
| 48 |
+
| **Cons** | • May not tell description from first input<br>• Formatting farther from Gemma chat template | • More tokens than *special* | • Many extra tokens |
|
| 49 |
+
| **Example w/ inputs** | `DESCRIPTION`<br>`INPUT1`<br>`<start_of_turn>OUTPUT1<end_of_turn>`<br>`INPUT2`<br>`<start_of_turn>OUTPUT2<end_of_turn>` | `<start_of_turn>description`<br>`DESCRIPTION<end_of_turn>`<br>`<start_of_turn>input`<br>`INPUT1<end_of_turn>`<br>`<start_of_turn>output`<br>`OUTPUT1<end_of_turn>`<br>`<start_of_turn>input`<br>`INPUT2<end_of_turn>`<br>`<start_of_turn>output`<br>`OUTPUT2<end_of_turn>` | `<start_of_turn>user`<br>`Generate …`<br>`Description: DESCRIPTION`<br><br>`INPUT1<end_of_turn>`<br>`<start_of_turn>model`<br>`OUTPUT1<end_of_turn>`<br>`<start_of_turn>user`<br>`INPUT2<end_of_turn>`<br>`<start_of_turn>model`<br>`OUTPUT2<end_of_turn>` |
|
| 50 |
+
| **Example w/o inputs** | `DESCRIPTION`<br>`<start_of_turn>OUTPUT1<end_of_turn>`<br>`<start_of_turn>OUTPUT2<end_of_turn>` | `<start_of_turn>description`<br>`DESCRIPTION<end_of_turn>`<br>`<start_of_turn>output`<br>`OUTPUT1<end_of_turn>`<br>`<start_of_turn>output`<br>`OUTPUT2<end_of_turn>` | `<start_of_turn>user`<br>`Generate …`<br>`Description: DESCRIPTION`<br><br>`Generate.<end_of_turn>`<br>`<start_of_turn>model`<br>`OUTPUT1<end_of_turn>`<br>`<start_of_turn>user`<br>`Generate.<end_of_turn>`<br>`<start_of_turn>model`<br>`OUTPUT2<end_of_turn>` |
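
For comparison, the *special* column of the table can be sketched the same way. This is an illustrative reading of the table only: `build_special_prompt` is a hypothetical name, and the newline separators are an assumption taken from the example cells. The *chat* layout is not sketched because its instruction text is elided in the table.

```python
from typing import Optional, Sequence, Tuple

START, END = "<start_of_turn>", "<end_of_turn>"


def build_special_prompt(
    description: str,
    examples: Sequence[Tuple[Optional[str], str]],
) -> str:
    """Assemble a prompt in the `special` layout (hypothetical helper).

    Only outputs are wrapped in turn markers; the description and the
    inputs are plain lines, which is why this layout uses fewer tags.
    """
    lines = [description]
    for inp, out in examples:
        if inp is not None:
            lines.append(inp)  # inputs carry no tags in this layout
        lines.append(f"{START}{out}{END}")
    # Assumption: lines are separated by exactly one newline.
    return "\n".join(lines)


prompt = build_special_prompt("DESCRIPTION", [("INPUT1", "OUTPUT1"), ("INPUT2", "OUTPUT2")])
print(prompt)
print(prompt.count(START))  # number of turn markers, not a tokenizer count
```

With two input/output pairs this layout contains two `<start_of_turn>` markers, where the *extra* example in the table contains five. That is a count of marker strings rather than tokenizer tokens, but it illustrates the token-efficiency trade-off listed under Pros and Cons.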
|
| 51 |
+
|
| 52 |
+
|
| 53 |
This model/repo is a work in progress - expect updates.
|
| 54 |
|
| 55 |
Loading model example:
|
|
|
|
| 81 |
```
|
| 82 |
Output:
|
| 83 |
```
|
| 84 |
+
<start_of_turn>description
|
| 85 |
Capitals<end_of_turn>
|
| 86 |
<start_of_turn>input
|
| 87 |
France<end_of_turn>
|