tsor13 committed on
Commit 0f224f5 · verified · 1 Parent(s): e3d0986

Initial upload of fine‑tuned Gemma + custom tokenizer

Files changed (1)
  1. README.md +46 -2
README.md CHANGED
@@ -1,4 +1,4 @@
1
- ### tsor13/Special12b
2
  The following is a model trained by [...suspense...] that is meant to:
3
  - follow instructions better than pretrained models and be more diverse / less mode-collapsed than instruct models;
4
  - be a really good, approximately bayesian in-context learner;
@@ -6,6 +6,50 @@ The following is a model trained by [...suspense...] that is meant to:
6
  - be calibrated over distributions of possible outputs wrt a population or epistemic uncertainty
7
  It is initialized from `google/gemma-3-12b-pt`.
8
9
  This model/repo is a work in progress - expect updates.
10
 
11
  Loading model example:
@@ -37,7 +81,7 @@ print(formatted_prompt) # start_generation adds the <start_of_turn> token to con
37
  ```
38
  Output:
39
  ```
40
- <start_of_turn>descriptions
41
  Capitals<end_of_turn>
42
  <start_of_turn>input
43
  France<end_of_turn>
 
1
+ #### tsor13/extra12b ─ [`tsor13/extra12b`](https://huggingface.co/tsor13/extra12b)
2
  The following is a model trained by [...suspense...] that is meant to:
3
  - follow instructions better than pretrained models and be more diverse / less mode-collapsed than instruct models;
4
  - be a really good, approximately bayesian in-context learner;
 
6
  - be calibrated over distributions of possible outputs wrt a population or epistemic uncertainty
7
  It is initialized from `google/gemma-3-12b-pt`.
8
 
9
+ **Description:** From gemma‑3‑12b‑pt with chat token embeddings.
10
+ **Pros:** distinguishes description/input · closer to chat · best generations(?)
11
+ **Cons:** more tokens than *special*
12
+
13
+ <details><summary>Example w/ inputs</summary>
14
+
15
+ ```text
16
+ <start_of_turn>description
17
+ DESCRIPTION<end_of_turn>
18
+ <start_of_turn>input
19
+ INPUT1<end_of_turn>
20
+ <start_of_turn>output
21
+ OUTPUT1<end_of_turn>
22
+ <start_of_turn>input
23
+ INPUT2<end_of_turn>
24
+ <start_of_turn>output
25
+ OUTPUT2<end_of_turn>
26
+ ```
27
+ </details>
28
+
29
+ <details><summary>Example w/o inputs</summary>
30
+
31
+ ```text
32
+ <start_of_turn>description
33
+ DESCRIPTION<end_of_turn>
34
+ <start_of_turn>output
35
+ OUTPUT1<end_of_turn>
36
+ <start_of_turn>output
37
+ OUTPUT2<end_of_turn>
38
+ ```
39
+ </details>
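
The two *extra* layouts above can be produced by a small string builder. The following is a minimal sketch, not the repo's tokenizer API: `turn` and `format_extra_prompt` are hypothetical helper names, and the newline between turns is assumed from the examples shown in this README.

```python
from typing import Optional, Sequence, Tuple

START, END = "<start_of_turn>", "<end_of_turn>"


def turn(role: str, text: str) -> str:
    """Wrap text in one role-tagged turn, mirroring the examples above."""
    return f"{START}{role}\n{text}{END}"


def format_extra_prompt(
    description: str,
    examples: Sequence[Tuple[Optional[str], str]],
) -> str:
    """Hypothetical helper: render (input, output) pairs in the extra layout.

    An input of None produces the "w/o inputs" layout (output turns only).
    """
    turns = [turn("description", description)]
    for example_input, example_output in examples:
        if example_input is not None:
            turns.append(turn("input", example_input))
        turns.append(turn("output", example_output))
    # Assumption: turns are separated by a single newline.
    return "\n".join(turns)


if __name__ == "__main__":
    print(format_extra_prompt("DESCRIPTION", [("INPUT1", "OUTPUT1"), ("INPUT2", "OUTPUT2")]))
    print()
    print(format_extra_prompt("DESCRIPTION", [(None, "OUTPUT1"), (None, "OUTPUT2")]))
```

Passing `None` as the input keeps a single code path for both layouts; whether the released tokenizer handles missing inputs the same way is not confirmed here.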
40
+
41
+
42
+ There are three variants of the model for now:
43
+ | **Field** | **special** | **extra** | **chat** |
44
+ |-----------|-------------|-----------|----------|
45
+ | **Model card** | [`tsor13/special12b`](https://huggingface.co/tsor13/special12b) | [`tsor13/extra12b`](https://huggingface.co/tsor13/extra12b) | [`tsor13/chat12b`](https://huggingface.co/tsor13/chat12b) |
46
+ | **Description** | From `gemma-3-12b-pt`, but with chat‑token embeddings copied over | From `gemma-3-12b-pt`, but with chat‑token embeddings copied over | From `gemma-3-12b-it`, trained to preserve & assume chat format |
47
+ | **Pros** | • Most token‑efficient (only tags around the output) | • Distinguishes description vs first input<br>• Closer to chat format<br>• Best generations (?) | • Drop‑in for Gemma‑chat template<br>• Works on original chat logs, even OOD |
48
+ | **Cons** | • May not tell description from first input<br>• Formatting farther from Gemma chat template | • More tokens than *special* | • Many extra tokens |
49
+ | **Example w/ inputs** | ```text\nDESCRIPTION\nINPUT1\n<start_of_turn>OUTPUT1<end_of_turn>\nINPUT2\n<start_of_turn>OUTPUT2<end_of_turn>``` | ```text\n<start_of_turn>description\nDESCRIPTION<end_of_turn>\n<start_of_turn>input\nINPUT1<end_of_turn>\n<start_of_turn>output\nOUTPUT1<end_of_turn>\n<start_of_turn>input\nINPUT2<end_of_turn>\n<start_of_turn>output\nOUTPUT2<end_of_turn>``` | ```text\n<start_of_turn>user\nGenerate …\nDescription: DESCRIPTION\n\nINPUT1<end_of_turn>\n<start_of_turn>model\nOUTPUT1<end_of_turn>\n<start_of_turn>user\nINPUT2<end_of_turn>\n<start_of_turn>model\nOUTPUT2<end_of_turn>``` |
50
+ | **Example w/o inputs** | ```text\nDESCRIPTION\n<start_of_turn>OUTPUT1<end_of_turn>\n<start_of_turn>OUTPUT2<end_of_turn>``` | ```text\n<start_of_turn>description\nDESCRIPTION<end_of_turn>\n<start_of_turn>output\nOUTPUT1<end_of_turn>\n<start_of_turn>output\nOUTPUT2<end_of_turn>``` | ```text\n<start_of_turn>user\nGenerate …\nDescription: DESCRIPTION\n\nGenerate.<end_of_turn>\n<start_of_turn>model\nOUTPUT1<end_of_turn>\n<start_of_turn>user\nGenerate.<end_of_turn>\n<start_of_turn>model\nOUTPUT2<end_of_turn>``` |
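
For comparison, the *special* column of the table wraps only the outputs in tags and leaves the description and inputs as bare lines. A minimal sketch of that layout follows; `format_special_prompt` is a hypothetical name, and the single-newline separator is assumed from the table's examples rather than taken from the tokenizer itself.

```python
from typing import Optional, Sequence, Tuple

START, END = "<start_of_turn>", "<end_of_turn>"


def format_special_prompt(
    description: str,
    examples: Sequence[Tuple[Optional[str], str]],
) -> str:
    """Hypothetical helper: render examples in the special layout.

    Only outputs are tagged; the description and inputs are plain lines.
    An input of None produces the "w/o inputs" layout.
    """
    lines = [description]
    for example_input, example_output in examples:
        if example_input is not None:
            lines.append(example_input)
        lines.append(f"{START}{example_output}{END}")
    # Assumption: lines are separated by a single newline.
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = format_special_prompt(
        "DESCRIPTION", [("INPUT1", "OUTPUT1"), ("INPUT2", "OUTPUT2")]
    )
    print(prompt)
    # Counts tag occurrences, not tokens; real token counts depend on the tokenizer.
    print(prompt.count(START), "start tags")
```

Because the description and the first input are both untagged lines, nothing in this layout marks where one ends and the other begins, which is the ambiguity listed under **Cons** for *special*.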
51
+
52
+
53
  This model/repo is a work in progress - expect updates.
54
 
55
  Loading model example:
 
81
  ```
82
  Output:
83
  ```
84
+ <start_of_turn>description
85
  Capitals<end_of_turn>
86
  <start_of_turn>input
87
  France<end_of_turn>