tsor13 / extra12b

tsor13 committed on Jul 14, 2025

Commit bdad2f2 (verified) · 1 parent: 880fe8b

Initial upload of fine‑tuned Gemma + custom tokenizer

Files changed (1)

README.md (changed)

@@ -50,7 +50,8 @@ There are three variants of the model for now:
 | **Example w/o inputs** | ```text\nDESCRIPTION\n<start_of_turn>OUTPUT1<end_of_turn>\n<start_of_turn>OUTPUT2<end_of_turn>``` | ```text\n<start_of_turn>description\nDESCRIPTION<end_of_turn>\n<start_of_turn>output\nOUTPUT1<end_of_turn>\n<start_of_turn>output\nOUTPUT2<end_of_turn>``` | ```text\n<start_of_turn>user\nGenerate …\nDescription: DESCRIPTION\n\nGenerate.<end_of_turn>\n<start_of_turn>model\nOUTPUT1<end_of_turn>\n<start_of_turn>user\nGenerate.<end_of_turn>\n<start_of_turn>model\nOUTPUT2<end_of_turn>``` |
 At the moment, I recommend:
-- [special](https://huggingface.co/tsor13/special12b) and [extra](https://huggingface.co/tsor13/extra12b) for most use cases, and are roughly interchangeable.
+- [special](https://huggingface.co/tsor13/special12b) for most use cases (token-efficient and gets best loss on training data)
+- [extra](https://huggingface.co/tsor13/extra12b) for when generation quality is more important than token efficiency
 - [chat](https://huggingface.co/tsor13/chat12b) is a good fit for chat-style data or conversations.
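The "Example w/o inputs" row in this hunk shows how each variant lays out a description followed by two outputs. Below is a minimal Python sketch of those three layouts, assuming the literal `\n` sequences in the table stand for newlines and that the three format columns correspond to special, extra and chat in that order (the hunk does not show the table header). The function names and the `preamble` parameter are hypothetical; the chat variant's instruction text is elided in the README ("Generate …"), so the sketch leaves it to the caller rather than guessing it.

```python
# Sketch of the three prompt layouts from the README's "Example w/o inputs" row.
# Assumptions: escaped "\n" in the table means a real newline, and the column
# order is special, extra, chat. All function names here are hypothetical.
from typing import List


def format_special(description: str, outputs: List[str]) -> str:
    # Bare description, then each output wrapped in turn markers with no role name.
    turns = [f"<start_of_turn>{out}<end_of_turn>" for out in outputs]
    return "\n".join([description, *turns])


def format_extra(description: str, outputs: List[str]) -> str:
    # The description and every output get their own named turn.
    turns = [f"<start_of_turn>description\n{description}<end_of_turn>"]
    turns += [f"<start_of_turn>output\n{out}<end_of_turn>" for out in outputs]
    return "\n".join(turns)


def format_chat(preamble: str, description: str, outputs: List[str]) -> str:
    # Alternating user/model turns. The README elides the opening instruction
    # ("Generate …"), so the caller supplies it as `preamble` (hypothetical).
    turns = []
    for i, out in enumerate(outputs):
        if i == 0:
            user = f"{preamble}\nDescription: {description}\n\nGenerate."
        else:
            user = "Generate."
        turns.append(f"<start_of_turn>user\n{user}<end_of_turn>")
        turns.append(f"<start_of_turn>model\n{out}<end_of_turn>")
    return "\n".join(turns)


if __name__ == "__main__":
    outs = ["OUTPUT1", "OUTPUT2"]
    print(format_special("DESCRIPTION", outs))
    print(format_extra("DESCRIPTION", outs))
```

The special layout adds less wrapper text per output than extra or chat, which fits the README's note that it is the token-efficient option; character counts are only a rough proxy for token counts, though, since the actual cost depends on the custom tokenizer.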