N8Programs/talkie-box

N8Programs committed 19 days ago

Commit fd82ed5 · verified · 1 parent: 09effbe

Update README.md

Files changed (1)

README.md +1 -1

@@ -10,7 +10,7 @@ pipeline_tag: text-generation
 A lightly post-trained version of [talkie-lm/talkie-1930-13b-base](https://huggingface.co/talkie-lm/talkie-1930-13b-base) via some initial SFT on an elicited persona followed by KTO on Claude-preferred responses. Distinct from the official instruction tune in that it is instructed to play the character of an intelligent machine, tuned to have slightly more modern-day preferences (so it may adopt the views of a 1930s progressive), and finally differs in its chat template, which forgoes XML to instead present itself as a transcript/play.
-Recommended sampling settings are `temp=0.5, min_p=0.05, top_k=40, repetition_penalty=1.2, repetition_context_size=128`. Like the base model, it has a max context size of 2048.
+Recommended sampling settings are `temp=0.5, min_p=0.05, top_k=40, repetition_penalty=1.2, repetition_context_size=128`. Like the base model, it has a max context size of 2048. It additionally retains the (limited) few-shot learning ability of the base model, going from 7.73% on GSM8K at 1-shot to 11.30% at 2-shot and 12.36% at 4-shot.
 ## Chat template