mistral
ConicCat committed on
Commit 04de1ba · verified · 1 Parent(s): 8db8b64

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -9,7 +9,7 @@ datasets:
 
 Day 2 RP finetune of Apriel 15B, with several iterative improvements from the first version. In particular, coherence at good temperatures (~.7) should be much higher.
 
-Compared to the first version I merged 20% of the instruct checkpoint back in to mitigate forgetting and to preserve more of the base model's style.
+Compared to the first version I merged 20% of the original instruct checkpoint back in to mitigate forgetting and to preserve more of the original model's style.
 
 I also fully converted the model to use the Phi 3 format; this comes at the slight tradeoff of the `<|end|>` tag not always tokenizing exactly the same way in a few niche scenarios.
 
@@ -31,6 +31,8 @@ Similar to the Qwen 3 model line, Apriel R1P can be used with or without thinkin
 
 The chat template has been converted to a Phi 3 template as the model seemed to respond best to this format.
 
+This model does prefer having character cards placed in user messages, not the system prompt.
+
 ## Special Thanks:
 
 Undi95 for portions of their dataset and inspiration.
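
The README says 20% of the original instruct checkpoint was merged back in, but it does not say which merge method was used. As a rough illustration only, the sketch below assumes a plain linear interpolation of weights (80% finetune, 20% instruct). The function name `linear_merge`, the `instruct_ratio` parameter, and the use of plain Python lists in place of real tensors are all assumptions made for the example, not details from the commit.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def linear_merge(finetuned: Weights, instruct: Weights,
                 instruct_ratio: float = 0.2) -> Weights:
    """Blend two checkpoints: (1 - ratio) * finetuned + ratio * instruct.

    Assumes linear interpolation; the README does not name the actual method.
    """
    if not 0.0 <= instruct_ratio <= 1.0:
        raise ValueError("instruct_ratio must be between 0 and 1")
    if finetuned.keys() != instruct.keys():
        raise ValueError("checkpoints must contain the same parameter names")

    merged: Weights = {}
    for name, ft_values in finetuned.items():
        in_values = instruct[name]
        if len(ft_values) != len(in_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted average of the two parameter vectors.
        merged[name] = [
            (1.0 - instruct_ratio) * ft + instruct_ratio * ins
            for ft, ins in zip(ft_values, in_values)
        ]
    return merged


if __name__ == "__main__":
    rp_finetune = {"layer.weight": [1.0, 2.0]}
    instruct_ckpt = {"layer.weight": [0.0, 4.0]}
    print(linear_merge(rp_finetune, instruct_ckpt))
```

With real checkpoints the same arithmetic would be applied tensor by tensor, usually through a dedicated merging tool rather than hand-written loops.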
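
The added line about character cards can be illustrated with a small prompt-building sketch. It assumes the commonly documented Phi 3 layout (`<|user|>`, `<|assistant|>`, and `<|end|>` markers); the chat template shipped with the model is the authoritative version, and in practice the tokenizer's own chat template should be used instead. The helper names `build_messages` and `render_phi3` are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_messages(character_card: str, user_text: str) -> List[Message]:
    """Put the character card in the first user message, not a system prompt."""
    return [{"role": "user", "content": f"{character_card}\n\n{user_text}"}]


def render_phi3(messages: List[Message],
                add_generation_prompt: bool = True) -> str:
    """Render messages in an assumed Phi 3 style layout."""
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|assistant|>\n")
    return "".join(parts)


if __name__ == "__main__":
    card = "Name: Ava\nPersonality: dry, observant innkeeper."
    print(render_phi3(build_messages(card, "I walk into the inn.")))
```

No system turn is emitted at all here, which matches the stated preference for keeping the card in the user message.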