Pinkstack committed · Commit 212aed2 · verified · 1 Parent(s): 382cfc7

Update README.md

Files changed (1): README.md (+14, -4)
README.md CHANGED
@@ -24,17 +24,27 @@ Advanced, high-quality and lite reasoning for a tiny size that you can run local
 ![superthoughtslight.png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/2LuPB_ZPCGni3-PyCkL0-.png)
 We've continuously pre-trained SmolLM2-1.7B-Instruct on advanced reasoning patterns to create this model.
 
+# Which quant is right for you?
+- ***Q4_K_M:*** Can be used on most devices; output quality is acceptable, but reasoning quality is low.
+- ***Q6_K:*** The middle ground: better quality than Q4_K_M, but reasoning is still more limited than Q8_0.
+- ***Q8_0:*** **RECOMMENDED** Yields very high-quality results, with good reasoning and good answers at a fast speed; on a Snapdragon 8 Gen 2 with 16 GB of RAM it averages 13 tokens per minute. See the examples below.
+- ***F16:*** Maximum-quality GGUF quant; not needed for most tasks, as results are very similar to Q8_0.
+
+# Evaluation (soon)
+
 # Examples:
 All responses below were generated with no system prompt, a 400-token maximum, and a temperature of 0.7 (not recommended; 0.3 - 0.5 is better):
 Generated inside the Android application PocketPal via the Q8 GGUF, using the model's prompt format.
-
+1)
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/wh33o-vjxIePfPqoN3q1z.png)
-
+2)
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/7JeF3YNNhrlY2tED4rpFJ.png)
-
+3)
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/Y8optw73kTgqMnZKj3wKj.png)
-
+4)
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/6lywy3IYEIgzPnUIJ5RvF.png)
+5)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/0K2rR9osmT20JrDvZuptV.png)
 
 # Uploaded model