Advanced, high-quality and lite reasoning at a tiny size that you can run locally.

We've continuously pre-trained SmolLM2-1.7B-Instruct on advanced reasoning patterns to create this model.
# Which quant is right for you?

- ***Q4_k_m:*** This quant *can* run on most devices; overall quality is acceptable, but reasoning quality is low.
- ***Q6_k:*** This quant sits in the middle: better quality than Q4_k_m, but reasoning is still more limited than Q8_0.
- ***Q8_0:*** **RECOMMENDED** This quant yields very high-quality results: good reasoning and good answers at a fast speed. On a Snapdragon 8 Gen 2 with 16 GB of RAM it runs at 13 tokens per minute on average; see the examples below.
- ***F16:*** Maximum-quality GGUF quant; not needed for most tasks, as results are very similar to Q8_0.
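To make the size trade-off above concrete, here is a rough back-of-the-envelope estimate of weights-only file size per quant for a 1.7B-parameter model. The bits-per-weight figures are approximate llama.cpp values (an assumption, not official numbers), and real GGUF files also carry metadata and a few higher-precision tensors, so actual sizes will differ slightly:

```python
# Approximate weights-only GGUF size per quant for a 1.7B-parameter model.
# Bits-per-weight values are rough llama.cpp figures (an assumption);
# real files also include metadata and some non-quantized tensors.
PARAMS = 1.7e9
BITS_PER_WEIGHT = {"Q4_k_m": 4.8, "Q6_k": 6.6, "Q8_0": 8.5, "F16": 16.0}

def approx_size_gb(quant: str) -> float:
    """Weights-only size in gigabytes: params * bits-per-weight / 8 bits-per-byte."""
    return PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9

for name in BITS_PER_WEIGHT:
    print(f"{name}: ~{approx_size_gb(name):.1f} GB")
```

Under these assumptions Q8_0 lands under 2 GB, which is why it still fits comfortably on a 16 GB phone.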

# Evaluation (soon)

# Examples:
All responses below were generated with no system prompt, 400 maximum tokens, and a temperature of 0.7 (not recommended; 0.3 - 0.5 is better).
Generated inside the Android application PocketPal, via the GGUF Q8_0 quant, using the model's prompt format.
1)

2)

3)

4)

5)

# Uploaded model