Advanced, high-quality and lite reasoning at a tiny size that you can run locally.

We've continuously pre-trained SmolLM2-1.7B-Instruct on advanced reasoning patterns to create this model.
# Which quant is right for you?

- ***Q4_k_m:*** This quant *can* run on most devices; overall quality is acceptable, but reasoning quality is low.
- ***Q6_k:*** This quant sits in the middle: better quality than Q4_k_m, but reasoning is still more limited than Q8_0.
- ***Q8_0:*** **RECOMMENDED** This quant yields very high-quality results: good reasoning and good answers at a fast speed. On a Snapdragon 8 Gen 2 with 16 GB of RAM it runs at 13 tokens per minute on average; see the examples below.
- ***F16:*** Maximum-quality GGUF quant; not needed for most tasks, as results are very similar to Q8_0.
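To make the size trade-off above concrete, here is a rough back-of-the-envelope estimate of weights-only file size per quant for a 1.7B-parameter model. The bits-per-weight figures are approximate llama.cpp values (an assumption, not official numbers), and real GGUF files also carry metadata and a few higher-precision tensors, so actual sizes will differ slightly:

```python
# Approximate weights-only GGUF size per quant for a 1.7B-parameter model.
# Bits-per-weight values are rough llama.cpp figures (an assumption);
# real files also include metadata and some non-quantized tensors.
PARAMS = 1.7e9
BITS_PER_WEIGHT = {"Q4_k_m": 4.8, "Q6_k": 6.6, "Q8_0": 8.5, "F16": 16.0}

def approx_size_gb(quant: str) -> float:
    """Weights-only size in gigabytes: params * bits-per-weight / 8 bits-per-byte."""
    return PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9

for name in BITS_PER_WEIGHT:
    print(f"{name}: ~{approx_size_gb(name):.1f} GB")
```

Under these assumptions Q8_0 lands under 2 GB, which is why it still fits comfortably on a 16 GB phone.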

# Evaluation (soon)

# Examples:
All responses below were generated with no system prompt, 400 maximum tokens, and a temperature of 0.7 (not recommended; 0.3 - 0.5 is better).
Generated inside the Android application PocketPal, via the GGUF Q8_0 quant, using the model's prompt format.
1)

2)

3)

4)

5)

# Uploaded model