danielcherubini committed on
Commit 3516ea8 · 1 Parent(s): bf6941f

Add sampling settings, KV cache benchmarks, and temp warning

Files changed (1): README.md (+26 −0)

README.md CHANGED
@@ -47,6 +47,32 @@ All major weight matrices are adapted:
 
 
 Final training loss: ~0.94 (average: 1.268), decreasing steadily over training.
 
+ ## Recommended Sampling Settings
+
+ These settings were validated through testing with [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp) and [Kronk](https://github.com/danielcherubini/kronk) on an RTX 3080 10GB.
+
+ | Profile | temperature | top_k | top_p | min_p | presence_penalty |
+ |---------|-------------|-------|-------|-------|------------------|
+ | **Coding** | 0.6 | 20 | 0.95 | 0.0 | 0.0 |
+ | **Chat** | 1.0 | 20 | 0.95 | 0.0 | 1.5 |
+
+ > [!WARNING]
+ > **Do not use temperature below 0.5** — low temperatures (e.g., 0.3) cause deterministic looping in multi-turn agentic use, where the model repeats the same tool call indefinitely.
+
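+ As a worked example, the Coding profile maps onto llama.cpp-style sampler flags roughly as follows (a sketch only; the model path is a placeholder, and exact flag spellings can vary between llama.cpp builds):
+
+ ```bash
+ # Coding profile: temperature 0.6, top_k 20, top_p 0.95, min_p 0.0
+ llama-server -m ./model.gguf \
+   --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0.0
+ # Chat profile instead uses --temp 1.0 and adds --presence-penalty 1.5
+ ```
+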
+ ### KV Cache Quantization
+
+ For VRAM-constrained GPUs, keep the K cache at f16 and quantize the V cache to q4_0:
+
+ | Context Length | KV Cache (K/V) | VRAM (Q4_K_M) | Generation Speed |
+ |----------------|----------------|---------------|------------------|
+ | 102,400 | f16/q4_0 | ~8.5 GB | ~111 tok/s |
+ | 131,072 | f16/q4_0 | ~9.1 GB | ~110 tok/s |
+
+ ```bash
+ # llama.cpp / ik_llama.cpp flags: f16 keys, q4_0 values
+ -ctk f16 -ctv q4_0
+ ```
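+
+ Combined with the 131,072-token row above, a full launch might look like this (a sketch; the model filename is a placeholder, and note that on mainline llama.cpp a quantized V cache typically also requires flash attention to be enabled):
+
+ ```bash
+ # 131,072-token context with f16 keys and q4_0 values
+ llama-server -m ./model-Q4_K_M.gguf \
+   -c 131072 -ctk f16 -ctv q4_0 -fa
+ ```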
+
 ## Usage
 
 ### With PEFT