Note that inf/NaN values are present in the **original model** during inference as well — both the quantized and original models produce NaN perplexity. This appears to be caused by numerically unstable expert weights that produce overflow during the forward pass, not by the quantizer itself. The same layer (`blk.61.ffn_down_exps`) [has been identified](https://www.reddit.com/r/LocalLLaMA/comments/1slk4di/) as causing NaN perplexity across GGUF quantizations by multiple providers.
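To confirm this kind of instability yourself, a minimal sketch of a non-finite-value check over a dequantized weight tensor (assuming you have it loaded as a NumPy array; the layer name and sample values here are illustrative):

```python
import numpy as np

def check_finite(name, tensor):
    """Report how many entries of a weight tensor are inf or NaN."""
    bad = tensor.size - np.count_nonzero(np.isfinite(tensor))
    if bad:
        print(f"{name}: {bad} non-finite value(s)")
    return bad == 0

# Synthetic example with an overflow artifact in one weight:
w = np.array([0.5, -1.2, np.inf, 3.0], dtype=np.float32)
check_finite("blk.61.ffn_down_exps", w)  # flags one non-finite value
```

Running this over each tensor in the unquantized checkpoint is one way to verify that the bad values originate in the model weights rather than in the quantizer.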
</details>
### Setup tool calling for TabbyAPI
Add `tool_format: minimax_m2` to your `config.yml` or per-model `tabby_config.yml`. Also enable `reasoning: true` to properly separate thinking blocks from output:
```yaml
model:
  tool_format: minimax_m2
  reasoning: true
```