Update README
README.md
CHANGED
@@ -118,7 +118,7 @@ modelscope download --model cialtion/SimpleTool \
 | RT-Qwen2.5-14B-AWQ | 14B | ~130ms | [🤗](https://huggingface.co/Cialtion/SimpleTool/tree/main/RT-Qwen2.5-14B-AWQ) | [Link](https://www.modelscope.cn/models/cialtion/SimpleTool/tree/master/RT-Qwen2.5-14B-AWQ) |
 | RT-Qwen3-30B-A3B-AWQ | 30B(A3B) | ~ | [🤗](https://huggingface.co/Cialtion/SimpleTool/tree/main/RT-Qwen3-30B_awq_w4a16) | [Link](https://www.modelscope.cn/models/cialtion/SimpleTool/tree/master/RT-Qwen3-30B_awq_w4a16) |
 
-> Latency measured on RTX 4090 with vLLM prefix caching. v2 models use an improved
+> Latency measured on RTX 4090 with vLLM prefix caching. v2 models use an improved, clearer prompt format; v1 models use the earlier multi-head instruction header. FP16 models are also available for download on Hugging Face and ModelScope.
 
 </details>
 