lovedheart committed
Commit 96586a7 · verified · 1 Parent(s): 650df97

Upload folder using huggingface_hub

Files changed (2):
  1. README.md +1 -31
  2. config.json +1 -1
README.md CHANGED
@@ -5,37 +5,7 @@ language:
 - zh
 base_model:
 - Qwen/Qwen3.5-2B
- - agentscope-ai/CoPaw-Flash-2B
 ---
-
- # FP8 Quantized Version
- Quick launch via vLLM:
- ```bash
- LLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=1 \
- vllm serve --model ~/CoPaw-Flash-2B-FP8/ \
- --host 0.0.0.0 \
- --port 8070 \
- --tensor-parallel-size 1 \
- --max-model-len 262144 \
- --gpu-memory-utilization 0.92 \
- --trust-remote-code \
- --tokenizer-mode auto \
- --served-model-name CoPaw-Flash-2B \
- --max-num-batched-tokens 4096 \
- --max-num-seqs 1 \
- --enable-auto-tool-choice \
- --tool-call-parser qwen3_coder \
- --kv-cache-dtype fp8_e4m3 \
- --reasoning-parser qwen3 \
- --enable-prefix-caching \
- --enable-chunked-prefill
- ```
-
- ## Expected speed (5060Ti)
-
- ![NVIDIA_RTX_5060_TI,vllm,CoPaw-Flash-2B-FP8,20260404_215431](https://cdn-uploads.huggingface.co/production/uploads/68121d80da035a609e569a81/5hG8WD7NHM1f6RxWATZ3V.png)
-
-
 # CoPaw-Flash-2B
 **CoPaw-Flash** is a lightweight model deeply optimized for the CoPaw autonomous agent scenario. Since its training phase, the model has been specifically refined for CoPaw tasks, delivering enhanced agentic performance in tool invocation, command execution, memory management, and multi-step planning.

@@ -160,4 +130,4 @@ CoPaw-Flash is developed by the AgentScope Team. If you would like to leave us a

 | [Discord](https://discord.gg/eYMpfnkG8h) | [X (Twitter)](https://x.com/agentscope_ai) | [DingTalk](https://qr.dingtalk.com/action/joingroup?code=v1,k1,OmDlBXpjW+I2vWjKDsjvI9dhcXjGZi3bQiojOq3dlDw=&_dt_no_comment=1&origin=11) |
 | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
- | [<img src="https://gw.alicdn.com/imgextra/i1/O1CN01hhD1mu1Dd3BWVUvxN_!!6000000000238-2-tps-400-400.png" width="80" height="80" alt="Discord">](https://discord.gg/eYMpfnkG8h) | [<img src="https://img.alicdn.com/imgextra/i4/O1CN01c0GOsa1UTkoxAGVvZ_!!6000000002519-2-tps-225-225.png" width="80" height="80" alt="X">](https://x.com/agentscope_ai) | [<img src="https://img.alicdn.com/imgextra/i2/O1CN01vCWI8a1skHtLGXEMQ_!!6000000005804-2-tps-458-460.png" width="80" height="80" alt="DingTalk">](https://qr.dingtalk.com/action/joingroup?code=v1,k1,OmDlBXpjW+I2vWjKDsjvI9dhcXjGZi3bQiojOq3dlDw=&_dt_no_comment=1&origin=11) |
+ | [<img src="https://gw.alicdn.com/imgextra/i1/O1CN01hhD1mu1Dd3BWVUvxN_!!6000000000238-2-tps-400-400.png" width="80" height="80" alt="Discord">](https://discord.gg/eYMpfnkG8h) | [<img src="https://img.alicdn.com/imgextra/i4/O1CN01c0GOsa1UTkoxAGVvZ_!!6000000002519-2-tps-225-225.png" width="80" height="80" alt="X">](https://x.com/agentscope_ai) | [<img src="https://img.alicdn.com/imgextra/i2/O1CN01vCWI8a1skHtLGXEMQ_!!6000000005804-2-tps-458-460.png" width="80" height="80" alt="DingTalk">](https://qr.dingtalk.com/action/joingroup?code=v1,k1,OmDlBXpjW+I2vWjKDsjvI9dhcXjGZi3bQiojOq3dlDw=&_dt_no_comment=1&origin=11) |
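The `vllm serve` command removed above exposes an OpenAI-compatible HTTP API. As a rough sketch of how a client would talk to that server (assuming it is running locally on the command's port 8070 with served model name `CoPaw-Flash-2B`; the helper function below is illustrative, not part of this repo), using only the Python standard library:

```python
# Minimal client sketch for a vLLM OpenAI-compatible server, matching the
# host/port/served-model-name from the launch command above. The
# /v1/chat/completions path is vLLM's OpenAI-compatible chat route.
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://localhost:8070", "CoPaw-Flash-2B",
                         "Summarize what an autonomous agent does.")
# To actually send (requires the server to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Note that `--max-num-seqs 1` in the launch command limits the server to one concurrent sequence, so parallel client requests would be queued rather than batched.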
config.json CHANGED
@@ -13,7 +13,7 @@
 "attention_dropout": 0.0,
 "attn_output_gate": true,
 "bos_token_id": null,
- "dtype": "float32",
+ "dtype": "bfloat16",
 "eos_token_id": 248044,
 "full_attention_interval": 4,
 "head_dim": 256,