amd/DeepSeek-R1-MXFP4-ASQ


linzhao-amd committed on Oct 21, 2025

Commit d809e62 (verified) · 1 parent: 23d2de6

Update README.md

Files changed (1)

README.md +19 -1


@@ -102,10 +102,19 @@ The model was evaluated on reasoning tasks including AIME24, MMLU_COT, and GSM8K
 ### Reproduction
-The results of AIME24 and MMLU_COT were obtained using [SGLang](https://docs.sglang.ai/) via [forked lm-evaluation-harness](https://github.com/BowenBao/lm-evaluation-harness/tree/cot)
 ### AIME24
 ```
 lm_eval --model local-completions \
     --model_args model=amd/DeepSeek-R1-MXFP4-ASQ,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
     --tasks aime24 \
@@ -118,6 +127,15 @@ lm_eval --model local-completions \
 ### MMLU_COT
 ```
 lm_eval --model local-completions \
     --model_args model=amd/DeepSeek-R1-MXFP4-ASQ,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
     --tasks mmlu_cot \

 ### Reproduction
+The results of AIME24 and MMLU_COT were obtained using [SGLang](https://docs.sglang.ai/) via forked [lm-evaluation-harness](https://github.com/BowenBao/lm-evaluation-harness/tree/cot)
 ### AIME24
 ```
+# Launching server
+python3 -m sglang.launch_server \
+    --model /data/DeepSeek-R1-WMXFP4-AMXFP4-Scale-UINT8-Attn-MoE-Quant/ \
+    --tp 8  \
+    --trust-remote-code  \
+    --n-share-experts-fusion 8 \
+    --disable-radix-cache
+# Evaluating
 lm_eval --model local-completions \
     --model_args model=amd/DeepSeek-R1-MXFP4-ASQ,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
     --tasks aime24 \
 ### MMLU_COT
 ```
+# Launching server
+python3 -m sglang.launch_server \
+    --model amd/DeepSeek-R1-MXFP4-ASQ \
+    --tp 8 \
+    --trust-remote-code \
+    --chunked-prefill-size 32768 \
+    --mem-fraction-static 0.83
+# Evaluating
 lm_eval --model local-completions \
     --model_args model=amd/DeepSeek-R1-MXFP4-ASQ,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
     --tasks mmlu_cot \