Is the checkpoint correct?

#2
by LoveOreo - opened

Hello great contributors! Thanks for your model!

Recently I tried to deploy the K2.5 Eagle3 draft model on vLLM, but the accept rate is lower than your report, and even lower than K2 Eagle3 in my experiments. I saw similar results on both vLLM and the SGLang PR, so it seems other people are hitting the same problem.

Are you sure the checkpoints are correct? Or could you share the steps to reproduce your results on either vLLM or SGLang?

Great thanks!

Kimi-K25 SGLang Benchmarking and Deployment

This document outlines the setup and benchmarking details for deploying the Kimi-K25 model with sglang on H200 GPUs.

Overview

Our experiments and benchmarks for the Kimi-K25 model were conducted exclusively on H200 GPUs using sglang version 0.5.9. We did not perform any tests or comparisons with vLLM.

Deployment

To launch the sglang service for Kimi-K25 for experimentation and benchmarking, use the following command:

Launch Service Command:

python3 -m sglang.launch_server \
    --model-path /models/Kimi-K25 \
    --host 0.0.0.0 --port 30012 \
    --trust-remote-code \
    --mem-fraction-static 0.9 \
    --tp-size 8 \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path AQ-MedAI/Kimi-K25-eagle3 \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4
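As a sanity check on the flags above: with `--speculative-eagle-topk 1` the draft is a single chain, so each of the 3 draft steps proposes one token and target verification appends one bonus token, which is where the value 4 for `--speculative-num-draft-tokens` comes from. A minimal sketch of that relationship (the helper name is ours, not an SGLang API; tree drafting with topk > 1 sizes differently):

```python
def expected_draft_tokens(num_steps: int, topk: int) -> int:
    """Chain drafting (topk == 1): one drafted token per step,
    plus the bonus token appended during target verification."""
    assert topk == 1, "tree drafting (topk > 1) is sized differently"
    return num_steps * topk + 1

# Matches the launch flags above: 3 steps, top-1 -> 4 draft tokens.
print(expected_draft_tokens(3, 1))
```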

Benchmarking
For pressure testing and benchmarking the deployed sglang service, please refer to the SpecForge benchmarking suite:

Benchmarking Script Repository:
https://github.com/sgl-project/SpecForge/tree/main/benchmarks/benchmarker

Thanks for the prompt reply! Just to make sure: for GSM8K, the reported AVL is 2.746 with --speculative-num-steps 3. Each forward pass drafts 3 tokens, so on average 2.746 - 1 = 1.746 of them are accepted (since SGLang's accept length includes the bonus token)?
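Spelling out the arithmetic in that question (values are from this thread; defining the per-token accept rate as accepted drafted tokens divided by drafted tokens is our assumption, not something stated by the authors):

```python
avl = 2.746      # reported average accept length on GSM8K (includes the bonus token)
num_steps = 3    # --speculative-num-steps: tokens drafted per forward pass
bonus = 1        # target verification always contributes one token

accepted_drafted = avl - bonus               # drafted tokens accepted on average
accept_rate = accepted_drafted / num_steps   # per-drafted-token accept rate
print(f"{accepted_drafted:.3f} accepted of {num_steps} drafted "
      f"-> accept rate {accept_rate:.1%}")
```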

And I can reproduce your results, thanks!

Btw, the AVL for GSM8K with the K2 Eagle3 draft model is 3.165. What was the --speculative-num-steps setting for that? Presumably 3?

If I've got all that right, the accept rate for K2.5 is lower than K2's?
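Under the same assumptions (both runs use --speculative-num-steps 3, and AVL counts the bonus token), the comparison works out as follows. The per-token accept-rate definition is ours; the AVL values are the ones quoted in this thread:

```python
def accept_rate(avl: float, num_steps: int, bonus: int = 1) -> float:
    """Fraction of drafted tokens accepted, assuming AVL includes the bonus token."""
    return (avl - bonus) / num_steps

k2 = accept_rate(3.165, 3)    # K2 Eagle3 on GSM8K
k25 = accept_rate(2.746, 3)   # K2.5 Eagle3 on GSM8K
print(f"K2: {k2:.1%}  K2.5: {k25:.1%}")
```

By this measure K2.5's per-token accept rate on GSM8K is indeed lower than K2's.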

Yes. The K25-eagle3 version has been further trained on both Chinese and English datasets, while K2-eagle3 was trained on English-only data. Therefore, K25 may handle longer Chinese inputs better. K25-eagle3 will continue to be optimized and updated over time.

Thanks again, looking forward to it!

LoveOreo changed discussion status to closed
