Is the checkpoint correct?

#2
by LoveOreo - opened

Hello great contributors! Thanks for your model!

Recently I tried to deploy the K2.5 Eagle3 draft model on vLLM, but the accept rate is lower than your report, and even lower than K2 Eagle3 in my experiments. I saw similar results on both vLLM and the SGLang PR, so it seems other people are hitting the same problem.

Are you sure the checkpoints are correct? Or could you share the steps to reproduce your results on either vLLM or SGLang?

Great thanks!

Kimi-K25 SGLang Benchmarking and Deployment

This document outlines the setup and benchmarking details for deploying the Kimi-K25 model with sglang on H200 GPUs.

Overview

Our experiments and benchmarks for the Kimi-K25 model were conducted exclusively on H200 GPUs using sglang version 0.5.9. We did not perform any tests or comparisons with vLLM.

Deployment

To launch the sglang service for Kimi-K25 for experimentation and benchmarking, use the following command:

Launch Service Command:

python3 -m sglang.launch_server \
    --model-path /models/Kimi-K25 \
    --host 0.0.0.0 --port 30012 \
    --trust-remote-code \
    --mem-fraction-static 0.9 \
    --tp-size 8 \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path AQ-MedAI/Kimi-K25-eagle3 \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4
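As a sanity check on the flags above: with `--speculative-eagle-topk 1` the draft is a single chain, so each of the 3 draft steps proposes one token and target verification appends one bonus token, which is where the value 4 for `--speculative-num-draft-tokens` comes from. A minimal sketch of that relationship (the helper name is ours, not an SGLang API; tree drafting with topk > 1 sizes differently):

```python
def expected_draft_tokens(num_steps: int, topk: int) -> int:
    """Chain drafting (topk == 1): one drafted token per step,
    plus the bonus token appended during target verification."""
    assert topk == 1, "tree drafting (topk > 1) is sized differently"
    return num_steps * topk + 1

# Matches the launch flags above: 3 steps, top-1 -> 4 draft tokens.
print(expected_draft_tokens(3, 1))
```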

Benchmarking
For pressure testing and benchmarking the deployed sglang service, please refer to the SpecForge benchmarking suite:

Benchmarking Script Repository:
https://github.com/sgl-project/SpecForge/tree/main/benchmarks/benchmarker

Thanks for the prompt reply! Just to make sure: for GSM8K, the reported AVL is 2.746 with --speculative-num-steps 3. Each forward pass drafts 3 tokens, so on average 2.746 - 1 = 1.746 of them are accepted (since SGLang's accept length includes the bonus token)?
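Spelling out the arithmetic in that question (values are from this thread; defining the per-token accept rate as accepted drafted tokens divided by drafted tokens is our assumption, not something stated by the authors):

```python
avl = 2.746      # reported average accept length on GSM8K (includes the bonus token)
num_steps = 3    # --speculative-num-steps: tokens drafted per forward pass
bonus = 1        # target verification always contributes one token

accepted_drafted = avl - bonus               # drafted tokens accepted on average
accept_rate = accepted_drafted / num_steps   # per-drafted-token accept rate
print(f"{accepted_drafted:.3f} accepted of {num_steps} drafted "
      f"-> accept rate {accept_rate:.1%}")
```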

And I can reproduce your results, thanks!

Btw, the AVL for GSM8K with the K2 Eagle3 draft model is 3.165. What was the --speculative-num-steps setting for that? Presumably 3?

If I've got all that right, the accept rate for K2.5 is lower than K2's?
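Under the same assumptions (both runs use --speculative-num-steps 3, and AVL counts the bonus token), the comparison works out as follows. The per-token accept-rate definition is ours; the AVL values are the ones quoted in this thread:

```python
def accept_rate(avl: float, num_steps: int, bonus: int = 1) -> float:
    """Fraction of drafted tokens accepted, assuming AVL includes the bonus token."""
    return (avl - bonus) / num_steps

k2 = accept_rate(3.165, 3)    # K2 Eagle3 on GSM8K
k25 = accept_rate(2.746, 3)   # K2.5 Eagle3 on GSM8K
print(f"K2: {k2:.1%}  K2.5: {k25:.1%}")
```

By this measure K2.5's per-token accept rate on GSM8K is indeed lower than K2's.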

Yes. The K25-eagle3 version has been further trained on both Chinese and English datasets, while K2-eagle3 was trained on English-only data. Therefore, K25 may handle longer Chinese inputs better. K25-eagle3 will continue to be optimized and updated over time.

Thanks again, looking forward to it!

LoveOreo changed discussion status to closed
