# Keye-VL-671B-A37B

Meet Keye-VL-671B-A37B, the most powerful multimodal language model in the Keye series to date.

As one of the largest and most capable multimodal LLMs (MLLMs) available today, Keye-VL-671B-A37B achieves top-tier, and in some cases leading, performance in text understanding and generation, complex visual perception and reasoning, comprehensive video understanding, and Olympiad-level mathematical reasoning.

#### Key Enhancements:

##### Pre-Training
* **Efficient Perception Building with Limited Compute**: We employ the vision encoder from Keye-VL-1.5 together with rigorously processed, high-quality data to build the model's core perceptual capabilities cost-effectively, ensuring strong visual understanding without excessive computational overhead.

* **Multi-Modal Data Curation**: We implement an automated data pipeline that performs strict filtering, re-sampling, and large-scale synthesis of structured VQA data, including OCR, chart, and table data. This end-to-end process significantly improves the model's perception quality and generalization.

* **Reasoning Sustainment via Synthetic CoT Data**: During the continual pre-training phase, we incorporate a diverse set of synthetically generated chain-of-thought (CoT) data, so the model retains its complex reasoning skills while its perception is being trained.

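The filtering and re-sampling stages of the data-curation pipeline are only described at a high level above. As a rough illustration, the sketch below drops incomplete or duplicated VQA samples and then re-samples task types toward a target mix. The record fields (`image_id`, `task`, `question`, `answer`), the helper names, and the mixing weights are all hypothetical; this is not the Keye pipeline.

```python
import random
from collections import defaultdict


def filter_samples(samples):
    """Hypothetical strict filter: keep samples with a non-empty question and answer, drop exact duplicates."""
    seen = set()
    kept = []
    for s in samples:
        q, a = s["question"].strip(), s["answer"].strip()
        if not q or not a:
            continue  # discard incomplete QA pairs
        key = (s["image_id"], q)
        if key in seen:
            continue  # discard repeated questions about the same image
        seen.add(key)
        kept.append(s)
    return kept


def resample(samples, target_mix, total, seed=0):
    """Draw about `total` samples so each task type roughly matches its weight in `target_mix`."""
    rng = random.Random(seed)
    by_task = defaultdict(list)
    for s in samples:
        by_task[s["task"]].append(s)
    out = []
    for task, weight in target_mix.items():
        pool = by_task.get(task, [])
        if not pool:
            continue  # no data for this task type; nothing to sample
        # Sampling with replacement lets small pools be up-sampled.
        out.extend(rng.choices(pool, k=round(weight * total)))
    rng.shuffle(out)
    return out
```

Sampling with replacement lets under-represented task types be up-sampled; a production pipeline would likely add quality scoring and near-duplicate detection on top of exact matching.
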
##### Post-Training

* **Scaling Law of Reasoning Data for SFT**: We experimentally validate that mixed SFT data (50B of Instruct and Long-CoT data) improves model performance and training stability compared with training on Instruct data alone (30B).
* **CoT Quality & Style Refinement**: We develop a data filtering process that removes redundant reflective chains, improving the model's reasoning and perception capabilities; our in-house process outperforms GPT-4o on this filtering task.
* **High-Precision RL Verifier**: We train a dedicated verifier (Keye Verifier) to check the model's reasoning consistency and answer correctness. It achieves significantly higher accuracy than other reward models and general-purpose LLMs, which increases the performance gains from RL.

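How redundant reflective chains are detected is not described here. A minimal sketch, assuming a simple keyword heuristic: count reflection markers in each CoT trace and drop traces that exceed a threshold. The marker list, threshold, and function names are hypothetical; the actual in-house filter may well be model-based.

```python
import re

# Hypothetical reflection markers; a real filter would likely be far richer.
REFLECTION_PATTERN = re.compile(
    r"\b(wait|let me double-check|let me verify|let me re-check)\b", re.IGNORECASE
)


def reflection_count(cot: str) -> int:
    """Count reflection markers in a chain-of-thought trace."""
    return len(REFLECTION_PATTERN.findall(cot))


def filter_cot(samples, max_reflections: int = 3):
    """Keep only samples whose CoT has at most `max_reflections` reflection markers."""
    return [s for s in samples if reflection_count(s["cot"]) <= max_reflections]
```

A keyword count is cheap but blunt: it cannot tell a useful self-correction from a redundant one, which is why a learned judge is the more likely choice in practice.
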
## Model Performance

![Performance Comparison](figures/radar.png)

| Category | Benchmark | Seed1.5-VL (thinking) | dots.vlm1 | Qwen3-VL-235B-A22B (thinking) | Keye-VL-1.5-671B-A37B |
| --- | --- | :---: | :---: | :---: | :---: |
| STEM/Reasoning | MMMU_VAL | 77.9 | 80.11 | 80.6 | **83.78** |
| | MMMU_Pro | 67.6 | 70.11 | 69.3 | **72.49** |
| | MathVision | 68.7 | 69.64 | **74.6** | 69.11 |
| | MathVista | 85.6 | 85.0 | 85.8 | **86.2** |
| | OlympiadBench | 65.0 | - | - | **74.92** |
| | VisuLogic | 35.0 | 32.2 | 34.4 | **35.4** |
| General VQA | RealWorldQA | 78.4 | 79.08 | 81.3 | **86.54** |
| | MMStar | 77.8 | 76.67 | 78.7 | **86.67** |
| | MMBench-en | 89.9 | 89.32 | 90.6 | **95.74** |
| | MMBench-cn | 89.1 | 88.24 | - | **94.27** |
| | MMVP | 69.3 | 72.0 | - | **88.0** |
| | V\* | 89.0 | - | - | **90.05** |
| | HallusionBench | 60.3 | 64.83 | 66.7 | **72.3** |
| Video | VideoMME | 77.9 | - | **79.0** | **79.0** |
| | LongVideoBench | 74.0 | - | 65.2 (fp8) | **79.0** |
| | MMVU | 70.1 | - | 78.4 (fp8) | **86.6** |
| | TempCompass | **83.7** | - | 81.03 (fp8) | 77.75 |
| Text Recog./Doc/Chart | TextVQA | **81.8** | - | - | 76.21 |
| | DocVQA_VAL | **96.9** | 96.52 | 96.5 | 95.39 |
| | ChartQA_TEST | **89.1** | 87.68 | - | 86.68 |
| | InfoVQA | **91.2** | - | 89.5 | 86.93 |
| | CharXiv (RQ) | 60.2 | 64.4 | 66.1 | **79.4** |
| | CharXiv (DQ) | 92.6 | 92.1 | - | **94.5** |
| | AI2D_TEST | 87.3 | 88.37 | 89.2 | **91.19** |
| Pure Text | AIME2025 | - | 85.83 | **89.7** | 83.3 |
| | GPQA | - | **72.78** | - | 71.21 |

## Quickstart

### Environment Setup

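```shell
docker run -it --gpus all lmsysorg/sglang:v0.5.2
# For the two-node deployment below, the containers must be able to reach each other;
# host networking (e.g. adding --network host to docker run) is a common way to ensure this.
# Inside the container, run the following on every node to install the custom SGLang branch:
git clone -b keye-dpsk-infer-fp8-release https://github.com/Kwai-Keye/sglang.git sglang
pip install -e "sglang/python[all]"
```
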
### Two-Node Deployment (8× H800 per Node)

#### Prerequisites:

- Model: `Kwai-Keye/Keye-VL-671B-A37B`
- Node 1 IP: 192.168.1.100 (MASTER_NODE_IP)
- Node 2 IP: 192.168.1.101 (WORKER_NODE_IP)

#### Node 1 (Master - rank 0):

```shell
MODEL_PATH=/path/to/Keye-VL-671B-A37B
DIST_INIT_ADDR="MASTER_NODE_IP:29500"  # e.g. 192.168.1.100:29500
PORT=30000                             # listening port on each node
python3 -m sglang.launch_server \
  --model-path "$MODEL_PATH" \
  --host 0.0.0.0 \
  --port "$PORT" \
  --tp-size 16 \
  --nnodes 2 \
  --node-rank 0 \
  --dist-init-addr "$DIST_INIT_ADDR" \
  --trust-remote-code \
  --mm-attention-backend fa3 \
  --attention-backend fa3 \
  --disable-radix-cache \
  --mem-fraction-static 0.8 \
  --cuda-graph-max-bs 64 \
  --model-loader-extra-config '{"enable_multithread_load": true, "num_threads": 32}'
```

#### Node 2 (Worker - rank 1):

```shell
MODEL_PATH=/path/to/Keye-VL-671B-A37B
DIST_INIT_ADDR="MASTER_NODE_IP:29500"  # e.g. 192.168.1.100:29500
PORT=30000                             # listening port on each node
python3 -m sglang.launch_server \
  --model-path "$MODEL_PATH" \
  --host 0.0.0.0 \
  --port "$PORT" \
  --tp-size 16 \
  --nnodes 2 \
  --node-rank 1 \
  --dist-init-addr "$DIST_INIT_ADDR" \
  --trust-remote-code \
  --mm-attention-backend fa3 \
  --attention-backend fa3 \
  --disable-radix-cache \
  --mem-fraction-static 0.8 \
  --cuda-graph-max-bs 64 \
  --model-loader-extra-config '{"enable_multithread_load": true, "num_threads": 32}'
```

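The two launch commands differ only in `--node-rank`, so it can help to generate them from one place. Below is a hypothetical helper (`launch_command` is not part of SGLang) that reproduces the flags shown for the two nodes and derives `--tp-size` as nodes × GPUs per node, assuming pure tensor parallelism across all 16 GPUs as in these commands.

```python
import shlex


def launch_command(model_path, master_addr, node_rank, nnodes=2, gpus_per_node=8,
                   port=30000, dist_port=29500):
    """Build the sglang.launch_server command line for one node (hypothetical helper)."""
    if not 0 <= node_rank < nnodes:
        raise ValueError(f"node_rank must be in [0, {nnodes})")
    tp_size = nnodes * gpus_per_node  # pure tensor parallelism: 2 nodes x 8 GPUs = 16
    args = [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--host", "0.0.0.0",
        "--port", str(port),
        "--tp-size", str(tp_size),
        "--nnodes", str(nnodes),
        "--node-rank", str(node_rank),
        "--dist-init-addr", f"{master_addr}:{dist_port}",
        "--trust-remote-code",
        "--mm-attention-backend", "fa3",
        "--attention-backend", "fa3",
        "--disable-radix-cache",
        "--mem-fraction-static", "0.8",
        "--cuda-graph-max-bs", "64",
        "--model-loader-extra-config",
        '{"enable_multithread_load": true, "num_threads": 32}',
    ]
    return shlex.join(args)  # shell-quotes each argument (e.g. the JSON string)
```

For example, `print(launch_command("/path/to/Keye-VL-671B-A37B", "192.168.1.100", node_rank=1))` prints the Node 2 command.
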
For more deployment details, please refer to the [Keye-VL-671B-A37B Deployment Tutorial](https://github.com/Kwai-Keye/sglang/blob/keye-dpsk-infer-fp8-release/scripts/deploy_keye_deepseek/DEPLOY_TUTORIAL.md).

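Loading a 671B-parameter checkpoint across two nodes can take a long time, so a client may want to wait for the server before sending requests. A minimal sketch using only the Python standard library, assuming the server exposes SGLang's `GET /health` endpoint on the master node; `wait_until_ready` is a hypothetical helper, and its `opener` parameter exists only so it can be tested without a live server.

```python
import time
import urllib.request


def wait_until_ready(base_url, timeout_s=1800.0, interval_s=10.0, opener=urllib.request.urlopen):
    """Poll {base_url}/health until it answers HTTP 200; return False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(f"{base_url}/health", timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # connection refused, timeout, or HTTP error: server not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, call `wait_until_ready("http://192.168.1.100:30000")` before running the client code.
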
### Client Usage

```python
import requests

# Address of the master node (node 1), e.g. http://192.168.1.100:30000
BASE_URL = "http://MASTER_NODE_IP:30000"


def generate(messages):
    """Send a chat request to the OpenAI-compatible endpoint and return the JSON response."""
    payload = {
        "model": "",
        "messages": messages,
        "n": 1,
        "temperature": 0.0,  # together with top_k=1, this requests greedy decoding
        "max_tokens": 256,
        "top_k": 1,
        "ignore_eos": False,
        "skip_special_tokens": True,
    }
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        json=payload,  # serializes the dict and sets Content-Type: application/json
        timeout=1800,  # generous timeout for a very large model
    )
    resp.raise_for_status()
    return resp.json()


# Example: image + text
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": "https://raw.githubusercontent.com/sgl-project/sglang/main/assets/logo.png"},
            },
            {"type": "text", "text": "Describe this image in detail."},
        ],
    }
]

result = generate(messages)
print(result["choices"][0]["message"]["content"])
```

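The client example loads its image from a public URL. To send a local file instead, a common approach with OpenAI-compatible endpoints is to embed the image as a base64 `data:` URL. A minimal sketch, assuming the custom SGLang server accepts data URLs in `image_url` (worth verifying against your deployment); `image_part_from_file` is a hypothetical helper.

```python
import base64
import mimetypes
from pathlib import Path


def image_part_from_file(path):
    """Build an OpenAI-style image_url content part embedding a local image as a base64 data URL."""
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path}")
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{data}"}}
```

The returned dict can take the place of the `image_url` entry in `messages`, e.g. `"content": [image_part_from_file("example.png"), {"type": "text", "text": "Describe this image in detail."}]`, before calling `generate(messages)`.
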