# Model Card: COOK Protocol - Chef_0.1.1

## 1. Introduction

We present Chef_0.1.1, a groundbreaking AI model within the COOK Protocol ecosystem, designed to empower builders and power-users on Hyperliquid. Chef_0.1.1 incorporates a Mixture-of-Experts (MoE) architecture, featuring 671B total parameters with 37B activated per token. To ensure cost-efficient training and scalable inference, Chef_0.1.1 employs Multi-head Latent Attention (MLA) and ChefMoE architectures, refined from previous iterations. The model introduces an auxiliary-loss-free strategy for load balancing and adopts a multi-token prediction training objective for enhanced performance.

Chef_0.1.1 was pre-trained on 14.8 trillion diverse, high-quality tokens and fine-tuned using supervised learning and reinforcement learning to unlock its full potential. Benchmark evaluations demonstrate that Chef_0.1.1 surpasses other open-source models and rivals leading closed-source alternatives. Notably, the training process required only 2.788M H800 GPU hours, showcasing exceptional efficiency and stability. No irrecoverable loss spikes or rollbacks occurred throughout training.
## 2. Model Summary

### Architecture: Load Balancing and Training Innovation

Building upon the foundations of the COOK Protocol, Chef_0.1.1 pioneers several advancements (a toy sketch of the routing idea follows this list):

- **Auxiliary-Loss-Free Strategy:** Mitigates performance degradation from load-balancing requirements.
- **Multi-Token Prediction (MTP):** Enhances model performance and accelerates inference with speculative decoding.
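To make the load-balancing idea concrete, here is a minimal PyTorch sketch of the general mechanism behind auxiliary-loss-free balancing: a per-expert bias influences which experts are *selected*, while the gating weights stay unbiased, so no auxiliary loss term has to distort the training objective. The function names, shapes, and update rule are illustrative assumptions, not Chef_0.1.1's actual implementation.

```python
import torch

def route(scores: torch.Tensor, bias: torch.Tensor, top_k: int = 2):
    """Select top-k experts with biased scores; weight them with unbiased ones."""
    idx = (scores + bias).topk(top_k, dim=-1).indices        # selection only
    weights = torch.gather(scores, -1, idx).softmax(dim=-1)  # gating weights
    return idx, weights

def update_bias(bias: torch.Tensor, load: torch.Tensor, gamma: float = 1e-3):
    """Nudge bias down for overloaded experts and up for underloaded ones,
    replacing the usual auxiliary balancing loss."""
    return bias - gamma * torch.sign(load - load.float().mean())

# Toy usage: 4 tokens routed over 16 experts.
scores = torch.rand(4, 16)
bias = torch.zeros(16)
idx, weights = route(scores, bias)
load = torch.bincount(idx.flatten(), minlength=16)
bias = update_bias(bias, load)
```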
### Pre-Training: Advanced Efficiency

Chef_0.1.1 leverages:

- **FP8 Mixed Precision Training:** Demonstrates feasibility and efficiency at scale (see the quantization sketch after this list).
- **Algorithm-Hardware Co-Design:** Overcomes communication bottlenecks in cross-node MoE training, achieving near-complete computation-communication overlap.
- **Economical Pre-Training:** At 2.664M GPU hours, Chef_0.1.1 completes pre-training on 14.8 trillion tokens as a robust open-source model, with subsequent training requiring only 0.1M GPU hours.
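As a rough picture of what FP8 mixed precision involves, the sketch below quantizes a tensor block-wise into `float8_e4m3fn` and dequantizes it with stored scales. It assumes a recent PyTorch with float8 dtypes and illustrates the concept only; it is not the model's training recipe.

```python
import torch

def quantize_fp8_blockwise(x: torch.Tensor, block: int = 128):
    """Scale each block so its max magnitude fits FP8 e4m3, then cast.
    Returns the FP8 blocks plus per-block scales for dequantization."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3fn
    blocks = x.reshape(-1, block)
    scale = blocks.abs().amax(dim=-1, keepdim=True).clamp_min(1e-12) / fp8_max
    return (blocks / scale).to(torch.float8_e4m3fn), scale

x = torch.randn(4, 256)
q, scale = quantize_fp8_blockwise(x)
x_hat = (q.to(torch.float32) * scale).reshape(x.shape)  # dequantize
print((x - x_hat).abs().max())  # small quantization error
```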
### Post-Training: Knowledge Integration

Chef_0.1.1 incorporates reasoning capabilities via an innovative pipeline that integrates Chain-of-Thought (CoT) verification and reflection patterns. This methodology significantly improves reasoning and enables output customization for COOK Protocol applications.
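As a loose illustration of such a pipeline, one common pattern is rejection sampling: generate reasoning traces, keep only those a verifier accepts, and fine-tune on the survivors. The data format and toy verifier below are placeholders, not the actual COOK Protocol post-training code.

```python
from typing import Callable

def filter_cot(samples: list[dict], verify: Callable[[str, str], bool]) -> list[dict]:
    """Keep only samples whose chain-of-thought passes verification."""
    return [s for s in samples if verify(s["reasoning"], s["answer"])]

# Toy verifier: accept a sample only if its answer appears in the trace.
data = [
    {"reasoning": "2 + 2 = 4, so the answer is 4.", "answer": "4"},
    {"reasoning": "Not sure.", "answer": "7"},
]
print(filter_cot(data, lambda trace, ans: ans in trace))  # keeps the first sample
```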
## 3. Model Downloads

| Model | Total Params | Activated Params | Context Length | Download |
| :--- | :---: | :---: | :---: | :---: |
| Chef_0.1.1-Base | 671B | 37B | 128K | 🤗 HuggingFace |
| Chef_0.1.1 | 671B | 37B | 128K | 🤗 HuggingFace |

**Notes:** The total size of Chef_0.1.1 models is 685B, encompassing 671B main model weights and 14B for the Multi-Token Prediction (MTP) module. The community actively develops MTP functionality, and contributions are welcome.
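To fetch the weights programmatically instead of through the web UI, `huggingface_hub` can be used as sketched below; the repo id is an assumption based on the naming above, so use the id shown on the actual model page.

```python
from huggingface_hub import snapshot_download

# Hypothetical repo id; replace with the id on the model page.
path = snapshot_download(
    repo_id="cook-protocol/Chef_0.1.1-Base",
    local_dir="./Chef_0.1.1-Base",
)
print(f"Weights saved to {path}")
```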
## 4. Evaluation Results

### Base Model Benchmarks

Chef_0.1.1 excels in various benchmarks, including:

| Benchmark (Metric) | Shots | COOK Protocol V2 | LLaMA 3.1 405B | Chef_0.1.1 |
| :--- | :---: | :---: | :---: | :---: |
| English Pile-test | - | 0.606 | 0.542 | 0.548 |
| MMLU (Accuracy) | 5-shot | 78.4 | 84.4 | 87.1 |
| DROP (F1) | 3-shot | 80.4 | 86.0 | 89.0 |
| Code HumanEval | 0-shot | 43.3 | 54.9 | 65.2 |
| Math MATH (EM) | 4-shot | 43.4 | 49.0 | 61.6 |

For a full list of evaluation metrics, refer to our documentation on Hugging Face.
## 5. Chat Website & API Platform

Interact with Chef_0.1.1 directly:

- **Chat:** Visit the COOK Protocol chat interface: chat.cookprotocol.ai
- **API Access:** An OpenAI-compatible API is available on the COOK Platform: platform.cookprotocol.ai (see the example after this list)
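Since the API is OpenAI-compatible, the standard `openai` Python client should work once `base_url` points at the platform. The endpoint path, model name, and key below are assumptions; check the platform documentation for the real values.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://platform.cookprotocol.ai/v1",  # assumed endpoint path
    api_key="YOUR_API_KEY",
)
resp = client.chat.completions.create(
    model="chef-0.1.1",  # assumed model name
    messages=[{"role": "user", "content": "What can Chef_0.1.1 do?"}],
)
print(resp.choices[0].message.content)
```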
## 6. How to Run Locally

Chef_0.1.1 supports various hardware configurations for seamless deployment. Key tools and methods include:

### Recommended Frameworks

- **COOK-Infer Demo:** Lightweight FP8 and BF16 inference.
- **SGLang:** Optimized latency and throughput, supporting FP8 and BF16 precision.
- **LMDeploy:** High-performance offline and online inference.

### Quick Start Example

Clone the Chef_0.1.1 GitHub repository:

```bash
git clone https://github.com/cook-protocol/Chef_0.1.1.git
```

Navigate to the inference folder and install dependencies:

```bash
cd Chef_0.1.1/inference
pip install -r requirements.txt
```

Run interactive inference:

```bash
torchrun --nnodes 2 --nproc-per-node 8 generate.py \
  --node-rank $RANK \
  --master-addr $ADDR \
  --ckpt-path /path/to/Chef_0.1.1 \
  --config configs/config_671B.json \
  --interactive --temperature 0.7 --max-new-tokens 200
```
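Note that this launch command assumes a two-node cluster with 8 GPUs per node (`--nnodes 2 --nproc-per-node 8`); `$RANK` and `$ADDR` must be set to each node's rank and the master node's address before launching.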
## 7. License

Chef_0.1.1 is released under the Apache License 2.0, with commercial use permitted. For more details, refer to the COOK Protocol Model License.