cookprotocol
/

souschef

Safetensors

Model card Files Files and versions

xet

Community

cookprotocol commited on Dec 28, 2024

Commit

997d739

verified ·

1 Parent(s): 11873f4

Update README.md

Browse files

Files changed (1) hide show

README.md +175 -3

README.md CHANGED Viewed

@@ -1,3 +1,175 @@
----
-license: apache-2.0
----

+Model Card: COOK Protocol - Chef_0.1.1
+1. Introduction
+We present Chef_0.1.1, a groundbreaking AI model within the COOK Protocol ecosystem, designed to empower builders and power-users on Hyperliquid. Chef_0.1.1 incorporates a Mixture-of-Experts (MoE) architecture, featuring 671B total parameters with 37B activated per token. To ensure cost-efficient training and scalable inference, Chef_0.1.1 employs Multi-head Latent Attention (MLA) and ChefMoE architectures, refined from previous iterations. The model introduces an auxiliary-loss-free strategy for load balancing and adopts a multi-token prediction training objective for enhanced performance.
+Chef_0.1.1 was pre-trained on 14.8 trillion diverse, high-quality tokens and fine-tuned using supervised learning and reinforcement learning to unlock its full potential. Benchmark evaluations demonstrate that Chef_0.1.1 surpasses other open-source models and rivals leading closed-source alternatives. Notably, the training process required only 2.788M H800 GPU hours, showcasing exceptional efficiency and stability. No irrecoverable loss spikes or rollbacks occurred throughout training.
+2. Model Summary
+Architecture: Load Balancing and Training Innovation
+Building upon the foundations of the COOK Protocol, Chef_0.1.1 pioneers several advancements:
+Auxiliary-Loss-Free Strategy: Mitigates performance degradation from load balancing requirements.
+Multi-Token Prediction (MTP): Enhances model performance and accelerates inference with speculative decoding.
+Pre-Training: Advanced Efficiency
+Chef_0.1.1 leverages:
+FP8 Mixed Precision Training: Demonstrated feasibility and efficiency at scale.
+Algorithm-Hardware Co-Design: Overcomes communication bottlenecks in cross-node MoE training, achieving near-complete computation-communication overlap.
+Economic Pre-Training: At 2.664M GPU hours, Chef_0.1.1 completes pre-training on 14.8 trillion tokens as a robust open-source model, with subsequent training requiring only 0.1M GPU hours.
+Post-Training: Knowledge Integration
+Chef_0.1.1 incorporates reasoning capabilities via an innovative pipeline that integrates Chain-of-Thought (CoT) verification and reflection patterns. This methodology significantly improves reasoning and enables output customization for COOK Protocol applications.
+3. Model Downloads
+Model
+Total Params
+Activated Params
+Context Length
+Download
+Chef_0.1.1-Base
+671B
+37B
+128K
+🤗 HuggingFace
+Chef_0.1.1
+671B
+37B
+128K
+🤗 HuggingFace
+Notes:
+The total size of Chef_0.1.1 models is 685B, encompassing 671B main model weights and 14B for the Multi-Token Prediction (MTP) module. The community actively develops MTP functionality, and contributions are welcome.
+4. Evaluation Results
+Base Model Benchmarks
+Chef_0.1.1 excels in various benchmarks, including:
+Benchmark (Metric)
+Shots
+COOK Protocol V2
+LLaMA 3.1 405B
+Chef_0.1.1
+English Pile-test
+-
+0.606
+0.542
+0.548
+MMLU (Accuracy)
+5-shot
+78.4
+84.4
+87.1
+DROP (F1)
+3-shot
+80.4
+86.0
+89.0
+Code HumanEval
+0-shot
+43.3
+54.9
+65.2
+Math MATH (EM)
+4-shot
+43.4
+49.0
+61.6
+For a full list of evaluation metrics, refer to our documentation on Hugging Face.
+5. Chat Website & API Platform
+Interact with Chef_0.1.1 directly:
+Chat: Visit the COOK Protocol chat interface: chat.cookprotocol.ai
+API Access: OpenAI-compatible API available on the COOK Platform: platform.cookprotocol.ai
+6. How to Run Locally
+Chef_0.1.1 supports various hardware configurations for seamless deployment. Key tools and methods include:
+Recommended Frameworks
+COOK-Infer Demo: Lightweight FP8 and BF16 inference.
+SGLang: Optimized latency and throughput, supporting FP8 and BF16 precision.
+LMDeploy: High-performance offline and online inference.
+Quick Start Example
+Clone the Chef_0.1.1 GitHub repository:
+git clone https://github.com/cook-protocol/Chef_0.1.1.git
+Navigate to the inference folder and install dependencies:
+cd Chef_0.1.1/inference
+pip install -r requirements.txt
+Run interactive inference:
+torchrun --nnodes 2 --nproc-per-node 8 generate.py --node-rank $RANK --master-addr $ADDR --ckpt-path /path/to/Chef_0.1.1 --config configs/config_671B.json --interactive --temperature 0.7 --max-new-tokens 200
+7. License
+Chef_0.1.1 is released under the apache-2.0, with commercial use permitted. For more details, refer to the COOK Protocol Model License.
+---