Skywork/Skywork-MoE-Base

@@ -37,7 +37,7 @@ We introduce two innovative techniques: Gating Logit Normalization, which enhanc
 Skywork-MoE demonstrates comparable or superior performance to models with more parameters or more activated parameters, such as Grok-1, DBRX, Mistral 8*22, and Deepseek-V2.
 # News and Updates
-* 2024.6.3  We release the **Skywork-MoE-base** model.
 # Table of contents
@@ -49,22 +49,15 @@ Skywork-MoE demonstrates comparable or superior performance to models with more
 - [🤝Contact Us and Citation](#Contact-Us-and-Citation)
-# Download URL
-|         | HuggingFace Model   |  ModelScope Model   |  Wisemodel Model  |
-|:-------:|:-----------:|:-----------------------------:|:-----------------------------:|
-| **Skywork-MoE-base**      | 🤗 [Skywork-MoE-base](https://github.com/SkyworkAI/Skywork-MoE)  | 🤖[Skywork-MoE-base](https://www.modelscope.cn/models/skywork/Skywork-MoE-base) | 👾[Skywork-MoE-base](https://wisemodel.cn/models/Skywork/Skywork-MoE-base) |
-| **Skywork-MoE-Base-FP8**  | 🤗 [Skywork-MoE-Base-FP8](https://github.com/SkyworkAI/Skywork-MoE) | 🤖 | 👾 |
 # Benchmark Results
-We evaluated Skywork-MoE-base model on various popular benchmarks, including C-Eval, MMLU, CMMLU, GSM8K, MATH and HumanEval.
 <img src="misc/skywork_moe_base_evaluation.png" alt="Image" width="600" height="280">
 # Demonstration of Hugging Face Model Inference
 ## Base Model Inference
-We can perform inference for the Skywork-MoE-base (16x13B size) model using HuggingFace on 8xA100/A800 or higher GPU hardware configurations.
 ```python
@@ -100,35 +93,23 @@ comming soon...
 ## Quickstart with vLLM
-We provide a method to quickly deploy the Skywork-Moe-base model based on vllm.
-Under fp8 precision you can run Skywork-Moe-base with just only 8*4090.
 You can get the source code in [`vllm`](https://github.com/SkyworkAI/vllm)
-You can get the fp8 model in [`Skywork-MoE-Base-FP8`](https://huggingface.co/Skywork/Skywork-MoE-Base-FP8)
 ### Based on local environment
-Since pytorch only supports 4090 using fp8 precision in the nightly version, you need to install the corresponding or newer version of pytorch.
-``` shell
-# for cuda12.1
-pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121
-# for cuda12.4
-pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu124
-```
-Some other dependencies also need to be installed:
 ```shell
 pip3 install xformers vllm-flash-attn
 ```
-Then clone the [`vllm`](https://github.com/SkyworkAI/vllm) provided by skywork and change to `skywork-moe` branch:
 ``` shell
-git clone https://github.com/SkyworkAI/vllm.git -b skywork-moe
 cd vllm
 ```
@@ -138,7 +119,7 @@ Then compile and install vllm:
 MAX_JOBS=8 python3 setup.py install
 ```
-### Base on docker
 You can use the docker image provided by skywork to run vllm directly:
@@ -149,7 +130,7 @@ docker pull registry.cn-wulanchabu.aliyuncs.com/triple-mu/skywork-moe-vllm:v1
 Then start the container and set the model path and working directory.
 ```shell
-model_path="Skywork/Skywork-MoE-Base-FP8"
 workspace=${PWD}
 docker run \
@@ -162,19 +143,19 @@ docker run \
     --privileged=true \
     --ulimit stack=67108864 \
     --ipc=host \
-    -v ${model_path}:/Skywork-MoE-Base-FP8 \
     -v ${workspace}:/workspace \
     registry.cn-wulanchabu.aliyuncs.com/triple-mu/skywork-moe-vllm:v1
 ```
-Now, you can run the Skywork Moe base model for fun!
 ### Text Completion
 ``` python
 from vllm import LLM, SamplingParams
-model_path = '/path/to/skywork-moe-base'
 prompts = [
     "The president of the United States is",
     "The capital of France is",

 Skywork-MoE demonstrates comparable or superior performance to models with more parameters or more activated parameters, such as Grok-1, DBRX, Mistral 8*22, and Deepseek-V2.
 # News and Updates
+* 2024.6.3  We release the **Skywork-MoE-Base** model.
 # Table of contents
 - [🤝Contact Us and Citation](#Contact-Us-and-Citation)
 # Benchmark Results
+We evaluated the Skywork-MoE-Base model on various popular benchmarks, including C-Eval, MMLU, CMMLU, GSM8K, MATH and HumanEval.
 <img src="misc/skywork_moe_base_evaluation.png" alt="Image" width="600" height="280">
 # Demonstration of Hugging Face Model Inference
 ## Base Model Inference
+We can perform inference for the Skywork-MoE-Base (16x13B size) model using HuggingFace on 8xA100/A800 or higher GPU hardware configurations.
 ```python
 ## Quickstart with vLLM
+We provide a method to quickly deploy the Skywork-MoE-Base model based on vllm.
 You can get the source code in [`vllm`](https://github.com/SkyworkAI/vllm)
 ### Based on local environment
+Some dependencies need to be installed:
 ```shell
 pip3 install xformers vllm-flash-attn
 ```
+Then clone the [`vllm`](https://github.com/SkyworkAI/vllm) provided by skywork:
 ``` shell
+git clone https://github.com/SkyworkAI/vllm.git
 cd vllm
 ```
 MAX_JOBS=8 python3 setup.py install
 ```
+### Based on docker
 You can use the docker image provided by skywork to run vllm directly:
 Then start the container and set the model path and working directory.
 ```shell
+model_path="Skywork/Skywork-MoE-Base"
 workspace=${PWD}
 docker run \
     --privileged=true \
     --ulimit stack=67108864 \
     --ipc=host \
+    -v ${model_path}:/Skywork-MoE-Base \
     -v ${workspace}:/workspace \
     registry.cn-wulanchabu.aliyuncs.com/triple-mu/skywork-moe-vllm:v1
 ```
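The `-v` flags above bind-mount host directories into the container, and Docker only treats the source as a host directory when it is an absolute path. The value `Skywork/Skywork-MoE-Base` reads like a Hub repository ID rather than a local directory, so the following is a minimal sketch, assuming the weights have already been downloaded to a local folder, of how the mount arguments could be assembled. The helper name `docker_mount_args` is hypothetical and not part of the Skywork tooling.

```python
from pathlib import Path


def docker_mount_args(model_dir: str, workspace: str = ".") -> list[str]:
    """Build the two `-v` arguments for `docker run` from local directories.

    Assumption: `model_dir` is a local directory that already holds the
    downloaded weights, not a Hugging Face repository ID.
    """
    # Resolve to absolute paths so Docker bind-mounts the directories
    # instead of interpreting the source as a named volume.
    model = Path(model_dir).expanduser().resolve()
    work = Path(workspace).expanduser().resolve()
    if not model.is_dir():
        raise FileNotFoundError(f"model directory not found: {model}")
    # Container-side targets mirror the ones used in the command above.
    return [
        "-v", f"{model}:/Skywork-MoE-Base",
        "-v", f"{work}:/workspace",
    ]
```

Calling `docker_mount_args("./Skywork-MoE-Base")` from the directory that contains the downloaded weights would return a list that can be spliced into the `docker run` command in place of the two `-v` lines.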
+Now, you can run the Skywork-MoE-Base model for fun!
 ### Text Completion
 ``` python
 from vllm import LLM, SamplingParams
+model_path = 'Skywork/Skywork-MoE-Base'
 prompts = [
     "The president of the United States is",
     "The capital of France is",