Upload folder using huggingface_hub
- example/README.md +60 -86
- example/gpu_pretrain.csv +51 -0
- example/gpu_pretrain_loss.png +0 -0
- example/gpu_sft.csv +51 -0
- example/gpu_sft_loss.png +0 -0
- example/npu_pretrain.csv +51 -0
- example/npu_pretrain_loss.png +0 -0
- example/npu_sft.csv +51 -0
- example/npu_sft_loss.png +0 -0
- example/run.sh +5 -4
- example/run_sft.sh +7 -5
example/README.md
CHANGED
@@ -1,12 +1,26 @@
-# BitCPM4
+# BitCPM4 Training Example
 
-This project provides scripts for continue pretraining **BitCPM4-CANN-1B-unquantized**.
+This project provides scripts for continue pretraining (CPT) and supervised fine-tuning (SFT) of **BitCPM4-CANN-1B-unquantized**.
+
+## File Description
+
+CPT and SFT each have a pair of scripts (training script + launch script) and share DeepSpeed configuration files:
+
+| File | Description |
+| --- | --- |
+| `run.sh` | Launch script for CPT with hyperparameter configuration |
+| `run_sft.sh` | Launch script for SFT with hyperparameter configuration |
+| `train.py` | Continue pretrain script based on HuggingFace Trainer + DeepSpeed |
+| `train_sft.py` | Supervised fine-tuning script based on HuggingFace Trainer + DeepSpeed |
+| `ds_config.json` | DeepSpeed ZeRO-3 configuration (with CPU offload) |
+| `ds_config_z2.json` | DeepSpeed ZeRO-2 configuration (used by default) |
+| `requirements.txt` | Python dependency list |
 
 ## Environment Setup
 
 ### Docker Image
 
-Use the following Huawei NPU image:
+Use the following Huawei NPU image on 910C:
 
 ```
 swr.cn-south-1.myhuaweicloud.com/ascendhub/mindspeed-llm:openeuler22.03-mindspeed-llm-2.3.0-a3-arm
@@ -14,6 +28,8 @@ swr.cn-south-1.myhuaweicloud.com/ascendhub/mindspeed-llm:openeuler22.03-mindspee
 
 Other Huawei NPU images may also work but have not been fully tested.
 
+For GPU environments, there are no special image requirements — just install `requirements.txt` directly.
+
 ### Install Dependencies
 
 After entering the container, install the Python dependencies:
@@ -22,24 +38,13 @@ After entering the container, install the Python dependencies:
 pip install -r requirements.txt
 ```
 
-
+## Continue Pretrain (CPT)
 
-
-| --- | --- |
-| transformers | 4.46.3 |
-| tokenizers | 0.20.3 |
-| accelerate | 1.1.1 |
-| deepspeed | 0.16.2 |
-| datasets | 3.1.0 |
-| safetensors | 0.4.5 |
-| pyarrow | 17.0.0 |
-| tensorboard | 2.18.0 |
-
-## Dataset
+### Dataset
 
 The test dataset used is [C4-Pro](https://huggingface.co/datasets/gair-prox/c4-pro), stored in parquet format after downloading.
 
-## Usage
+### Usage
 
 Modify the path configuration in `run.sh`:
 
@@ -54,78 +59,47 @@ Then start training:
 bash run.sh
 ```
 
-
+## Supervised Fine-Tuning (SFT)
+
+### Dataset
+
+The test dataset used is [UltraChat 200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k), stored in parquet format after downloading.
+
+### Usage
+
+Modify the path configuration in `run_sft.sh`:
+
+```bash
+MODEL_PATH="/path/to/BitCPM4-CANN-1B-unquantized/"
+DATA_PATH="/path/to/ultrachat_200k/data/your_file.parquet"
+```
+
+Then start training:
+
+```bash
+bash run_sft.sh
+```
 
 ## Training Results Reference
 
-
-
-| Step | Loss | Learning Rate | Epoch |
-| --- | --- | --- | --- |
-| 2 | 2.7920 | 1.60e-06 | 0.01 |
-| 4 | 2.8012 | 3.20e-06 | 0.02 |
-| 6 | 2.7984 | 4.80e-06 | 0.03 |
-| 8 | 2.7839 | 6.40e-06 | 0.04 |
-| 10 | 2.8084 | 8.00e-06 | 0.05 |
-| 12 | 2.8064 | 9.60e-06 | 0.06 |
-| 14 | 2.7994 | 1.12e-05 | 0.07 |
-| 16 | 2.7463 | 1.28e-05 | 0.08 |
-| 18 | 2.7580 | 1.44e-05 | 0.09 |
-| 20 | 2.8007 | 1.60e-05 | 0.10 |
-| 22 | 2.8916 | 1.76e-05 | 0.12 |
-| 24 | 2.8144 | 1.92e-05 | 0.13 |
-| 26 | 2.7723 | 2.08e-05 | 0.14 |
-| 28 | 2.7556 | 2.24e-05 | 0.15 |
-| 30 | 2.7414 | 2.40e-05 | 0.16 |
-| 32 | 2.7469 | 2.56e-05 | 0.17 |
-| 34 | 2.7428 | 2.72e-05 | 0.18 |
-| 36 | 2.7392 | 2.88e-05 | 0.19 |
-| 38 | 2.7132 | 3.04e-05 | 0.20 |
-| 40 | 2.7008 | 3.20e-05 | 0.21 |
-| 42 | 2.7547 | 3.36e-05 | 0.22 |
-| 44 | 2.7151 | 3.52e-05 | 0.23 |
-| 46 | 2.7119 | 3.68e-05 | 0.24 |
-| 48 | 2.7029 | 3.84e-05 | 0.25 |
-| 50 | 2.6803 | 4.00e-05 | 0.26 |
-| 52 | 2.6980 | 4.00e-05 | 0.27 |
-| 54 | 2.6923 | 4.00e-05 | 0.28 |
-| 56 | 2.7068 | 4.00e-05 | 0.29 |
-| 58 | 2.6965 | 4.00e-05 | 0.30 |
-| 60 | 2.7179 | 3.99e-05 | 0.31 |
-| 62 | 2.7119 | 3.99e-05 | 0.32 |
-| 64 | 2.7178 | 3.99e-05 | 0.33 |
-| 66 | 2.7069 | 3.99e-05 | 0.35 |
-| 68 | 2.6870 | 3.98e-05 | 0.36 |
-| 70 | 2.6775 | 3.98e-05 | 0.37 |
-| 72 | 2.7038 | 3.98e-05 | 0.38 |
-| 74 | 2.6924 | 3.97e-05 | 0.39 |
-| 76 | 2.7061 | 3.97e-05 | 0.40 |
-| 78 | 2.6929 | 3.96e-05 | 0.41 |
-| 80 | 2.6787 | 3.96e-05 | 0.42 |
-| 82 | 2.6749 | 3.95e-05 | 0.43 |
-| 84 | 2.6909 | 3.94e-05 | 0.44 |
-| 86 | 2.6893 | 3.94e-05 | 0.45 |
-| 88 | 2.6788 | 3.93e-05 | 0.46 |
-| 90 | 2.6831 | 3.92e-05 | 0.47 |
-| 92 | 2.7039 | 3.91e-05 | 0.48 |
-| 94 | 2.6619 | 3.91e-05 | 0.49 |
-| 96 | 2.6903 | 3.90e-05 | 0.50 |
-| 98 | 2.6993 | 3.89e-05 | 0.51 |
-| 100 | 2.6891 | 3.88e-05 | 0.52 |
-| 102 | 2.6739 | 3.87e-05 | 0.53 |
-
-> **Note:** BitCPM has its own training dataset and data mixture. It is expected that the loss continues to decrease when continue pretraining on open-source datasets.
-
-As shown in the table, the loss gradually decreases from ~2.79 to ~2.67, indicating a stable training process and that the model is learning normally.
+> **Note:** BitCPM has its own training dataset and data mixture. It is expected that the loss continues to decrease when training on open-source datasets.
 
-
+Below are the loss curves from smoke tests on GPU and NPU for both CPT and SFT tasks. The results are highly consistent across GPU and NPU, indicating that users can continue pre-training or fine-tuning on various compute devices:
 
-
+| | GPU | NPU |
+| --- | --- | --- |
+| **CPT** |  |  |
+| **SFT** |  |  |
+
+Training log CSV files (corresponding to the loss curves above):
+
+| CSV File | Corresponding Loss Curve |
 | --- | --- |
-
-
-
-
-
-
-
+| [gpu_pretrain.csv](gpu_pretrain.csv) | GPU CPT |
+| [npu_pretrain.csv](npu_pretrain.csv) | NPU CPT |
+| [gpu_sft.csv](gpu_sft.csv) | GPU SFT |
+| [npu_sft.csv](npu_sft.csv) | NPU SFT |
+
+---
+
+These scripts provide a convenient, ready-to-use toolkit for QAT-aware continued pre-training and fine-tuning of BitCPM4-CANN models, so you can quickly adapt the model to your own data and tasks while preserving ternary quantization constraints.
example/gpu_pretrain.csv
ADDED
@@ -0,0 +1,51 @@
+step,train/loss,train/grad_norm,train/learning_rate,train/epoch,train/train_runtime,train/train_samples_per_second,train/train_steps_per_second,train/total_flos,train/train_loss
+2,2.7920000553131104,0.03527498617768288,7.999999979801942e-06,0.010457516647875309,,,,,
+4,2.8011999130249023,0.03495891019701958,1.5999999959603883e-05,0.020915033295750618,,,,,
+6,2.7964000701904297,0.03271934762597084,2.4000000848900527e-05,0.0313725508749485,,,,,
+8,2.763700008392334,0.024968057870864868,3.199999991920777e-05,0.041830066591501236,,,,,
+10,3.281599998474121,0.31758183240890503,3.9999998989515007e-05,0.05228758230805397,,,,,
+12,2.941200017929077,0.044055406004190445,3.995128281530924e-05,0.062745101749897,,,,,
+14,2.851799964904785,0.03649706766009331,3.9805359847377986e-05,0.07320261746644974,,,,,
+16,2.7869999408721924,0.022624235600233078,3.9562950405525044e-05,0.08366013318300247,,,,,
+18,2.7825000286102295,0.021830420941114426,3.922523319488391e-05,0.0941176488995552,,,,,
+20,2.7857000827789307,0.01685911975800991,3.87938525818754e-05,0.10457516461610794,,,,,
+22,2.7571001052856445,0.01572061888873577,3.827090768027119e-05,0.11503268033266068,,,,,
+24,2.762399911880493,0.016891509294509888,3.7658952351193875e-05,0.125490203499794,,,,,
+26,2.7411000728607178,0.015683824196457863,3.6960962461307645e-05,0.13594771921634674,,,,,
+28,2.733099937438965,0.012847283855080605,3.6180339520797133e-05,0.14640523493289948,,,,,
+30,2.723400115966797,0.015209181234240532,3.532088885549456e-05,0.1568627506494522,,,,,
+32,2.7342000007629395,0.01241038367152214,3.4386797779006884e-05,0.16732026636600494,,,,,
+34,2.7321999073028564,0.012879018671810627,3.338261376484297e-05,0.17777778208255768,,,,,
+36,2.7314000129699707,0.013242729939520359,3.231322989449836e-05,0.1882352977991104,,,,,
+38,2.7065999507904053,0.01113435160368681,3.118385939160362e-05,0.19869281351566315,,,,,
+40,2.6958999633789062,0.012413726188242435,2.9999999242136255e-05,0.20915032923221588,,,,,
+42,2.7516000270843506,0.011661508120596409,2.8767422918463126e-05,0.21960784494876862,,,,,
+44,2.713099956512451,0.012248368933796883,2.749213126662653e-05,0.23006536066532135,,,,,
+46,2.7102999687194824,0.011450185440480709,2.6180339773418382e-05,0.24052287638187408,,,,,
+48,2.7021000385284424,0.011155751533806324,2.483843854861334e-05,0.250980406999588,,,,,
+50,2.680500030517578,0.010021247901022434,2.3472963221138343e-05,0.26143792271614075,,,,,
+52,2.699199914932251,0.010751751251518726,2.2090569473220967e-05,0.2718954384326935,,,,,
+54,2.694200038909912,0.010503941215574741,2.0697989384643734e-05,0.2823529541492462,,,,,
+56,2.7091000080108643,0.010059370659291744,1.9302009604871273e-05,0.29281046986579895,,,,,
+58,2.699399948120117,0.012161476537585258,1.7909431335283443e-05,0.3032679855823517,,,,,
+60,2.7216999530792236,0.010671027936041355,1.6527035768376663e-05,0.3137255012989044,,,,,
+62,2.7158000469207764,0.010463157668709755,1.516156225989107e-05,0.32418301701545715,,,,,
+64,2.7214999198913574,0.010665320791304111,1.3819660125591327e-05,0.3346405327320099,,,,,
+66,2.7116000652313232,0.01046629250049591,1.2507867722888477e-05,0.3450980484485626,,,,,
+68,2.6923000812530518,0.010609752498567104,1.1232576980546582e-05,0.35555556416511536,,,,,
+70,2.6830999851226807,0.009290814399719238,9.999999747378752e-06,0.3660130798816681,,,,,
+72,2.7093000411987305,0.010727670043706894,8.816142326395493e-06,0.3764705955982208,,,,,
+74,2.698699951171875,0.0109737953171134,7.686770914006047e-06,0.38692811131477356,,,,,
+76,2.712599992752075,0.010320967063307762,6.61738795315614e-06,0.3973856270313263,,,,,
+78,2.6993000507354736,0.009841523133218288,5.613203938992228e-06,0.40784314274787903,,,,,
+80,2.6861000061035156,0.010179675184190273,4.6791110435151495e-06,0.41830065846443176,,,,,
+82,2.6828999519348145,0.009790077805519104,3.819659923465224e-06,0.4287581741809845,,,,,
+84,2.699199914932251,0.010508442297577858,3.03903811982309e-06,0.43921568989753723,,,,,
+86,2.6988000869750977,0.009589221328496933,2.3410482299368596e-06,0.44967320561408997,,,,,
+88,2.688499927520752,0.010065913200378418,1.7290908544964623e-06,0.4601307213306427,,,,,
+90,2.6928999423980713,0.010363687761127949,1.206147544507985e-06,0.47058823704719543,,,,,
+92,2.714200019836426,0.010142815299332142,7.74766078848188e-07,0.48104575276374817,,,,,
+94,2.672300100326538,0.009833029471337795,4.370479871340649e-07,0.4915032684803009,,,,,
+96,2.7018001079559326,0.009937037713825703,1.9463863054625108e-07,0.501960813999176,,,,,
+98,2.7121999263763428,0.009417451918125153,4.8718995060426096e-08,0.5124183297157288,,,,,
+100,2.7028000354766846,0.009256146848201752,0.0,0.5228758454322815,365.8839111328125,139.93499755859375,0.27300000190734863,4.629706395531346e+17,2.7395541667938232
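The `train/learning_rate` column in this log is consistent with a linear warmup followed by cosine decay: the rate reaches about 4e-05 at step 10 and falls to 0.0 at step 100. Below is a minimal sketch that reproduces the logged values, assuming 10 warmup steps, 100 total steps, and a 4e-05 peak. All three numbers are inferred from the CSV rather than read from `run.sh`, and `cosine_lr` is a hypothetical helper, not a function from the training scripts.

```python
import math


def cosine_lr(step, peak_lr=4e-5, warmup_steps=10, total_steps=100):
    """Linear warmup to peak_lr, then cosine decay to zero.

    The default values are assumptions inferred from the logged rates.
    """
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to the peak.
        return peak_lr * step / warmup_steps
    # Decay phase: fraction of the post-warmup schedule already completed.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Compare against the train/learning_rate column at the same steps.
    for step in (2, 10, 12, 40, 70, 100):
        print(step, f"{cosine_lr(step):.6e}")
```

The logged values differ from this formula only in the last few digits, which is what single-precision logging would produce.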
example/gpu_pretrain_loss.png
ADDED
example/gpu_sft.csv
ADDED
@@ -0,0 +1,51 @@
+step,train/loss,train/grad_norm,train/learning_rate,train/epoch,train/train_runtime,train/train_samples_per_second,train/train_steps_per_second,train/total_flos,train/train_loss
+2,1.1492999792099,0.6216375231742859,1.9999999949504854e-06,0.0004617871018126607,,,,,
+4,1.0979000329971313,0.681877851486206,3.999999989900971e-06,0.0009235742036253214,,,,,
+6,1.1269999742507935,0.784303605556488,6.000000212225132e-06,0.001385361305437982,,,,,
+8,1.0542000532150269,0.8737029433250427,7.999999979801942e-06,0.0018471484072506428,,,,,
+10,1.2440999746322632,0.7068291902542114,9.999999747378752e-06,0.0023089356254786253,,,,,
+12,1.2925000190734863,0.6821666955947876,1.2000000424450263e-05,0.002770722610875964,,,,,
+14,1.0843000411987305,0.525643527507782,1.4000000192027073e-05,0.0032325098291039467,,,,,
+16,1.0961999893188477,0.43757057189941406,1.5999999959603883e-05,0.0036942968145012856,,,,,
+18,1.0614999532699585,0.46141618490219116,1.8000000636675395e-05,0.004156084265559912,,,,,
+20,1.332900047302246,0.715879499912262,1.9999999494757503e-05,0.004617871250957251,,,,,
+22,1.2070000171661377,0.5926885008811951,1.996917308133561e-05,0.0050796582363545895,,,,,
+24,1.2043999433517456,0.5833240747451782,1.9876883015967906e-05,0.005541445221751928,,,,,
+26,1.0740000009536743,0.44734400510787964,1.9723698642337695e-05,0.0060032326728105545,,,,,
+28,1.1162999868392944,0.3701137900352478,1.9510565834934823e-05,0.006465019658207893,,,,,
+30,1.0454000234603882,0.43832680583000183,1.9238796085119247e-05,0.006926806643605232,,,,,
+32,1.124899983406067,0.4591037631034851,1.8910064682131633e-05,0.007388593629002571,,,,,
+34,1.0686999559402466,0.3873400390148163,1.8526401618146338e-05,0.00785038061439991,,,,,
+36,1.0291999578475952,0.40313437581062317,1.8090169760398567e-05,0.008312168531119823,,,,,
+38,1.1052000522613525,0.3735405504703522,1.7604059394216165e-05,0.008773955516517162,,,,,
+40,1.1555999517440796,0.3818407654762268,1.7071068214136176e-05,0.009235742501914501,,,,,
+42,1.0235999822616577,0.4255191683769226,1.6494481315021403e-05,0.00969752948731184,,,,,
+44,1.0364999771118164,0.4794503152370453,1.5877853002166376e-05,0.010159316472709179,,,,,
+46,1.1344000101089478,0.37273937463760376,1.5224985872919206e-05,0.010621103458106518,,,,,
+48,1.0866999626159668,0.417492538690567,1.453990535082994e-05,0.011082890443503857,,,,,
+50,1.1038000583648682,0.35408055782318115,1.3826834219798911e-05,0.01154467836022377,,,,,
+52,1.1478999853134155,0.3930828273296356,1.3090169886709191e-05,0.012006465345621109,,,,,
+54,1.1858999729156494,0.3965947926044464,1.2334453458606731e-05,0.012468252331018448,,,,,
+56,1.0096999406814575,0.3860221207141876,1.1564344276848715e-05,0.012930039316415787,,,,,
+58,1.114799976348877,0.44393691420555115,1.0784590813273098e-05,0.013391826301813126,,,,,
+60,1.079300045967102,0.3605058789253235,9.999999747378752e-06,0.013853613287210464,,,,,
+62,1.1766999959945679,0.40689122676849365,9.215408681484405e-06,0.014315400272607803,,,,,
+64,1.1075999736785889,0.4002344310283661,8.435655217908788e-06,0.014777187258005142,,,,,
+66,1.1866999864578247,0.46947163343429565,7.665546036150772e-06,0.015238975174725056,,,,,
+68,1.0311000347137451,0.3296957314014435,6.909830062795663e-06,0.01570076122879982,,,,,
+70,1.1088999509811401,0.33858785033226013,6.173165729705943e-06,0.01616254821419716,,,,,
+72,1.0720000267028809,0.3967427909374237,5.460095053422265e-06,0.016624337062239647,,,,,
+74,1.1460000276565552,0.41202062368392944,4.7750145313329995e-06,0.017086124047636986,,,,,
+76,1.0425000190734863,0.38334518671035767,4.1221474020858295e-06,0.017547911033034325,,,,,
+78,0.9154000282287598,0.40649303793907166,3.505519543978153e-06,0.018009698018431664,,,,,
+80,1.1110999584197998,0.35371580719947815,2.9289321901160292e-06,0.018471485003829002,,,,,
+82,1.1672999858856201,0.3381657302379608,2.3959403279150138e-06,0.01893327198922634,,,,,
+84,1.2374000549316406,0.3815234303474426,1.909829961732612e-06,0.01939505897462368,,,,,
+86,1.2151000499725342,0.38446080684661865,1.4735983313585166e-06,0.01985684596002102,,,,,
+88,1.163100004196167,0.40419140458106995,1.0899348126258701e-06,0.020318632945418358,,,,,
+90,1.1883000135421753,0.4011874198913574,7.612046601934708e-07,0.020780419930815697,,,,,
+92,1.1526999473571777,0.3836020231246948,4.894348535344761e-07,0.021242206916213036,,,,,
+94,1.15339994430542,0.452364057302475,2.7630079557638965e-07,0.021703993901610374,,,,,
+96,1.062000036239624,0.3502688705921173,1.2311659247643547e-07,0.022165780887007713,,,,,
+98,1.0271999835968018,0.4022065997123718,3.0826662111849146e-08,0.022627567872405052,,,,,
+100,1.0283000469207764,0.38241174817085266,0.0,0.02308935672044754,183.9481964111328,8.697999954223633,0.5440000295639038,1862467846144.0,1.1177252531051636
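The last row of this log carries the run summary (`train/train_runtime`, `train/train_samples_per_second`, `train/train_steps_per_second`). Dividing samples per second by steps per second gives the number of samples consumed per optimizer step, which is about 16 for this run. The sketch below does that arithmetic; the helper names are hypothetical, and the resulting batch size is an inference from the logged, rounded throughput figures rather than a value read from `run_sft.sh`.

```python
def samples_per_step(samples_per_second, steps_per_second):
    """Estimate the effective global batch size from logged throughput."""
    return samples_per_second / steps_per_second


def total_samples(runtime_seconds, samples_per_second):
    """Estimate how many samples the whole run processed."""
    return runtime_seconds * samples_per_second


if __name__ == "__main__":
    # Values copied from the step-100 row of gpu_sft.csv.
    runtime = 183.9481964111328
    sps = 8.697999954223633
    steps_ps = 0.5440000295639038
    print(round(samples_per_step(sps, steps_ps)))
    print(round(total_samples(runtime, sps)))
```

Because the throughput columns are rounded to three decimals, the quotient is only approximately an integer, so rounding is needed before interpreting it as a batch size.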
example/gpu_sft_loss.png
ADDED
example/npu_pretrain.csv
ADDED
@@ -0,0 +1,51 @@
+step,train/loss,train/grad_norm,train/learning_rate,train/epoch,train/train_runtime,train/train_samples_per_second,train/train_steps_per_second,train/total_flos,train/train_loss
+2,2.7920000553131104,0.035306449979543686,7.999999979801942e-06,0.010457516647875309,,,,,
+4,2.8011999130249023,0.03491510450839996,1.5999999959603883e-05,0.020915033295750618,,,,,
+6,2.7964000701904297,0.032717395573854446,2.4000000848900527e-05,0.0313725508749485,,,,,
+8,2.763700008392334,0.024953875690698624,3.199999991920777e-05,0.041830066591501236,,,,,
+10,3.2811999320983887,0.3170815408229828,3.9999998989515007e-05,0.05228758230805397,,,,,
+12,2.9409000873565674,0.04423849284648895,3.995128281530924e-05,0.062745101749897,,,,,
+14,2.851900100708008,0.03667925298213959,3.9805359847377986e-05,0.07320261746644974,,,,,
+16,2.7869999408721924,0.022814607247710228,3.9562950405525044e-05,0.08366013318300247,,,,,
+18,2.782599925994873,0.021528413519263268,3.922523319488391e-05,0.0941176488995552,,,,,
+20,2.785599946975708,0.017014438286423683,3.87938525818754e-05,0.10457516461610794,,,,,
+22,2.7571001052856445,0.015719758346676826,3.827090768027119e-05,0.11503268033266068,,,,,
+24,2.762399911880493,0.016948623582720757,3.7658952351193875e-05,0.125490203499794,,,,,
+26,2.7411000728607178,0.015535997226834297,3.6960962461307645e-05,0.13594771921634674,,,,,
+28,2.7330000400543213,0.012748735956847668,3.6180339520797133e-05,0.14640523493289948,,,,,
+30,2.723299980163574,0.014809778891503811,3.532088885549456e-05,0.1568627506494522,,,,,
+32,2.7342000007629395,0.01219236571341753,3.4386797779006884e-05,0.16732026636600494,,,,,
+34,2.7321999073028564,0.012785322032868862,3.338261376484297e-05,0.17777778208255768,,,,,
+36,2.7314000129699707,0.012986919842660427,3.231322989449836e-05,0.1882352977991104,,,,,
+38,2.7065999507904053,0.01096824835985899,3.118385939160362e-05,0.19869281351566315,,,,,
+40,2.6958999633789062,0.012387535534799099,2.9999999242136255e-05,0.20915032923221588,,,,,
+42,2.751499891281128,0.011586200445890427,2.8767422918463126e-05,0.21960784494876862,,,,,
+44,2.713099956512451,0.011821281164884567,2.749213126662653e-05,0.23006536066532135,,,,,
+46,2.7102999687194824,0.01147585827857256,2.6180339773418382e-05,0.24052287638187408,,,,,
+48,2.7019999027252197,0.011368263512849808,2.483843854861334e-05,0.250980406999588,,,,,
+50,2.680500030517578,0.009935515932738781,2.3472963221138343e-05,0.26143792271614075,,,,,
+52,2.6993000507354736,0.0109846917912364,2.2090569473220967e-05,0.2718954384326935,,,,,
+54,2.6940999031066895,0.010465175844728947,2.0697989384643734e-05,0.2823529541492462,,,,,
+56,2.7091000080108643,0.01009758748114109,1.9302009604871273e-05,0.29281046986579895,,,,,
+58,2.69950008392334,0.01249368954449892,1.7909431335283443e-05,0.3032679855823517,,,,,
+60,2.7216999530792236,0.01051376760005951,1.6527035768376663e-05,0.3137255012989044,,,,,
+62,2.7158000469207764,0.01054943073540926,1.516156225989107e-05,0.32418301701545715,,,,,
+64,2.7214999198913574,0.01076149195432663,1.3819660125591327e-05,0.3346405327320099,,,,,
+66,2.7116000652313232,0.010380392894148827,1.2507867722888477e-05,0.3450980484485626,,,,,
+68,2.6923000812530518,0.010425001382827759,1.1232576980546582e-05,0.35555556416511536,,,,,
+70,2.683199882507324,0.00925016961991787,9.999999747378752e-06,0.3660130798816681,,,,,
+72,2.7093000411987305,0.01072422880679369,8.816142326395493e-06,0.3764705955982208,,,,,
+74,2.6988000869750977,0.011063243262469769,7.686770914006047e-06,0.38692811131477356,,,,,
+76,2.7125000953674316,0.01013101264834404,6.61738795315614e-06,0.3973856270313263,,,,,
+78,2.6993000507354736,0.009940676391124725,5.613203938992228e-06,0.40784314274787903,,,,,
+80,2.6861000061035156,0.01050259917974472,4.6791110435151495e-06,0.41830065846443176,,,,,
+82,2.6828999519348145,0.009912634268403053,3.819659923465224e-06,0.4287581741809845,,,,,
+84,2.699199914932251,0.010668900795280933,3.03903811982309e-06,0.43921568989753723,,,,,
+86,2.698899984359741,0.009650414809584618,2.3410482299368596e-06,0.44967320561408997,,,,,
+88,2.6884000301361084,0.01006452739238739,1.7290908544964623e-06,0.4601307213306427,,,,,
+90,2.6928999423980713,0.010409764014184475,1.206147544507985e-06,0.47058823704719543,,,,,
+92,2.714200019836426,0.009937116876244545,7.74766078848188e-07,0.48104575276374817,,,,,
+94,2.672300100326538,0.009728306904435158,4.370479871340649e-07,0.4915032684803009,,,,,
+96,2.7018001079559326,0.010098566301167011,1.9463863054625108e-07,0.501960813999176,,,,,
+98,2.7123000621795654,0.009524320252239704,4.8718995060426096e-08,0.5124183297157288,,,,,
+100,2.7028000354766846,0.009290286339819431,0.0,0.5228758454322815,788.0635986328125,64.96900177001953,0.12700000405311584,4.629706395531346e+17,2.739542245864868
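The GPU and NPU pretraining logs can be compared step by step to put a number on how closely they agree. The sketch below parses two CSV texts with the standard library and reports the step with the largest absolute difference in `train/loss`. The function names are hypothetical and the embedded samples are three-row excerpts of `gpu_pretrain.csv` and `npu_pretrain.csv`, trimmed to the first two columns for brevity.

```python
import csv
import io


def load_losses(csv_text):
    """Map step -> train/loss for every row that has a loss value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        int(row["step"]): float(row["train/loss"])
        for row in reader
        if row["train/loss"]
    }


def max_loss_gap(gpu_csv, npu_csv):
    """Return (step, gap) for the largest absolute loss gap on shared steps."""
    gpu, npu = load_losses(gpu_csv), load_losses(npu_csv)
    shared = sorted(gpu.keys() & npu.keys())
    step = max(shared, key=lambda s: abs(gpu[s] - npu[s]))
    return step, abs(gpu[step] - npu[step])


# Excerpts of the two logs (steps 8, 10, 12 only).
GPU_SAMPLE = """step,train/loss
8,2.763700008392334
10,3.281599998474121
12,2.941200017929077
"""
NPU_SAMPLE = """step,train/loss
8,2.763700008392334
10,3.2811999320983887
12,2.9409000873565674
"""

if __name__ == "__main__":
    print(max_loss_gap(GPU_SAMPLE, NPU_SAMPLE))
```

To run it on the full logs, read each file with `open(path).read()` and pass the two strings to `max_loss_gap`; the paths depend on where the `example/` folder was downloaded.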
example/npu_pretrain_loss.png
ADDED
example/npu_sft.csv
ADDED
|
@@ -0,0 +1,51 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
step,train/loss,train/grad_norm,train/learning_rate,train/epoch,train/train_runtime,train/train_samples_per_second,train/train_steps_per_second,train/total_flos,train/train_loss
|
| 2 |
+
2,1.1491999626159668,0.6218180060386658,1.9999999949504854e-06,0.0004617871018126607,,,,,
|
| 3 |
+
4,1.0981999635696411,0.6825665235519409,3.999999989900971e-06,0.0009235742036253214,,,,,
+6,1.1269999742507935,0.7838642001152039,6.000000212225132e-06,0.001385361305437982,,,,,
+8,1.0542000532150269,0.8744276762008667,7.999999979801942e-06,0.0018471484072506428,,,,,
+10,1.2441999912261963,0.7064258456230164,9.999999747378752e-06,0.0023089356254786253,,,,,
+12,1.2927000522613525,0.6829814910888672,1.2000000424450263e-05,0.002770722610875964,,,,,
+14,1.0844999551773071,0.5265647172927856,1.4000000192027073e-05,0.0032325098291039467,,,,,
+16,1.0963000059127808,0.4373657703399658,1.5999999959603883e-05,0.0036942968145012856,,,,,
+18,1.0615999698638916,0.46220508217811584,1.8000000636675395e-05,0.004156084265559912,,,,,
+20,1.3325999975204468,0.7157824039459229,1.9999999494757503e-05,0.004617871250957251,,,,,
+22,1.2070000171661377,0.5933427214622498,1.996917308133561e-05,0.0050796582363545895,,,,,
+24,1.2044999599456787,0.5816172957420349,1.9876883015967906e-05,0.005541445221751928,,,,,
+26,1.0740000009536743,0.4489712119102478,1.9723698642337695e-05,0.0060032326728105545,,,,,
+28,1.1164000034332275,0.3696516752243042,1.9510565834934823e-05,0.006465019658207893,,,,,
+30,1.045199990272522,0.4376335144042969,1.9238796085119247e-05,0.006926806643605232,,,,,
+32,1.1247999668121338,0.4589230716228485,1.8910064682131633e-05,0.007388593629002571,,,,,
+34,1.0688999891281128,0.3879022002220154,1.8526401618146338e-05,0.00785038061439991,,,,,
+36,1.0292999744415283,0.4027869403362274,1.8090169760398567e-05,0.008312168531119823,,,,,
+38,1.1052000522613525,0.37394437193870544,1.7604059394216165e-05,0.008773955516517162,,,,,
+40,1.1557999849319458,0.3808683753013611,1.7071068214136176e-05,0.009235742501914501,,,,,
+42,1.0232000350952148,0.4252733886241913,1.6494481315021403e-05,0.00969752948731184,,,,,
+44,1.0364999771118164,0.48068660497665405,1.5877853002166376e-05,0.010159316472709179,,,,,
+46,1.1340999603271484,0.37313926219940186,1.5224985872919206e-05,0.010621103458106518,,,,,
+48,1.0866999626159668,0.4175492823123932,1.453990535082994e-05,0.011082890443503857,,,,,
+50,1.1039999723434448,0.35443660616874695,1.3826834219798911e-05,0.01154467836022377,,,,,
+52,1.1480000019073486,0.39232146739959717,1.3090169886709191e-05,0.012006465345621109,,,,,
+54,1.1861000061035156,0.396918922662735,1.2334453458606731e-05,0.012468252331018448,,,,,
+56,1.0096999406814575,0.3885609209537506,1.1564344276848715e-05,0.012930039316415787,,,,,
+58,1.114799976348877,0.4421806335449219,1.0784590813273098e-05,0.013391826301813126,,,,,
+60,1.0795999765396118,0.36081990599632263,9.999999747378752e-06,0.013853613287210464,,,,,
+62,1.1764999628067017,0.4062329828739166,9.215408681484405e-06,0.014315400272607803,,,,,
+64,1.107200026512146,0.39982733130455017,8.435655217908788e-06,0.014777187258005142,,,,,
+66,1.1868000030517578,0.4688170254230499,7.665546036150772e-06,0.015238975174725056,,,,,
+68,1.0312999486923218,0.3301626741886139,6.909830062795663e-06,0.01570076122879982,,,,,
+70,1.1089999675750732,0.3377252221107483,6.173165729705943e-06,0.01616254821419716,,,,,
+72,1.0716999769210815,0.39666977524757385,5.460095053422265e-06,0.016624337062239647,,,,,
+74,1.1461999416351318,0.4125552177429199,4.7750145313329995e-06,0.017086124047636986,,,,,
+76,1.042199969291687,0.3825180232524872,4.1221474020858295e-06,0.017547911033034325,,,,,
+78,0.9157000184059143,0.4063441753387451,3.505519543978153e-06,0.018009698018431664,,,,,
+80,1.1110999584197998,0.35289037227630615,2.9289321901160292e-06,0.018471485003829002,,,,,
+82,1.167199969291687,0.33720290660858154,2.3959403279150138e-06,0.01893327198922634,,,,,
+84,1.2375999689102173,0.38099613785743713,1.909829961732612e-06,0.01939505897462368,,,,,
+86,1.2151999473571777,0.3848689794540405,1.4735983313585166e-06,0.01985684596002102,,,,,
+88,1.1628999710083008,0.40408074855804443,1.0899348126258701e-06,0.020318632945418358,,,,,
+90,1.1884000301361084,0.4015007019042969,7.612046601934708e-07,0.020780419930815697,,,,,
+92,1.152500033378601,0.38306349515914917,4.894348535344761e-07,0.021242206916213036,,,,,
+94,1.154099941253662,0.45273807644844055,2.7630079557638965e-07,0.021703993901610374,,,,,
+96,1.0618000030517578,0.35036078095436096,1.2311659247643547e-07,0.022165780887007713,,,,,
+98,1.0270999670028687,0.40208569169044495,3.0826662111849146e-08,0.022627567872405052,,,,,
+100,1.0285999774932861,0.38247284293174744,0.0,0.02308935672044754,728.7083129882812,2.196000099182129,0.13699999451637268,1862467846144.0,1.117748498916626
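The fourth column of the log rows above rises linearly to about 2e-5 at step 20 and then decays to 0.0 at step 100. That appears consistent with the `--learning_rate 2e-5`, `--lr_scheduler_type cosine`, `--warmup_ratio 0.2` and `--max_steps 100` settings in `example/run_sft.sh` from this same commit. The following is a minimal sketch of a linear-warmup, cosine-decay schedule that reproduces those values. It assumes the columns are step, loss, grad_norm, learning_rate and epoch (the CSV header is not visible in this part of the diff); `lr_at` is a hypothetical helper written for illustration, not code from `train_sft.py`.

```python
import math


def lr_at(step, base_lr=2e-5, max_steps=100, warmup_ratio=0.2):
    """Learning rate after `step` optimizer steps.

    Linear warmup from 0 to base_lr, then cosine decay to 0.
    Defaults mirror the run_sft.sh flags; the formula itself is an
    assumption about how the logged values were produced.
    """
    warmup_steps = math.ceil(max_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp: 0 at step 0, base_lr at step == warmup_steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Compare against a few logged rows (step, logged learning rate).
    logged = {
        4: 3.999999989900971e-06,
        20: 1.9999999494757503e-05,
        40: 1.7071068214136176e-05,
        60: 9.999999747378752e-06,
        100: 0.0,
    }
    for step, value in logged.items():
        print(step, lr_at(step), value)
```

The small differences between the computed and logged values are what one would expect if the logged numbers were stored in single precision.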
|
example/npu_sft_loss.png
ADDED
|
example/run.sh
CHANGED
|
@@ -1,6 +1,6 @@
 #!/bin/bash
 
-MODEL_PATH="/model/
+MODEL_PATH="/model/BitCPM4-CANN-1B-unquantized"
 DATA_PATH="/dataset/c4-pro/data/000_1_7.parquet"
 OUTPUT_DIR="./output"
 DS_CONFIG="./ds_config_z2.json"
@@ -11,7 +11,8 @@ GRAD_ACCUM_STEPS=8
 MAX_SEQ_LENGTH=1024
 
 export ASCEND_RT_VISIBLE_DEVICES=8,9,10,11,12,13,14,15
-
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export DS_SKIP_CUDA_CHECK=1
 torchrun --nproc_per_node=$NUM_GPUS train.py \
     --model_name_or_path $MODEL_PATH \
     --data_path $DATA_PATH \
@@ -19,7 +20,7 @@ torchrun --nproc_per_node=$NUM_GPUS train.py \
     --output_dir $OUTPUT_DIR \
     --per_device_train_batch_size $BATCH_SIZE_PER_GPU \
     --gradient_accumulation_steps $GRAD_ACCUM_STEPS \
-    --max_steps
+    --max_steps 100 \
     --learning_rate 4e-5 \
     --lr_scheduler_type cosine \
     --warmup_ratio 0.1 \
@@ -33,5 +34,5 @@ torchrun --nproc_per_node=$NUM_GPUS train.py \
     --seed 42 \
     --dataloader_num_workers 4 \
     --report_to tensorboard \
-    --logging_dir /data/tensorboard/ \
+    --logging_dir /data/tensorboard/pretrain \
     --gradient_checkpointing_kwargs '{"use_reentrant": false}'
example/run_sft.sh
CHANGED
|
@@ -1,16 +1,18 @@
 #!/bin/bash
 
-MODEL_PATH="/model/
-DATA_PATH=""
+MODEL_PATH="/model/BitCPM4-CANN-1B-unquantized"
+DATA_PATH="/dataset/HuggingFaceH4_ultrachat_200k/data/train_sft-00000-of-00003-a3ecf92756993583.parquet"
 OUTPUT_DIR="./output_sft"
 DS_CONFIG="./ds_config.json"
 
 NUM_GPUS=8
 BATCH_SIZE_PER_GPU=2
 GRAD_ACCUM_STEPS=1
-MAX_SEQ_LENGTH=
+MAX_SEQ_LENGTH=8192
 
 export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export DS_SKIP_CUDA_CHECK=1
 
 torchrun --nproc_per_node=$NUM_GPUS train_sft.py \
     --model_name_or_path $MODEL_PATH \
@@ -19,10 +21,10 @@ torchrun --nproc_per_node=$NUM_GPUS train_sft.py \
     --output_dir $OUTPUT_DIR \
     --per_device_train_batch_size $BATCH_SIZE_PER_GPU \
     --gradient_accumulation_steps $GRAD_ACCUM_STEPS \
-    --
+    --max_steps 100 \
     --learning_rate 2e-5 \
     --lr_scheduler_type cosine \
-    --warmup_ratio 0.
+    --warmup_ratio 0.2 \
     --weight_decay 0.0 \
     --logging_steps 2 \
     --save_steps 500 \