guanwenyu1995 committed on
Commit 6e593d2 · verified · 1 Parent(s): c005499

Upload folder using huggingface_hub

example/README.md CHANGED
@@ -1,12 +1,26 @@
1
- # BitCPM4 Continue Pretrain Example
2
 
3
- This project provides scripts for continue pretraining **BitCPM4-CANN-1B-unquantized**.
4
 
5
  ## Environment Setup
6
 
7
  ### Docker Image
8
 
9
- Use the following Huawei NPU image:
10
 
11
  ```
12
  swr.cn-south-1.myhuaweicloud.com/ascendhub/mindspeed-llm:openeuler22.03-mindspeed-llm-2.3.0-a3-arm
@@ -14,6 +28,8 @@ swr.cn-south-1.myhuaweicloud.com/ascendhub/mindspeed-llm:openeuler22.03-mindspee
14
 
15
  Other Huawei NPU images may also work but have not been fully tested.
16
 
 
 
17
  ### Install Dependencies
18
 
19
  After entering the container, install the Python dependencies:
@@ -22,24 +38,13 @@ After entering the container, install the Python dependencies:
22
  pip install -r requirements.txt
23
  ```
24
 
25
- Dependency list:
26
 
27
- | Package | Version |
28
- | --- | --- |
29
- | transformers | 4.46.3 |
30
- | tokenizers | 0.20.3 |
31
- | accelerate | 1.1.1 |
32
- | deepspeed | 0.16.2 |
33
- | datasets | 3.1.0 |
34
- | safetensors | 0.4.5 |
35
- | pyarrow | 17.0.0 |
36
- | tensorboard | 2.18.0 |
37
-
38
- ## Dataset
39
 
40
  The test dataset used is [C4-Pro](https://huggingface.co/datasets/gair-prox/c4-pro), stored in parquet format after downloading.
41
 
42
- ## Usage
43
 
44
  Modify the path configuration in `run.sh`:
45
 
@@ -54,78 +59,47 @@ Then start training:
54
  bash run.sh
55
  ```
56
 
57
- By default, the script trains for 500 steps using 8 devices, DeepSpeed ZeRO-2, and bf16 precision.
58
 
59
  ## Training Results Reference
60
 
61
- Below is the loss curve for the first 100 steps (learning rate warmup covers the first 50 steps):
62
-
63
- | Step | Loss | Learning Rate | Epoch |
64
- | --- | --- | --- | --- |
65
- | 2 | 2.7920 | 1.60e-06 | 0.01 |
66
- | 4 | 2.8012 | 3.20e-06 | 0.02 |
67
- | 6 | 2.7984 | 4.80e-06 | 0.03 |
68
- | 8 | 2.7839 | 6.40e-06 | 0.04 |
69
- | 10 | 2.8084 | 8.00e-06 | 0.05 |
70
- | 12 | 2.8064 | 9.60e-06 | 0.06 |
71
- | 14 | 2.7994 | 1.12e-05 | 0.07 |
72
- | 16 | 2.7463 | 1.28e-05 | 0.08 |
73
- | 18 | 2.7580 | 1.44e-05 | 0.09 |
74
- | 20 | 2.8007 | 1.60e-05 | 0.10 |
75
- | 22 | 2.8916 | 1.76e-05 | 0.12 |
76
- | 24 | 2.8144 | 1.92e-05 | 0.13 |
77
- | 26 | 2.7723 | 2.08e-05 | 0.14 |
78
- | 28 | 2.7556 | 2.24e-05 | 0.15 |
79
- | 30 | 2.7414 | 2.40e-05 | 0.16 |
80
- | 32 | 2.7469 | 2.56e-05 | 0.17 |
81
- | 34 | 2.7428 | 2.72e-05 | 0.18 |
82
- | 36 | 2.7392 | 2.88e-05 | 0.19 |
83
- | 38 | 2.7132 | 3.04e-05 | 0.20 |
84
- | 40 | 2.7008 | 3.20e-05 | 0.21 |
85
- | 42 | 2.7547 | 3.36e-05 | 0.22 |
86
- | 44 | 2.7151 | 3.52e-05 | 0.23 |
87
- | 46 | 2.7119 | 3.68e-05 | 0.24 |
88
- | 48 | 2.7029 | 3.84e-05 | 0.25 |
89
- | 50 | 2.6803 | 4.00e-05 | 0.26 |
90
- | 52 | 2.6980 | 4.00e-05 | 0.27 |
91
- | 54 | 2.6923 | 4.00e-05 | 0.28 |
92
- | 56 | 2.7068 | 4.00e-05 | 0.29 |
93
- | 58 | 2.6965 | 4.00e-05 | 0.30 |
94
- | 60 | 2.7179 | 3.99e-05 | 0.31 |
95
- | 62 | 2.7119 | 3.99e-05 | 0.32 |
96
- | 64 | 2.7178 | 3.99e-05 | 0.33 |
97
- | 66 | 2.7069 | 3.99e-05 | 0.35 |
98
- | 68 | 2.6870 | 3.98e-05 | 0.36 |
99
- | 70 | 2.6775 | 3.98e-05 | 0.37 |
100
- | 72 | 2.7038 | 3.98e-05 | 0.38 |
101
- | 74 | 2.6924 | 3.97e-05 | 0.39 |
102
- | 76 | 2.7061 | 3.97e-05 | 0.40 |
103
- | 78 | 2.6929 | 3.96e-05 | 0.41 |
104
- | 80 | 2.6787 | 3.96e-05 | 0.42 |
105
- | 82 | 2.6749 | 3.95e-05 | 0.43 |
106
- | 84 | 2.6909 | 3.94e-05 | 0.44 |
107
- | 86 | 2.6893 | 3.94e-05 | 0.45 |
108
- | 88 | 2.6788 | 3.93e-05 | 0.46 |
109
- | 90 | 2.6831 | 3.92e-05 | 0.47 |
110
- | 92 | 2.7039 | 3.91e-05 | 0.48 |
111
- | 94 | 2.6619 | 3.91e-05 | 0.49 |
112
- | 96 | 2.6903 | 3.90e-05 | 0.50 |
113
- | 98 | 2.6993 | 3.89e-05 | 0.51 |
114
- | 100 | 2.6891 | 3.88e-05 | 0.52 |
115
- | 102 | 2.6739 | 3.87e-05 | 0.53 |
116
-
117
- > **Note:** BitCPM has its own training dataset and data mixture. It is expected that the loss continues to decrease when continue pretraining on open-source datasets.
118
-
119
- As shown in the table, the loss gradually decreases from ~2.79 to ~2.67, indicating a stable training process and that the model is learning normally.
120
 
121
- ## File Description
122
 
123
- | File | Description |
124
  | --- | --- |
125
- | `train.py` | Training script based on HuggingFace Trainer + DeepSpeed |
126
- | `run.sh` | Launch script with training hyperparameter configuration |
127
- | `train_sft.py` | Supervised fine-tuning script based on HuggingFace Trainer + DeepSpeed |
128
- | `run_sft.sh` | Launch script for SFT with hyperparameter configuration |
129
- | `ds_config.json` | DeepSpeed ZeRO-3 configuration (with CPU offload) |
130
- | `ds_config_z2.json` | DeepSpeed ZeRO-2 configuration (used by default) |
131
- | `requirements.txt` | Python dependency list |
 
 
1
+ # BitCPM4 Training Example
2
 
3
+ This project provides scripts for continue pretraining (CPT) and supervised fine-tuning (SFT) of **BitCPM4-CANN-1B-unquantized**.
4
+
5
+ ## File Description
6
+
7
+ CPT and SFT each have a pair of scripts (training script + launch script) and share DeepSpeed configuration files:
8
+
9
+ | File | Description |
10
+ | --- | --- |
11
+ | `run.sh` | Launch script for CPT with hyperparameter configuration |
12
+ | `run_sft.sh` | Launch script for SFT with hyperparameter configuration |
13
+ | `train.py` | Continue pretrain script based on HuggingFace Trainer + DeepSpeed |
14
+ | `train_sft.py` | Supervised fine-tuning script based on HuggingFace Trainer + DeepSpeed |
15
+ | `ds_config.json` | DeepSpeed ZeRO-3 configuration (with CPU offload) |
16
+ | `ds_config_z2.json` | DeepSpeed ZeRO-2 configuration (used by default) |
17
+ | `requirements.txt` | Python dependency list |
18
 
19
  ## Environment Setup
20
 
21
  ### Docker Image
22
 
23
+ Use the following Huawei NPU image on 910C:
24
 
25
  ```
26
  swr.cn-south-1.myhuaweicloud.com/ascendhub/mindspeed-llm:openeuler22.03-mindspeed-llm-2.3.0-a3-arm
 
28
 
29
  Other Huawei NPU images may also work but have not been fully tested.
30
 
31
+ For GPU environments, no special image is required; install the dependencies listed in `requirements.txt` directly.
32
+
33
  ### Install Dependencies
34
 
35
  After entering the container, install the Python dependencies:
 
38
  pip install -r requirements.txt
39
  ```
40
 
41
+ ## Continue Pretrain (CPT)
42
 
43
+ ### Dataset
44
 
45
  The test dataset used is [C4-Pro](https://huggingface.co/datasets/gair-prox/c4-pro), stored in parquet format after downloading.
46
 
47
+ ### Usage
48
 
49
  Modify the path configuration in `run.sh`:
50
 
 
59
  bash run.sh
60
  ```
61
 
62
+ ## Supervised Fine-Tuning (SFT)
63
+
64
+ ### Dataset
65
+
66
+ The test dataset used is [UltraChat 200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k), stored in parquet format after downloading.
67
+
68
+ ### Usage
69
+
70
+ Modify the path configuration in `run_sft.sh`:
71
+
72
+ ```bash
73
+ MODEL_PATH="/path/to/BitCPM4-CANN-1B-unquantized/"
74
+ DATA_PATH="/path/to/ultrachat_200k/data/your_file.parquet"
75
+ ```
76
+
77
+ Then start training:
78
+
79
+ ```bash
80
+ bash run_sft.sh
81
+ ```
82
 
83
  ## Training Results Reference
84
 
85
+ > **Note:** BitCPM has its own training dataset and data mixture. It is expected that the loss continues to decrease when training on open-source datasets.
86
 
87
+ Below are the loss curves from 100-step smoke tests on GPU and NPU for both CPT and SFT. The logged losses on GPU and NPU agree to within about 1e-3 at every logged step, indicating that continued pre-training and fine-tuning behave consistently on both device types:
88
 
89
+ | | GPU | NPU |
90
+ | --- | --- | --- |
91
+ | **CPT** | ![GPU Pretrain Loss](gpu_pretrain_loss.png) | ![NPU Pretrain Loss](npu_pretrain_loss.png) |
92
+ | **SFT** | ![GPU SFT Loss](gpu_sft_loss.png) | ![NPU SFT Loss](npu_sft_loss.png) |
93
+
94
+ Training log CSV files (corresponding to the loss curves above):
95
+
96
+ | CSV File | Corresponding Loss Curve |
97
  | --- | --- |
98
+ | [gpu_pretrain.csv](gpu_pretrain.csv) | GPU CPT |
99
+ | [npu_pretrain.csv](npu_pretrain.csv) | NPU CPT |
100
+ | [gpu_sft.csv](gpu_sft.csv) | GPU SFT |
101
+ | [npu_sft.csv](npu_sft.csv) | NPU SFT |
102
+
103
+ ---
104
+
105
+ These scripts provide a convenient, ready-to-use toolkit for QAT-aware continued pre-training and fine-tuning of BitCPM4-CANN models, so you can quickly adapt the model to your own data and tasks while preserving ternary quantization constraints.
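
The GPU/NPU consistency claim in the README can be checked directly against the CSV logs added in this commit. The following is a minimal sketch, not part of the repository: the helper names `read_losses` and `max_abs_loss_diff` are hypothetical, and the two sample strings hold three rows copied from `example/gpu_pretrain.csv` and `example/npu_pretrain.csv` with the unused columns trimmed.

```python
import csv
import io


def read_losses(csv_text):
    """Map step -> train/loss from a CSV log with 'step' and 'train/loss' columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["step"]): float(row["train/loss"]) for row in reader}


def max_abs_loss_diff(gpu_text, npu_text):
    """Largest absolute loss gap over the steps present in both logs."""
    gpu, npu = read_losses(gpu_text), read_losses(npu_text)
    shared_steps = sorted(gpu.keys() & npu.keys())
    return max(abs(gpu[s] - npu[s]) for s in shared_steps)


# Three rows taken from the CPT logs in this commit (other columns omitted).
GPU_SAMPLE = "step,train/loss\n2,2.7920000553131104\n10,3.281599998474121\n12,2.941200017929077\n"
NPU_SAMPLE = "step,train/loss\n2,2.7920000553131104\n10,3.2811999320983887\n12,2.9409000873565674\n"

print(max_abs_loss_diff(GPU_SAMPLE, NPU_SAMPLE))
```

To compare the full logs, pass the file contents instead, for example `max_abs_loss_diff(open("gpu_pretrain.csv").read(), open("npu_pretrain.csv").read())`, assuming the files are in the working directory.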
example/gpu_pretrain.csv ADDED
@@ -0,0 +1,51 @@
1
+ step,train/loss,train/grad_norm,train/learning_rate,train/epoch,train/train_runtime,train/train_samples_per_second,train/train_steps_per_second,train/total_flos,train/train_loss
2
+ 2,2.7920000553131104,0.03527498617768288,7.999999979801942e-06,0.010457516647875309,,,,,
3
+ 4,2.8011999130249023,0.03495891019701958,1.5999999959603883e-05,0.020915033295750618,,,,,
4
+ 6,2.7964000701904297,0.03271934762597084,2.4000000848900527e-05,0.0313725508749485,,,,,
5
+ 8,2.763700008392334,0.024968057870864868,3.199999991920777e-05,0.041830066591501236,,,,,
6
+ 10,3.281599998474121,0.31758183240890503,3.9999998989515007e-05,0.05228758230805397,,,,,
7
+ 12,2.941200017929077,0.044055406004190445,3.995128281530924e-05,0.062745101749897,,,,,
8
+ 14,2.851799964904785,0.03649706766009331,3.9805359847377986e-05,0.07320261746644974,,,,,
9
+ 16,2.7869999408721924,0.022624235600233078,3.9562950405525044e-05,0.08366013318300247,,,,,
10
+ 18,2.7825000286102295,0.021830420941114426,3.922523319488391e-05,0.0941176488995552,,,,,
11
+ 20,2.7857000827789307,0.01685911975800991,3.87938525818754e-05,0.10457516461610794,,,,,
12
+ 22,2.7571001052856445,0.01572061888873577,3.827090768027119e-05,0.11503268033266068,,,,,
13
+ 24,2.762399911880493,0.016891509294509888,3.7658952351193875e-05,0.125490203499794,,,,,
14
+ 26,2.7411000728607178,0.015683824196457863,3.6960962461307645e-05,0.13594771921634674,,,,,
15
+ 28,2.733099937438965,0.012847283855080605,3.6180339520797133e-05,0.14640523493289948,,,,,
16
+ 30,2.723400115966797,0.015209181234240532,3.532088885549456e-05,0.1568627506494522,,,,,
17
+ 32,2.7342000007629395,0.01241038367152214,3.4386797779006884e-05,0.16732026636600494,,,,,
18
+ 34,2.7321999073028564,0.012879018671810627,3.338261376484297e-05,0.17777778208255768,,,,,
19
+ 36,2.7314000129699707,0.013242729939520359,3.231322989449836e-05,0.1882352977991104,,,,,
20
+ 38,2.7065999507904053,0.01113435160368681,3.118385939160362e-05,0.19869281351566315,,,,,
21
+ 40,2.6958999633789062,0.012413726188242435,2.9999999242136255e-05,0.20915032923221588,,,,,
22
+ 42,2.7516000270843506,0.011661508120596409,2.8767422918463126e-05,0.21960784494876862,,,,,
23
+ 44,2.713099956512451,0.012248368933796883,2.749213126662653e-05,0.23006536066532135,,,,,
24
+ 46,2.7102999687194824,0.011450185440480709,2.6180339773418382e-05,0.24052287638187408,,,,,
25
+ 48,2.7021000385284424,0.011155751533806324,2.483843854861334e-05,0.250980406999588,,,,,
26
+ 50,2.680500030517578,0.010021247901022434,2.3472963221138343e-05,0.26143792271614075,,,,,
27
+ 52,2.699199914932251,0.010751751251518726,2.2090569473220967e-05,0.2718954384326935,,,,,
28
+ 54,2.694200038909912,0.010503941215574741,2.0697989384643734e-05,0.2823529541492462,,,,,
29
+ 56,2.7091000080108643,0.010059370659291744,1.9302009604871273e-05,0.29281046986579895,,,,,
30
+ 58,2.699399948120117,0.012161476537585258,1.7909431335283443e-05,0.3032679855823517,,,,,
31
+ 60,2.7216999530792236,0.010671027936041355,1.6527035768376663e-05,0.3137255012989044,,,,,
32
+ 62,2.7158000469207764,0.010463157668709755,1.516156225989107e-05,0.32418301701545715,,,,,
33
+ 64,2.7214999198913574,0.010665320791304111,1.3819660125591327e-05,0.3346405327320099,,,,,
34
+ 66,2.7116000652313232,0.01046629250049591,1.2507867722888477e-05,0.3450980484485626,,,,,
35
+ 68,2.6923000812530518,0.010609752498567104,1.1232576980546582e-05,0.35555556416511536,,,,,
36
+ 70,2.6830999851226807,0.009290814399719238,9.999999747378752e-06,0.3660130798816681,,,,,
37
+ 72,2.7093000411987305,0.010727670043706894,8.816142326395493e-06,0.3764705955982208,,,,,
38
+ 74,2.698699951171875,0.0109737953171134,7.686770914006047e-06,0.38692811131477356,,,,,
39
+ 76,2.712599992752075,0.010320967063307762,6.61738795315614e-06,0.3973856270313263,,,,,
40
+ 78,2.6993000507354736,0.009841523133218288,5.613203938992228e-06,0.40784314274787903,,,,,
41
+ 80,2.6861000061035156,0.010179675184190273,4.6791110435151495e-06,0.41830065846443176,,,,,
42
+ 82,2.6828999519348145,0.009790077805519104,3.819659923465224e-06,0.4287581741809845,,,,,
43
+ 84,2.699199914932251,0.010508442297577858,3.03903811982309e-06,0.43921568989753723,,,,,
44
+ 86,2.6988000869750977,0.009589221328496933,2.3410482299368596e-06,0.44967320561408997,,,,,
45
+ 88,2.688499927520752,0.010065913200378418,1.7290908544964623e-06,0.4601307213306427,,,,,
46
+ 90,2.6928999423980713,0.010363687761127949,1.206147544507985e-06,0.47058823704719543,,,,,
47
+ 92,2.714200019836426,0.010142815299332142,7.74766078848188e-07,0.48104575276374817,,,,,
48
+ 94,2.672300100326538,0.009833029471337795,4.370479871340649e-07,0.4915032684803009,,,,,
49
+ 96,2.7018001079559326,0.009937037713825703,1.9463863054625108e-07,0.501960813999176,,,,,
50
+ 98,2.7121999263763428,0.009417451918125153,4.8718995060426096e-08,0.5124183297157288,,,,,
51
+ 100,2.7028000354766846,0.009256146848201752,0.0,0.5228758454322815,365.8839111328125,139.93499755859375,0.27300000190734863,4.629706395531346e+17,2.7395541667938232
example/gpu_pretrain_loss.png ADDED
example/gpu_sft.csv ADDED
@@ -0,0 +1,51 @@
1
+ step,train/loss,train/grad_norm,train/learning_rate,train/epoch,train/train_runtime,train/train_samples_per_second,train/train_steps_per_second,train/total_flos,train/train_loss
2
+ 2,1.1492999792099,0.6216375231742859,1.9999999949504854e-06,0.0004617871018126607,,,,,
3
+ 4,1.0979000329971313,0.681877851486206,3.999999989900971e-06,0.0009235742036253214,,,,,
4
+ 6,1.1269999742507935,0.784303605556488,6.000000212225132e-06,0.001385361305437982,,,,,
5
+ 8,1.0542000532150269,0.8737029433250427,7.999999979801942e-06,0.0018471484072506428,,,,,
6
+ 10,1.2440999746322632,0.7068291902542114,9.999999747378752e-06,0.0023089356254786253,,,,,
7
+ 12,1.2925000190734863,0.6821666955947876,1.2000000424450263e-05,0.002770722610875964,,,,,
8
+ 14,1.0843000411987305,0.525643527507782,1.4000000192027073e-05,0.0032325098291039467,,,,,
9
+ 16,1.0961999893188477,0.43757057189941406,1.5999999959603883e-05,0.0036942968145012856,,,,,
10
+ 18,1.0614999532699585,0.46141618490219116,1.8000000636675395e-05,0.004156084265559912,,,,,
11
+ 20,1.332900047302246,0.715879499912262,1.9999999494757503e-05,0.004617871250957251,,,,,
12
+ 22,1.2070000171661377,0.5926885008811951,1.996917308133561e-05,0.0050796582363545895,,,,,
13
+ 24,1.2043999433517456,0.5833240747451782,1.9876883015967906e-05,0.005541445221751928,,,,,
14
+ 26,1.0740000009536743,0.44734400510787964,1.9723698642337695e-05,0.0060032326728105545,,,,,
15
+ 28,1.1162999868392944,0.3701137900352478,1.9510565834934823e-05,0.006465019658207893,,,,,
16
+ 30,1.0454000234603882,0.43832680583000183,1.9238796085119247e-05,0.006926806643605232,,,,,
17
+ 32,1.124899983406067,0.4591037631034851,1.8910064682131633e-05,0.007388593629002571,,,,,
18
+ 34,1.0686999559402466,0.3873400390148163,1.8526401618146338e-05,0.00785038061439991,,,,,
19
+ 36,1.0291999578475952,0.40313437581062317,1.8090169760398567e-05,0.008312168531119823,,,,,
20
+ 38,1.1052000522613525,0.3735405504703522,1.7604059394216165e-05,0.008773955516517162,,,,,
21
+ 40,1.1555999517440796,0.3818407654762268,1.7071068214136176e-05,0.009235742501914501,,,,,
22
+ 42,1.0235999822616577,0.4255191683769226,1.6494481315021403e-05,0.00969752948731184,,,,,
23
+ 44,1.0364999771118164,0.4794503152370453,1.5877853002166376e-05,0.010159316472709179,,,,,
24
+ 46,1.1344000101089478,0.37273937463760376,1.5224985872919206e-05,0.010621103458106518,,,,,
25
+ 48,1.0866999626159668,0.417492538690567,1.453990535082994e-05,0.011082890443503857,,,,,
26
+ 50,1.1038000583648682,0.35408055782318115,1.3826834219798911e-05,0.01154467836022377,,,,,
27
+ 52,1.1478999853134155,0.3930828273296356,1.3090169886709191e-05,0.012006465345621109,,,,,
28
+ 54,1.1858999729156494,0.3965947926044464,1.2334453458606731e-05,0.012468252331018448,,,,,
29
+ 56,1.0096999406814575,0.3860221207141876,1.1564344276848715e-05,0.012930039316415787,,,,,
30
+ 58,1.114799976348877,0.44393691420555115,1.0784590813273098e-05,0.013391826301813126,,,,,
31
+ 60,1.079300045967102,0.3605058789253235,9.999999747378752e-06,0.013853613287210464,,,,,
32
+ 62,1.1766999959945679,0.40689122676849365,9.215408681484405e-06,0.014315400272607803,,,,,
33
+ 64,1.1075999736785889,0.4002344310283661,8.435655217908788e-06,0.014777187258005142,,,,,
34
+ 66,1.1866999864578247,0.46947163343429565,7.665546036150772e-06,0.015238975174725056,,,,,
35
+ 68,1.0311000347137451,0.3296957314014435,6.909830062795663e-06,0.01570076122879982,,,,,
36
+ 70,1.1088999509811401,0.33858785033226013,6.173165729705943e-06,0.01616254821419716,,,,,
37
+ 72,1.0720000267028809,0.3967427909374237,5.460095053422265e-06,0.016624337062239647,,,,,
38
+ 74,1.1460000276565552,0.41202062368392944,4.7750145313329995e-06,0.017086124047636986,,,,,
39
+ 76,1.0425000190734863,0.38334518671035767,4.1221474020858295e-06,0.017547911033034325,,,,,
40
+ 78,0.9154000282287598,0.40649303793907166,3.505519543978153e-06,0.018009698018431664,,,,,
41
+ 80,1.1110999584197998,0.35371580719947815,2.9289321901160292e-06,0.018471485003829002,,,,,
42
+ 82,1.1672999858856201,0.3381657302379608,2.3959403279150138e-06,0.01893327198922634,,,,,
43
+ 84,1.2374000549316406,0.3815234303474426,1.909829961732612e-06,0.01939505897462368,,,,,
44
+ 86,1.2151000499725342,0.38446080684661865,1.4735983313585166e-06,0.01985684596002102,,,,,
45
+ 88,1.163100004196167,0.40419140458106995,1.0899348126258701e-06,0.020318632945418358,,,,,
46
+ 90,1.1883000135421753,0.4011874198913574,7.612046601934708e-07,0.020780419930815697,,,,,
47
+ 92,1.1526999473571777,0.3836020231246948,4.894348535344761e-07,0.021242206916213036,,,,,
48
+ 94,1.15339994430542,0.452364057302475,2.7630079557638965e-07,0.021703993901610374,,,,,
49
+ 96,1.062000036239624,0.3502688705921173,1.2311659247643547e-07,0.022165780887007713,,,,,
50
+ 98,1.0271999835968018,0.4022065997123718,3.0826662111849146e-08,0.022627567872405052,,,,,
51
+ 100,1.0283000469207764,0.38241174817085266,0.0,0.02308935672044754,183.9481964111328,8.697999954223633,0.5440000295639038,1862467846144.0,1.1177252531051636
example/gpu_sft_loss.png ADDED
example/npu_pretrain.csv ADDED
@@ -0,0 +1,51 @@
1
+ step,train/loss,train/grad_norm,train/learning_rate,train/epoch,train/train_runtime,train/train_samples_per_second,train/train_steps_per_second,train/total_flos,train/train_loss
2
+ 2,2.7920000553131104,0.035306449979543686,7.999999979801942e-06,0.010457516647875309,,,,,
3
+ 4,2.8011999130249023,0.03491510450839996,1.5999999959603883e-05,0.020915033295750618,,,,,
4
+ 6,2.7964000701904297,0.032717395573854446,2.4000000848900527e-05,0.0313725508749485,,,,,
5
+ 8,2.763700008392334,0.024953875690698624,3.199999991920777e-05,0.041830066591501236,,,,,
6
+ 10,3.2811999320983887,0.3170815408229828,3.9999998989515007e-05,0.05228758230805397,,,,,
7
+ 12,2.9409000873565674,0.04423849284648895,3.995128281530924e-05,0.062745101749897,,,,,
8
+ 14,2.851900100708008,0.03667925298213959,3.9805359847377986e-05,0.07320261746644974,,,,,
9
+ 16,2.7869999408721924,0.022814607247710228,3.9562950405525044e-05,0.08366013318300247,,,,,
10
+ 18,2.782599925994873,0.021528413519263268,3.922523319488391e-05,0.0941176488995552,,,,,
11
+ 20,2.785599946975708,0.017014438286423683,3.87938525818754e-05,0.10457516461610794,,,,,
12
+ 22,2.7571001052856445,0.015719758346676826,3.827090768027119e-05,0.11503268033266068,,,,,
13
+ 24,2.762399911880493,0.016948623582720757,3.7658952351193875e-05,0.125490203499794,,,,,
14
+ 26,2.7411000728607178,0.015535997226834297,3.6960962461307645e-05,0.13594771921634674,,,,,
15
+ 28,2.7330000400543213,0.012748735956847668,3.6180339520797133e-05,0.14640523493289948,,,,,
16
+ 30,2.723299980163574,0.014809778891503811,3.532088885549456e-05,0.1568627506494522,,,,,
17
+ 32,2.7342000007629395,0.01219236571341753,3.4386797779006884e-05,0.16732026636600494,,,,,
18
+ 34,2.7321999073028564,0.012785322032868862,3.338261376484297e-05,0.17777778208255768,,,,,
19
+ 36,2.7314000129699707,0.012986919842660427,3.231322989449836e-05,0.1882352977991104,,,,,
20
+ 38,2.7065999507904053,0.01096824835985899,3.118385939160362e-05,0.19869281351566315,,,,,
21
+ 40,2.6958999633789062,0.012387535534799099,2.9999999242136255e-05,0.20915032923221588,,,,,
22
+ 42,2.751499891281128,0.011586200445890427,2.8767422918463126e-05,0.21960784494876862,,,,,
23
+ 44,2.713099956512451,0.011821281164884567,2.749213126662653e-05,0.23006536066532135,,,,,
24
+ 46,2.7102999687194824,0.01147585827857256,2.6180339773418382e-05,0.24052287638187408,,,,,
25
+ 48,2.7019999027252197,0.011368263512849808,2.483843854861334e-05,0.250980406999588,,,,,
26
+ 50,2.680500030517578,0.009935515932738781,2.3472963221138343e-05,0.26143792271614075,,,,,
27
+ 52,2.6993000507354736,0.0109846917912364,2.2090569473220967e-05,0.2718954384326935,,,,,
28
+ 54,2.6940999031066895,0.010465175844728947,2.0697989384643734e-05,0.2823529541492462,,,,,
29
+ 56,2.7091000080108643,0.01009758748114109,1.9302009604871273e-05,0.29281046986579895,,,,,
30
+ 58,2.69950008392334,0.01249368954449892,1.7909431335283443e-05,0.3032679855823517,,,,,
31
+ 60,2.7216999530792236,0.01051376760005951,1.6527035768376663e-05,0.3137255012989044,,,,,
32
+ 62,2.7158000469207764,0.01054943073540926,1.516156225989107e-05,0.32418301701545715,,,,,
33
+ 64,2.7214999198913574,0.01076149195432663,1.3819660125591327e-05,0.3346405327320099,,,,,
34
+ 66,2.7116000652313232,0.010380392894148827,1.2507867722888477e-05,0.3450980484485626,,,,,
35
+ 68,2.6923000812530518,0.010425001382827759,1.1232576980546582e-05,0.35555556416511536,,,,,
36
+ 70,2.683199882507324,0.00925016961991787,9.999999747378752e-06,0.3660130798816681,,,,,
37
+ 72,2.7093000411987305,0.01072422880679369,8.816142326395493e-06,0.3764705955982208,,,,,
38
+ 74,2.6988000869750977,0.011063243262469769,7.686770914006047e-06,0.38692811131477356,,,,,
39
+ 76,2.7125000953674316,0.01013101264834404,6.61738795315614e-06,0.3973856270313263,,,,,
40
+ 78,2.6993000507354736,0.009940676391124725,5.613203938992228e-06,0.40784314274787903,,,,,
41
+ 80,2.6861000061035156,0.01050259917974472,4.6791110435151495e-06,0.41830065846443176,,,,,
42
+ 82,2.6828999519348145,0.009912634268403053,3.819659923465224e-06,0.4287581741809845,,,,,
43
+ 84,2.699199914932251,0.010668900795280933,3.03903811982309e-06,0.43921568989753723,,,,,
44
+ 86,2.698899984359741,0.009650414809584618,2.3410482299368596e-06,0.44967320561408997,,,,,
45
+ 88,2.6884000301361084,0.01006452739238739,1.7290908544964623e-06,0.4601307213306427,,,,,
46
+ 90,2.6928999423980713,0.010409764014184475,1.206147544507985e-06,0.47058823704719543,,,,,
47
+ 92,2.714200019836426,0.009937116876244545,7.74766078848188e-07,0.48104575276374817,,,,,
48
+ 94,2.672300100326538,0.009728306904435158,4.370479871340649e-07,0.4915032684803009,,,,,
49
+ 96,2.7018001079559326,0.010098566301167011,1.9463863054625108e-07,0.501960813999176,,,,,
50
+ 98,2.7123000621795654,0.009524320252239704,4.8718995060426096e-08,0.5124183297157288,,,,,
51
+ 100,2.7028000354766846,0.009290286339819431,0.0,0.5228758454322815,788.0635986328125,64.96900177001953,0.12700000405311584,4.629706395531346e+17,2.739542245864868
example/npu_pretrain_loss.png ADDED
example/npu_sft.csv ADDED
@@ -0,0 +1,51 @@
1
+ step,train/loss,train/grad_norm,train/learning_rate,train/epoch,train/train_runtime,train/train_samples_per_second,train/train_steps_per_second,train/total_flos,train/train_loss
2
+ 2,1.1491999626159668,0.6218180060386658,1.9999999949504854e-06,0.0004617871018126607,,,,,
3
+ 4,1.0981999635696411,0.6825665235519409,3.999999989900971e-06,0.0009235742036253214,,,,,
4
+ 6,1.1269999742507935,0.7838642001152039,6.000000212225132e-06,0.001385361305437982,,,,,
5
+ 8,1.0542000532150269,0.8744276762008667,7.999999979801942e-06,0.0018471484072506428,,,,,
6
+ 10,1.2441999912261963,0.7064258456230164,9.999999747378752e-06,0.0023089356254786253,,,,,
7
+ 12,1.2927000522613525,0.6829814910888672,1.2000000424450263e-05,0.002770722610875964,,,,,
8
+ 14,1.0844999551773071,0.5265647172927856,1.4000000192027073e-05,0.0032325098291039467,,,,,
9
+ 16,1.0963000059127808,0.4373657703399658,1.5999999959603883e-05,0.0036942968145012856,,,,,
10
+ 18,1.0615999698638916,0.46220508217811584,1.8000000636675395e-05,0.004156084265559912,,,,,
11
+ 20,1.3325999975204468,0.7157824039459229,1.9999999494757503e-05,0.004617871250957251,,,,,
12
+ 22,1.2070000171661377,0.5933427214622498,1.996917308133561e-05,0.0050796582363545895,,,,,
13
+ 24,1.2044999599456787,0.5816172957420349,1.9876883015967906e-05,0.005541445221751928,,,,,
14
+ 26,1.0740000009536743,0.4489712119102478,1.9723698642337695e-05,0.0060032326728105545,,,,,
15
+ 28,1.1164000034332275,0.3696516752243042,1.9510565834934823e-05,0.006465019658207893,,,,,
16
+ 30,1.045199990272522,0.4376335144042969,1.9238796085119247e-05,0.006926806643605232,,,,,
17
+ 32,1.1247999668121338,0.4589230716228485,1.8910064682131633e-05,0.007388593629002571,,,,,
18
+ 34,1.0688999891281128,0.3879022002220154,1.8526401618146338e-05,0.00785038061439991,,,,,
19
+ 36,1.0292999744415283,0.4027869403362274,1.8090169760398567e-05,0.008312168531119823,,,,,
20
+ 38,1.1052000522613525,0.37394437193870544,1.7604059394216165e-05,0.008773955516517162,,,,,
21
+ 40,1.1557999849319458,0.3808683753013611,1.7071068214136176e-05,0.009235742501914501,,,,,
22
+ 42,1.0232000350952148,0.4252733886241913,1.6494481315021403e-05,0.00969752948731184,,,,,
23
+ 44,1.0364999771118164,0.48068660497665405,1.5877853002166376e-05,0.010159316472709179,,,,,
24
+ 46,1.1340999603271484,0.37313926219940186,1.5224985872919206e-05,0.010621103458106518,,,,,
25
+ 48,1.0866999626159668,0.4175492823123932,1.453990535082994e-05,0.011082890443503857,,,,,
26
+ 50,1.1039999723434448,0.35443660616874695,1.3826834219798911e-05,0.01154467836022377,,,,,
27
+ 52,1.1480000019073486,0.39232146739959717,1.3090169886709191e-05,0.012006465345621109,,,,,
28
+ 54,1.1861000061035156,0.396918922662735,1.2334453458606731e-05,0.012468252331018448,,,,,
29
+ 56,1.0096999406814575,0.3885609209537506,1.1564344276848715e-05,0.012930039316415787,,,,,
30
+ 58,1.114799976348877,0.4421806335449219,1.0784590813273098e-05,0.013391826301813126,,,,,
31
+ 60,1.0795999765396118,0.36081990599632263,9.999999747378752e-06,0.013853613287210464,,,,,
32
+ 62,1.1764999628067017,0.4062329828739166,9.215408681484405e-06,0.014315400272607803,,,,,
33
+ 64,1.107200026512146,0.39982733130455017,8.435655217908788e-06,0.014777187258005142,,,,,
34
+ 66,1.1868000030517578,0.4688170254230499,7.665546036150772e-06,0.015238975174725056,,,,,
35
+ 68,1.0312999486923218,0.3301626741886139,6.909830062795663e-06,0.01570076122879982,,,,,
36
+ 70,1.1089999675750732,0.3377252221107483,6.173165729705943e-06,0.01616254821419716,,,,,
37
+ 72,1.0716999769210815,0.39666977524757385,5.460095053422265e-06,0.016624337062239647,,,,,
38
+ 74,1.1461999416351318,0.4125552177429199,4.7750145313329995e-06,0.017086124047636986,,,,,
39
+ 76,1.042199969291687,0.3825180232524872,4.1221474020858295e-06,0.017547911033034325,,,,,
40
+ 78,0.9157000184059143,0.4063441753387451,3.505519543978153e-06,0.018009698018431664,,,,,
41
+ 80,1.1110999584197998,0.35289037227630615,2.9289321901160292e-06,0.018471485003829002,,,,,
42
+ 82,1.167199969291687,0.33720290660858154,2.3959403279150138e-06,0.01893327198922634,,,,,
43
+ 84,1.2375999689102173,0.38099613785743713,1.909829961732612e-06,0.01939505897462368,,,,,
44
+ 86,1.2151999473571777,0.3848689794540405,1.4735983313585166e-06,0.01985684596002102,,,,,
45
+ 88,1.1628999710083008,0.40408074855804443,1.0899348126258701e-06,0.020318632945418358,,,,,
46
+ 90,1.1884000301361084,0.4015007019042969,7.612046601934708e-07,0.020780419930815697,,,,,
47
+ 92,1.152500033378601,0.38306349515914917,4.894348535344761e-07,0.021242206916213036,,,,,
48
+ 94,1.154099941253662,0.45273807644844055,2.7630079557638965e-07,0.021703993901610374,,,,,
49
+ 96,1.0618000030517578,0.35036078095436096,1.2311659247643547e-07,0.022165780887007713,,,,,
50
+ 98,1.0270999670028687,0.40208569169044495,3.0826662111849146e-08,0.022627567872405052,,,,,
51
+ 100,1.0285999774932861,0.38247284293174744,0.0,0.02308935672044754,728.7083129882812,2.196000099182129,0.13699999451637268,1862467846144.0,1.117748498916626
example/npu_sft_loss.png ADDED
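
The final row of each SFT log also allows a quick sanity check of the effective batch size. The sketch below assumes the settings shown in `run_sft.sh` (8 devices, per-device batch 2, gradient accumulation 1) and uses the throughput figures from the last rows of `gpu_sft.csv` and `npu_sft.csv`; the function names are hypothetical and only illustrate the arithmetic.

```python
def global_batch_size(num_devices, per_device_batch, grad_accum_steps):
    """Samples consumed per optimizer step across all devices."""
    return num_devices * per_device_batch * grad_accum_steps


def samples_per_step(samples_per_second, steps_per_second):
    """Batch size implied by the throughput columns of the log."""
    return samples_per_second / steps_per_second


# Settings from run_sft.sh: NUM_GPUS=8, BATCH_SIZE_PER_GPU=2, GRAD_ACCUM_STEPS=1.
expected = global_batch_size(8, 2, 1)

# train_samples_per_second / train_steps_per_second from the step-100 rows.
gpu_implied = samples_per_step(8.697999954223633, 0.5440000295639038)
npu_implied = samples_per_step(2.196000099182129, 0.13699999451637268)

print(expected, round(gpu_implied, 2), round(npu_implied, 2))
```

Both implied values land close to the configured batch size; the small gap comes from the throughput columns being rounded to three decimals in the log.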
example/run.sh CHANGED
@@ -1,6 +1,6 @@
1
  #!/bin/bash
2
 
3
- MODEL_PATH="/model/BitCPM/BitCPM4-CANN-1B-unquantized/"
4
  DATA_PATH="/dataset/c4-pro/data/000_1_7.parquet"
5
  OUTPUT_DIR="./output"
6
  DS_CONFIG="./ds_config_z2.json"
@@ -11,7 +11,8 @@ GRAD_ACCUM_STEPS=8
11
  MAX_SEQ_LENGTH=1024
12
 
13
  export ASCEND_RT_VISIBLE_DEVICES=8,9,10,11,12,13,14,15
14
-
 
15
  torchrun --nproc_per_node=$NUM_GPUS train.py \
16
  --model_name_or_path $MODEL_PATH \
17
  --data_path $DATA_PATH \
@@ -19,7 +20,7 @@ torchrun --nproc_per_node=$NUM_GPUS train.py \
19
  --output_dir $OUTPUT_DIR \
20
  --per_device_train_batch_size $BATCH_SIZE_PER_GPU \
21
  --gradient_accumulation_steps $GRAD_ACCUM_STEPS \
22
- --max_steps 500 \
23
  --learning_rate 4e-5 \
24
  --lr_scheduler_type cosine \
25
  --warmup_ratio 0.1 \
@@ -33,5 +34,5 @@ torchrun --nproc_per_node=$NUM_GPUS train.py \
33
  --seed 42 \
34
  --dataloader_num_workers 4 \
35
  --report_to tensorboard \
36
- --logging_dir /data/tensorboard/ \
37
  --gradient_checkpointing_kwargs '{"use_reentrant": false}'
 
1
  #!/bin/bash
2
 
3
+ MODEL_PATH="/model/BitCPM4-CANN-1B-unquantized"
4
  DATA_PATH="/dataset/c4-pro/data/000_1_7.parquet"
5
  OUTPUT_DIR="./output"
6
  DS_CONFIG="./ds_config_z2.json"
 
11
  MAX_SEQ_LENGTH=1024
12
 
13
  export ASCEND_RT_VISIBLE_DEVICES=8,9,10,11,12,13,14,15
14
+ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
15
+ export DS_SKIP_CUDA_CHECK=1
16
  torchrun --nproc_per_node=$NUM_GPUS train.py \
17
  --model_name_or_path $MODEL_PATH \
18
  --data_path $DATA_PATH \
 
20
  --output_dir $OUTPUT_DIR \
21
  --per_device_train_batch_size $BATCH_SIZE_PER_GPU \
22
  --gradient_accumulation_steps $GRAD_ACCUM_STEPS \
23
+ --max_steps 100 \
24
  --learning_rate 4e-5 \
25
  --lr_scheduler_type cosine \
26
  --warmup_ratio 0.1 \
 
34
  --seed 42 \
35
  --dataloader_num_workers 4 \
36
  --report_to tensorboard \
37
+ --logging_dir /data/tensorboard/pretrain \
38
  --gradient_checkpointing_kwargs '{"use_reentrant": false}'
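
With `--max_steps 100`, `--warmup_ratio 0.1`, and `--lr_scheduler_type cosine`, the learning rate should ramp linearly for 10 steps and then follow a half-cosine decay to zero. The sketch below is a standalone reimplementation of that schedule under these assumptions, not the code the Trainer actually runs; `cosine_lr` is a hypothetical helper name.

```python
import math


def cosine_lr(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup to peak_lr, then half-cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# CPT settings from run.sh: peak 4e-5, 100 steps, 10 warmup steps (ratio 0.1).
for step in (2, 10, 40, 70, 100):
    print(step, cosine_lr(step, 4e-5, 10, 100))
```

The `train/learning_rate` column in `gpu_pretrain.csv` logs approximately 8e-6, 4e-5, 3e-5, 1e-5, and 0 at these steps, which is consistent with this formula. The SFT run uses a peak of 2e-5 with `--warmup_ratio 0.2`, so the same function applies with `warmup_steps=20`.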
example/run_sft.sh CHANGED
@@ -1,16 +1,18 @@
1
  #!/bin/bash
2
 
3
- MODEL_PATH="/model/BitCPM/BitCPM4-CANN-3B-unquantized/"
4
- DATA_PATH=""
5
  OUTPUT_DIR="./output_sft"
6
  DS_CONFIG="./ds_config.json"
7
 
8
  NUM_GPUS=8
9
  BATCH_SIZE_PER_GPU=2
10
  GRAD_ACCUM_STEPS=1
11
- MAX_SEQ_LENGTH=4096
12
 
13
  export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
 
 
14
 
15
  torchrun --nproc_per_node=$NUM_GPUS train_sft.py \
16
  --model_name_or_path $MODEL_PATH \
@@ -19,10 +21,10 @@ torchrun --nproc_per_node=$NUM_GPUS train_sft.py \
19
  --output_dir $OUTPUT_DIR \
20
  --per_device_train_batch_size $BATCH_SIZE_PER_GPU \
21
  --gradient_accumulation_steps $GRAD_ACCUM_STEPS \
22
- --num_train_epochs 3 \
23
  --learning_rate 2e-5 \
24
  --lr_scheduler_type cosine \
25
- --warmup_ratio 0.03 \
26
  --weight_decay 0.0 \
27
  --logging_steps 2 \
28
  --save_steps 500 \
 
1
  #!/bin/bash
2
 
3
+ MODEL_PATH="/model/BitCPM4-CANN-1B-unquantized"
4
+ DATA_PATH="/dataset/HuggingFaceH4_ultrachat_200k/data/train_sft-00000-of-00003-a3ecf92756993583.parquet"
5
  OUTPUT_DIR="./output_sft"
6
  DS_CONFIG="./ds_config.json"
7
 
8
  NUM_GPUS=8
9
  BATCH_SIZE_PER_GPU=2
10
  GRAD_ACCUM_STEPS=1
11
+ MAX_SEQ_LENGTH=8192
12
 
13
  export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
14
+ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
15
+ export DS_SKIP_CUDA_CHECK=1
16
 
17
  torchrun --nproc_per_node=$NUM_GPUS train_sft.py \
18
  --model_name_or_path $MODEL_PATH \
 
21
  --output_dir $OUTPUT_DIR \
22
  --per_device_train_batch_size $BATCH_SIZE_PER_GPU \
23
  --gradient_accumulation_steps $GRAD_ACCUM_STEPS \
24
+ --max_steps 100 \
25
  --learning_rate 2e-5 \
26
  --lr_scheduler_type cosine \
27
+ --warmup_ratio 0.2 \
28
  --weight_decay 0.0 \
29
  --logging_steps 2 \
30
  --save_steps 500 \