Upload folder using huggingface_hub
- example/README.md +47 -66
- example/gpu_pretrain.csv +51 -0
- example/gpu_pretrain_loss.png +0 -0
- example/gpu_sft.csv +51 -0
- example/gpu_sft_loss.png +0 -0
- example/npu_pretrain.csv +51 -0
- example/npu_pretrain_loss.png +0 -0
- example/npu_sft.csv +51 -0
- example/npu_sft_loss.png +0 -0
- example/run.sh +5 -4
- example/run_sft.sh +7 -5
example/README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
-
# BitCPM4
|
| 2 |
|
| 3 |
-
This project provides scripts for continue pretraining **BitCPM4-CANN-1B-unquantized**.
|
| 4 |
|
| 5 |
## Environment Setup
|
| 6 |
|
|
@@ -35,11 +35,13 @@ Dependency list:
|
|
| 35 |
| pyarrow | 17.0.0 |
|
| 36 |
| tensorboard | 2.18.0 |
|
| 37 |
|
| 38 |
-
##
|
| 39 |
|
| 40 |
The test dataset used is [C4-Pro](https://huggingface.co/datasets/gair-prox/c4-pro), stored in parquet format after downloading.
|
| 41 |
|
| 42 |
-
## Usage
|
| 43 |
|
| 44 |
Modify the path configuration in `run.sh`:
|
| 45 |
|
|
@@ -54,76 +56,55 @@ Then start training:
|
|
| 54 |
bash run.sh
|
| 55 |
```
|
| 56 |
|
| 57 |
-
By default, the script trains for
|
| 58 |
|
| 59 |
## Training Results Reference
|
| 60 |
|
| 61 |
-
Below
|
| 62 |
-
|
| 63 |
-
|
|
| 64 |
-
| --- | --- | --- |
|
| 65 |
-
|
|
| 66 |
-
|
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
| 24 | 2.8144 | 1.92e-05 | 0.13 |
|
| 77 |
-
| 26 | 2.7723 | 2.08e-05 | 0.14 |
|
| 78 |
-
| 28 | 2.7556 | 2.24e-05 | 0.15 |
|
| 79 |
-
| 30 | 2.7414 | 2.40e-05 | 0.16 |
|
| 80 |
-
| 32 | 2.7469 | 2.56e-05 | 0.17 |
|
| 81 |
-
| 34 | 2.7428 | 2.72e-05 | 0.18 |
|
| 82 |
-
| 36 | 2.7392 | 2.88e-05 | 0.19 |
|
| 83 |
-
| 38 | 2.7132 | 3.04e-05 | 0.20 |
|
| 84 |
-
| 40 | 2.7008 | 3.20e-05 | 0.21 |
|
| 85 |
-
| 42 | 2.7547 | 3.36e-05 | 0.22 |
|
| 86 |
-
| 44 | 2.7151 | 3.52e-05 | 0.23 |
|
| 87 |
-
| 46 | 2.7119 | 3.68e-05 | 0.24 |
|
| 88 |
-
| 48 | 2.7029 | 3.84e-05 | 0.25 |
|
| 89 |
-
| 50 | 2.6803 | 4.00e-05 | 0.26 |
|
| 90 |
-
| 52 | 2.6980 | 4.00e-05 | 0.27 |
|
| 91 |
-
| 54 | 2.6923 | 4.00e-05 | 0.28 |
|
| 92 |
-
| 56 | 2.7068 | 4.00e-05 | 0.29 |
|
| 93 |
-
| 58 | 2.6965 | 4.00e-05 | 0.30 |
|
| 94 |
-
| 60 | 2.7179 | 3.99e-05 | 0.31 |
|
| 95 |
-
| 62 | 2.7119 | 3.99e-05 | 0.32 |
|
| 96 |
-
| 64 | 2.7178 | 3.99e-05 | 0.33 |
|
| 97 |
-
| 66 | 2.7069 | 3.99e-05 | 0.35 |
|
| 98 |
-
| 68 | 2.6870 | 3.98e-05 | 0.36 |
|
| 99 |
-
| 70 | 2.6775 | 3.98e-05 | 0.37 |
|
| 100 |
-
| 72 | 2.7038 | 3.98e-05 | 0.38 |
|
| 101 |
-
| 74 | 2.6924 | 3.97e-05 | 0.39 |
|
| 102 |
-
| 76 | 2.7061 | 3.97e-05 | 0.40 |
|
| 103 |
-
| 78 | 2.6929 | 3.96e-05 | 0.41 |
|
| 104 |
-
| 80 | 2.6787 | 3.96e-05 | 0.42 |
|
| 105 |
-
| 82 | 2.6749 | 3.95e-05 | 0.43 |
|
| 106 |
-
| 84 | 2.6909 | 3.94e-05 | 0.44 |
|
| 107 |
-
| 86 | 2.6893 | 3.94e-05 | 0.45 |
|
| 108 |
-
| 88 | 2.6788 | 3.93e-05 | 0.46 |
|
| 109 |
-
| 90 | 2.6831 | 3.92e-05 | 0.47 |
|
| 110 |
-
| 92 | 2.7039 | 3.91e-05 | 0.48 |
|
| 111 |
-
| 94 | 2.6619 | 3.91e-05 | 0.49 |
|
| 112 |
-
| 96 | 2.6903 | 3.90e-05 | 0.50 |
|
| 113 |
-
| 98 | 2.6993 | 3.89e-05 | 0.51 |
|
| 114 |
-
| 100 | 2.6891 | 3.88e-05 | 0.52 |
|
| 115 |
-
| 102 | 2.6739 | 3.87e-05 | 0.53 |
|
| 116 |
-
|
| 117 |
-
> **Note:** BitCPM has its own training dataset and data mixture. It is expected that the loss continues to decrease when continue pretraining on open-source datasets.
|
| 118 |
-
|
| 119 |
-
As shown in the table, the loss gradually decreases from ~2.79 to ~2.67, indicating a stable training process and that the model is learning normally.
|
| 120 |
|
| 121 |
## File Description
|
| 122 |
|
| 123 |
| File | Description |
|
| 124 |
| --- | --- |
|
| 125 |
-
| `train.py` |
|
| 126 |
-
| `run.sh` | Launch script
|
| 127 |
| `train_sft.py` | Supervised fine-tuning script based on HuggingFace Trainer + DeepSpeed |
|
| 128 |
| `run_sft.sh` | Launch script for SFT with hyperparameter configuration |
|
| 129 |
| `ds_config.json` | DeepSpeed ZeRO-3 configuration (with CPU offload) |
|
|
|
|
| 1 |
+
# BitCPM4 Training Example
|
| 2 |
|
| 3 |
+
This project provides scripts for continue pretraining (CPT) and supervised fine-tuning (SFT) of **BitCPM4-CANN-1B-unquantized**.
|
| 4 |
|
| 5 |
## Environment Setup
|
| 6 |
|
|
|
|
| 35 |
| pyarrow | 17.0.0 |
|
| 36 |
| tensorboard | 2.18.0 |
|
| 37 |
|
| 38 |
+
## Continue Pretrain (CPT)
|
| 39 |
+
|
| 40 |
+
### Dataset
|
| 41 |
|
| 42 |
The test dataset used is [C4-Pro](https://huggingface.co/datasets/gair-prox/c4-pro), stored in parquet format after downloading.
|
| 43 |
|
| 44 |
+
### Usage
|
| 45 |
|
| 46 |
Modify the path configuration in `run.sh`:
|
| 47 |
|
|
|
|
| 56 |
bash run.sh
|
| 57 |
```
|
| 58 |
|
| 59 |
+
By default, the script trains for 100 steps using 8 devices, DeepSpeed ZeRO-2, and bf16 precision.
|
| 60 |
+
|
| 61 |
+
## Supervised Fine-Tuning (SFT)
|
| 62 |
+
|
| 63 |
+
### Dataset
|
| 64 |
+
|
| 65 |
+
The test dataset used is [UltraChat 200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k), stored in parquet format after downloading.
|
| 66 |
+
|
| 67 |
+
### Usage
|
| 68 |
+
|
| 69 |
+
Modify the path configuration in `run_sft.sh`:
|
| 70 |
+
|
| 71 |
+
```bash
|
| 72 |
+
MODEL_PATH="/path/to/BitCPM4-CANN-1B-unquantized/"
|
| 73 |
+
DATA_PATH="/path/to/ultrachat_200k/data/your_file.parquet"
|
| 74 |
+
```
|
| 75 |
+
|
| 76 |
+
Then start training:
|
| 77 |
+
|
| 78 |
+
```bash
|
| 79 |
+
bash run_sft.sh
|
| 80 |
+
```
|
| 81 |
+
|
| 82 |
+
By default, the script trains for 100 steps using 8 devices, DeepSpeed ZeRO-3 (with CPU offload), and bf16 precision. The maximum sequence length is 8192.
|
| 83 |
|
| 84 |
## Training Results Reference
|
| 85 |
|
| 86 |
+
Below are the loss curves from smoke tests on GPU and NPU for both CPT and SFT tasks:
|
| 87 |
+
|
| 88 |
+
| | GPU | NPU |
|
| 89 |
+
| --- | --- | --- |
|
| 90 |
+
| **CPT** |  |  |
|
| 91 |
+
| **SFT** |  |  |
|
| 92 |
+
|
| 93 |
+
Training log CSV files:
|
| 94 |
+
|
| 95 |
+
- [gpu_pretrain.csv](gpu_pretrain.csv)
|
| 96 |
+
- [npu_pretrain.csv](npu_pretrain.csv)
|
| 97 |
+
- [gpu_sft.csv](gpu_sft.csv)
|
| 98 |
+
- [npu_sft.csv](npu_sft.csv)
|
| 99 |
+
|
| 100 |
+
> **Note:** BitCPM has its own training dataset and data mixture. It is expected that the loss continues to decrease when training on open-source datasets.
|
| 101 |
|
| 102 |
## File Description
|
| 103 |
|
| 104 |
| File | Description |
|
| 105 |
| --- | --- |
|
| 106 |
+
| `train.py` | Continue pretrain script based on HuggingFace Trainer + DeepSpeed |
|
| 107 |
+
| `run.sh` | Launch script for CPT with hyperparameter configuration |
|
| 108 |
| `train_sft.py` | Supervised fine-tuning script based on HuggingFace Trainer + DeepSpeed |
|
| 109 |
| `run_sft.sh` | Launch script for SFT with hyperparameter configuration |
|
| 110 |
| `ds_config.json` | DeepSpeed ZeRO-3 configuration (with CPU offload) |
|
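The file table above lists `ds_config.json` as a DeepSpeed ZeRO-3 configuration with CPU offload, but its contents are not part of this diff. As a rough illustration only, the sketch below builds a config of that general shape using standard DeepSpeed keys and the `"auto"` placeholders that the HuggingFace Trainer integration fills in. The function name and the specific values are assumptions, and the repository's actual file may differ.

```python
import json


def build_zero3_offload_config() -> dict:
    """Return a hypothetical ZeRO-3 + CPU offload config (not the repo's actual file)."""
    return {
        # bf16 precision, matching the default described in the README.
        "bf16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,
            # Move optimizer state and parameters to host memory.
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
            "offload_param": {"device": "cpu", "pin_memory": True},
        },
        # "auto" lets the HuggingFace Trainer fill these from its own arguments.
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
    }


if __name__ == "__main__":
    print(json.dumps(build_zero3_offload_config(), indent=2))
```

Writing the printed JSON to a file and passing that path to the launch script is one way to try such a variant without editing the shipped config.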
example/gpu_pretrain.csv
ADDED
|
@@ -0,0 +1,51 @@
|
|
| 1 |
+
step,train/loss,train/grad_norm,train/learning_rate,train/epoch,train/train_runtime,train/train_samples_per_second,train/train_steps_per_second,train/total_flos,train/train_loss
|
| 2 |
+
2,2.7920000553131104,0.03527498617768288,7.999999979801942e-06,0.010457516647875309,,,,,
|
| 3 |
+
4,2.8011999130249023,0.03495891019701958,1.5999999959603883e-05,0.020915033295750618,,,,,
|
| 4 |
+
6,2.7964000701904297,0.03271934762597084,2.4000000848900527e-05,0.0313725508749485,,,,,
|
| 5 |
+
8,2.763700008392334,0.024968057870864868,3.199999991920777e-05,0.041830066591501236,,,,,
|
| 6 |
+
10,3.281599998474121,0.31758183240890503,3.9999998989515007e-05,0.05228758230805397,,,,,
|
| 7 |
+
12,2.941200017929077,0.044055406004190445,3.995128281530924e-05,0.062745101749897,,,,,
|
| 8 |
+
14,2.851799964904785,0.03649706766009331,3.9805359847377986e-05,0.07320261746644974,,,,,
|
| 9 |
+
16,2.7869999408721924,0.022624235600233078,3.9562950405525044e-05,0.08366013318300247,,,,,
|
| 10 |
+
18,2.7825000286102295,0.021830420941114426,3.922523319488391e-05,0.0941176488995552,,,,,
|
| 11 |
+
20,2.7857000827789307,0.01685911975800991,3.87938525818754e-05,0.10457516461610794,,,,,
|
| 12 |
+
22,2.7571001052856445,0.01572061888873577,3.827090768027119e-05,0.11503268033266068,,,,,
|
| 13 |
+
24,2.762399911880493,0.016891509294509888,3.7658952351193875e-05,0.125490203499794,,,,,
|
| 14 |
+
26,2.7411000728607178,0.015683824196457863,3.6960962461307645e-05,0.13594771921634674,,,,,
|
| 15 |
+
28,2.733099937438965,0.012847283855080605,3.6180339520797133e-05,0.14640523493289948,,,,,
|
| 16 |
+
30,2.723400115966797,0.015209181234240532,3.532088885549456e-05,0.1568627506494522,,,,,
|
| 17 |
+
32,2.7342000007629395,0.01241038367152214,3.4386797779006884e-05,0.16732026636600494,,,,,
|
| 18 |
+
34,2.7321999073028564,0.012879018671810627,3.338261376484297e-05,0.17777778208255768,,,,,
|
| 19 |
+
36,2.7314000129699707,0.013242729939520359,3.231322989449836e-05,0.1882352977991104,,,,,
|
| 20 |
+
38,2.7065999507904053,0.01113435160368681,3.118385939160362e-05,0.19869281351566315,,,,,
|
| 21 |
+
40,2.6958999633789062,0.012413726188242435,2.9999999242136255e-05,0.20915032923221588,,,,,
|
| 22 |
+
42,2.7516000270843506,0.011661508120596409,2.8767422918463126e-05,0.21960784494876862,,,,,
|
| 23 |
+
44,2.713099956512451,0.012248368933796883,2.749213126662653e-05,0.23006536066532135,,,,,
|
| 24 |
+
46,2.7102999687194824,0.011450185440480709,2.6180339773418382e-05,0.24052287638187408,,,,,
|
| 25 |
+
48,2.7021000385284424,0.011155751533806324,2.483843854861334e-05,0.250980406999588,,,,,
|
| 26 |
+
50,2.680500030517578,0.010021247901022434,2.3472963221138343e-05,0.26143792271614075,,,,,
|
| 27 |
+
52,2.699199914932251,0.010751751251518726,2.2090569473220967e-05,0.2718954384326935,,,,,
|
| 28 |
+
54,2.694200038909912,0.010503941215574741,2.0697989384643734e-05,0.2823529541492462,,,,,
|
| 29 |
+
56,2.7091000080108643,0.010059370659291744,1.9302009604871273e-05,0.29281046986579895,,,,,
|
| 30 |
+
58,2.699399948120117,0.012161476537585258,1.7909431335283443e-05,0.3032679855823517,,,,,
|
| 31 |
+
60,2.7216999530792236,0.010671027936041355,1.6527035768376663e-05,0.3137255012989044,,,,,
|
| 32 |
+
62,2.7158000469207764,0.010463157668709755,1.516156225989107e-05,0.32418301701545715,,,,,
|
| 33 |
+
64,2.7214999198913574,0.010665320791304111,1.3819660125591327e-05,0.3346405327320099,,,,,
|
| 34 |
+
66,2.7116000652313232,0.01046629250049591,1.2507867722888477e-05,0.3450980484485626,,,,,
|
| 35 |
+
68,2.6923000812530518,0.010609752498567104,1.1232576980546582e-05,0.35555556416511536,,,,,
|
| 36 |
+
70,2.6830999851226807,0.009290814399719238,9.999999747378752e-06,0.3660130798816681,,,,,
|
| 37 |
+
72,2.7093000411987305,0.010727670043706894,8.816142326395493e-06,0.3764705955982208,,,,,
|
| 38 |
+
74,2.698699951171875,0.0109737953171134,7.686770914006047e-06,0.38692811131477356,,,,,
|
| 39 |
+
76,2.712599992752075,0.010320967063307762,6.61738795315614e-06,0.3973856270313263,,,,,
|
| 40 |
+
78,2.6993000507354736,0.009841523133218288,5.613203938992228e-06,0.40784314274787903,,,,,
|
| 41 |
+
80,2.6861000061035156,0.010179675184190273,4.6791110435151495e-06,0.41830065846443176,,,,,
|
| 42 |
+
82,2.6828999519348145,0.009790077805519104,3.819659923465224e-06,0.4287581741809845,,,,,
|
| 43 |
+
84,2.699199914932251,0.010508442297577858,3.03903811982309e-06,0.43921568989753723,,,,,
|
| 44 |
+
86,2.6988000869750977,0.009589221328496933,2.3410482299368596e-06,0.44967320561408997,,,,,
|
| 45 |
+
88,2.688499927520752,0.010065913200378418,1.7290908544964623e-06,0.4601307213306427,,,,,
|
| 46 |
+
90,2.6928999423980713,0.010363687761127949,1.206147544507985e-06,0.47058823704719543,,,,,
|
| 47 |
+
92,2.714200019836426,0.010142815299332142,7.74766078848188e-07,0.48104575276374817,,,,,
|
| 48 |
+
94,2.672300100326538,0.009833029471337795,4.370479871340649e-07,0.4915032684803009,,,,,
|
| 49 |
+
96,2.7018001079559326,0.009937037713825703,1.9463863054625108e-07,0.501960813999176,,,,,
|
| 50 |
+
98,2.7121999263763428,0.009417451918125153,4.8718995060426096e-08,0.5124183297157288,,,,,
|
| 51 |
+
100,2.7028000354766846,0.009256146848201752,0.0,0.5228758454322815,365.8839111328125,139.93499755859375,0.27300000190734863,4.629706395531346e+17,2.7395541667938232
|
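The `train/learning_rate` column in the log above is consistent with the settings in `run.sh` (`--learning_rate 4e-5`, `--lr_scheduler_type cosine`, `--warmup_ratio 0.1`) applied over the default 100 steps. The sketch below reproduces it with the usual linear-warmup plus cosine-decay formula. The function name `lr_at` and the explicit `warmup_steps=10` (0.1 × 100) are assumptions made for illustration, not code from the training scripts.

```python
import math


def lr_at(step: int, peak_lr: float = 4e-5, max_steps: int = 100,
          warmup_steps: int = 10) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (2, 10, 12, 40, 70, 100):
        print(step, lr_at(step))
```

Comparing the printed values against the logged rows for the same steps (for example, step 40 in the CSV logs roughly 3.0e-05) is a quick check that a run used the intended schedule.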
example/gpu_pretrain_loss.png
ADDED
|
example/gpu_sft.csv
ADDED
|
@@ -0,0 +1,51 @@
|
|
| 1 |
+
step,train/loss,train/grad_norm,train/learning_rate,train/epoch,train/train_runtime,train/train_samples_per_second,train/train_steps_per_second,train/total_flos,train/train_loss
|
| 2 |
+
2,1.1492999792099,0.6216375231742859,1.9999999949504854e-06,0.0004617871018126607,,,,,
|
| 3 |
+
4,1.0979000329971313,0.681877851486206,3.999999989900971e-06,0.0009235742036253214,,,,,
|
| 4 |
+
6,1.1269999742507935,0.784303605556488,6.000000212225132e-06,0.001385361305437982,,,,,
|
| 5 |
+
8,1.0542000532150269,0.8737029433250427,7.999999979801942e-06,0.0018471484072506428,,,,,
|
| 6 |
+
10,1.2440999746322632,0.7068291902542114,9.999999747378752e-06,0.0023089356254786253,,,,,
|
| 7 |
+
12,1.2925000190734863,0.6821666955947876,1.2000000424450263e-05,0.002770722610875964,,,,,
|
| 8 |
+
14,1.0843000411987305,0.525643527507782,1.4000000192027073e-05,0.0032325098291039467,,,,,
|
| 9 |
+
16,1.0961999893188477,0.43757057189941406,1.5999999959603883e-05,0.0036942968145012856,,,,,
|
| 10 |
+
18,1.0614999532699585,0.46141618490219116,1.8000000636675395e-05,0.004156084265559912,,,,,
|
| 11 |
+
20,1.332900047302246,0.715879499912262,1.9999999494757503e-05,0.004617871250957251,,,,,
|
| 12 |
+
22,1.2070000171661377,0.5926885008811951,1.996917308133561e-05,0.0050796582363545895,,,,,
|
| 13 |
+
24,1.2043999433517456,0.5833240747451782,1.9876883015967906e-05,0.005541445221751928,,,,,
|
| 14 |
+
26,1.0740000009536743,0.44734400510787964,1.9723698642337695e-05,0.0060032326728105545,,,,,
|
| 15 |
+
28,1.1162999868392944,0.3701137900352478,1.9510565834934823e-05,0.006465019658207893,,,,,
|
| 16 |
+
30,1.0454000234603882,0.43832680583000183,1.9238796085119247e-05,0.006926806643605232,,,,,
|
| 17 |
+
32,1.124899983406067,0.4591037631034851,1.8910064682131633e-05,0.007388593629002571,,,,,
|
| 18 |
+
34,1.0686999559402466,0.3873400390148163,1.8526401618146338e-05,0.00785038061439991,,,,,
|
| 19 |
+
36,1.0291999578475952,0.40313437581062317,1.8090169760398567e-05,0.008312168531119823,,,,,
|
| 20 |
+
38,1.1052000522613525,0.3735405504703522,1.7604059394216165e-05,0.008773955516517162,,,,,
|
| 21 |
+
40,1.1555999517440796,0.3818407654762268,1.7071068214136176e-05,0.009235742501914501,,,,,
|
| 22 |
+
42,1.0235999822616577,0.4255191683769226,1.6494481315021403e-05,0.00969752948731184,,,,,
|
| 23 |
+
44,1.0364999771118164,0.4794503152370453,1.5877853002166376e-05,0.010159316472709179,,,,,
|
| 24 |
+
46,1.1344000101089478,0.37273937463760376,1.5224985872919206e-05,0.010621103458106518,,,,,
|
| 25 |
+
48,1.0866999626159668,0.417492538690567,1.453990535082994e-05,0.011082890443503857,,,,,
|
| 26 |
+
50,1.1038000583648682,0.35408055782318115,1.3826834219798911e-05,0.01154467836022377,,,,,
|
| 27 |
+
52,1.1478999853134155,0.3930828273296356,1.3090169886709191e-05,0.012006465345621109,,,,,
|
| 28 |
+
54,1.1858999729156494,0.3965947926044464,1.2334453458606731e-05,0.012468252331018448,,,,,
|
| 29 |
+
56,1.0096999406814575,0.3860221207141876,1.1564344276848715e-05,0.012930039316415787,,,,,
|
| 30 |
+
58,1.114799976348877,0.44393691420555115,1.0784590813273098e-05,0.013391826301813126,,,,,
|
| 31 |
+
60,1.079300045967102,0.3605058789253235,9.999999747378752e-06,0.013853613287210464,,,,,
|
| 32 |
+
62,1.1766999959945679,0.40689122676849365,9.215408681484405e-06,0.014315400272607803,,,,,
|
| 33 |
+
64,1.1075999736785889,0.4002344310283661,8.435655217908788e-06,0.014777187258005142,,,,,
|
| 34 |
+
66,1.1866999864578247,0.46947163343429565,7.665546036150772e-06,0.015238975174725056,,,,,
|
| 35 |
+
68,1.0311000347137451,0.3296957314014435,6.909830062795663e-06,0.01570076122879982,,,,,
|
| 36 |
+
70,1.1088999509811401,0.33858785033226013,6.173165729705943e-06,0.01616254821419716,,,,,
|
| 37 |
+
72,1.0720000267028809,0.3967427909374237,5.460095053422265e-06,0.016624337062239647,,,,,
|
| 38 |
+
74,1.1460000276565552,0.41202062368392944,4.7750145313329995e-06,0.017086124047636986,,,,,
|
| 39 |
+
76,1.0425000190734863,0.38334518671035767,4.1221474020858295e-06,0.017547911033034325,,,,,
|
| 40 |
+
78,0.9154000282287598,0.40649303793907166,3.505519543978153e-06,0.018009698018431664,,,,,
|
| 41 |
+
80,1.1110999584197998,0.35371580719947815,2.9289321901160292e-06,0.018471485003829002,,,,,
|
| 42 |
+
82,1.1672999858856201,0.3381657302379608,2.3959403279150138e-06,0.01893327198922634,,,,,
|
| 43 |
+
84,1.2374000549316406,0.3815234303474426,1.909829961732612e-06,0.01939505897462368,,,,,
|
| 44 |
+
86,1.2151000499725342,0.38446080684661865,1.4735983313585166e-06,0.01985684596002102,,,,,
|
| 45 |
+
88,1.163100004196167,0.40419140458106995,1.0899348126258701e-06,0.020318632945418358,,,,,
|
| 46 |
+
90,1.1883000135421753,0.4011874198913574,7.612046601934708e-07,0.020780419930815697,,,,,
|
| 47 |
+
92,1.1526999473571777,0.3836020231246948,4.894348535344761e-07,0.021242206916213036,,,,,
|
| 48 |
+
94,1.15339994430542,0.452364057302475,2.7630079557638965e-07,0.021703993901610374,,,,,
|
| 49 |
+
96,1.062000036239624,0.3502688705921173,1.2311659247643547e-07,0.022165780887007713,,,,,
|
| 50 |
+
98,1.0271999835968018,0.4022065997123718,3.0826662111849146e-08,0.022627567872405052,,,,,
|
| 51 |
+
100,1.0283000469207764,0.38241174817085266,0.0,0.02308935672044754,183.9481964111328,8.697999954223633,0.5440000295639038,1862467846144.0,1.1177252531051636
|
example/gpu_sft_loss.png
ADDED
|
example/npu_pretrain.csv
ADDED
|
@@ -0,0 +1,51 @@
|
|
| 1 |
+
step,train/loss,train/grad_norm,train/learning_rate,train/epoch,train/train_runtime,train/train_samples_per_second,train/train_steps_per_second,train/total_flos,train/train_loss
|
| 2 |
+
2,2.7920000553131104,0.035306449979543686,7.999999979801942e-06,0.010457516647875309,,,,,
|
| 3 |
+
4,2.8011999130249023,0.03491510450839996,1.5999999959603883e-05,0.020915033295750618,,,,,
|
| 4 |
+
6,2.7964000701904297,0.032717395573854446,2.4000000848900527e-05,0.0313725508749485,,,,,
|
| 5 |
+
8,2.763700008392334,0.024953875690698624,3.199999991920777e-05,0.041830066591501236,,,,,
|
| 6 |
+
10,3.2811999320983887,0.3170815408229828,3.9999998989515007e-05,0.05228758230805397,,,,,
|
| 7 |
+
12,2.9409000873565674,0.04423849284648895,3.995128281530924e-05,0.062745101749897,,,,,
|
| 8 |
+
14,2.851900100708008,0.03667925298213959,3.9805359847377986e-05,0.07320261746644974,,,,,
|
| 9 |
+
16,2.7869999408721924,0.022814607247710228,3.9562950405525044e-05,0.08366013318300247,,,,,
|
| 10 |
+
18,2.782599925994873,0.021528413519263268,3.922523319488391e-05,0.0941176488995552,,,,,
|
| 11 |
+
20,2.785599946975708,0.017014438286423683,3.87938525818754e-05,0.10457516461610794,,,,,
|
| 12 |
+
22,2.7571001052856445,0.015719758346676826,3.827090768027119e-05,0.11503268033266068,,,,,
|
| 13 |
+
24,2.762399911880493,0.016948623582720757,3.7658952351193875e-05,0.125490203499794,,,,,
|
| 14 |
+
26,2.7411000728607178,0.015535997226834297,3.6960962461307645e-05,0.13594771921634674,,,,,
|
| 15 |
+
28,2.7330000400543213,0.012748735956847668,3.6180339520797133e-05,0.14640523493289948,,,,,
|
| 16 |
+
30,2.723299980163574,0.014809778891503811,3.532088885549456e-05,0.1568627506494522,,,,,
|
| 17 |
+
32,2.7342000007629395,0.01219236571341753,3.4386797779006884e-05,0.16732026636600494,,,,,
|
| 18 |
+
34,2.7321999073028564,0.012785322032868862,3.338261376484297e-05,0.17777778208255768,,,,,
|
| 19 |
+
36,2.7314000129699707,0.012986919842660427,3.231322989449836e-05,0.1882352977991104,,,,,
|
| 20 |
+
38,2.7065999507904053,0.01096824835985899,3.118385939160362e-05,0.19869281351566315,,,,,
|
| 21 |
+
40,2.6958999633789062,0.012387535534799099,2.9999999242136255e-05,0.20915032923221588,,,,,
|
| 22 |
+
42,2.751499891281128,0.011586200445890427,2.8767422918463126e-05,0.21960784494876862,,,,,
|
| 23 |
+
44,2.713099956512451,0.011821281164884567,2.749213126662653e-05,0.23006536066532135,,,,,
|
| 24 |
+
46,2.7102999687194824,0.01147585827857256,2.6180339773418382e-05,0.24052287638187408,,,,,
|
| 25 |
+
48,2.7019999027252197,0.011368263512849808,2.483843854861334e-05,0.250980406999588,,,,,
|
| 26 |
+
50,2.680500030517578,0.009935515932738781,2.3472963221138343e-05,0.26143792271614075,,,,,
|
| 27 |
+
52,2.6993000507354736,0.0109846917912364,2.2090569473220967e-05,0.2718954384326935,,,,,
|
| 28 |
+
54,2.6940999031066895,0.010465175844728947,2.0697989384643734e-05,0.2823529541492462,,,,,
|
| 29 |
+
56,2.7091000080108643,0.01009758748114109,1.9302009604871273e-05,0.29281046986579895,,,,,
|
| 30 |
+
58,2.69950008392334,0.01249368954449892,1.7909431335283443e-05,0.3032679855823517,,,,,
|
| 31 |
+
60,2.7216999530792236,0.01051376760005951,1.6527035768376663e-05,0.3137255012989044,,,,,
|
| 32 |
+
62,2.7158000469207764,0.01054943073540926,1.516156225989107e-05,0.32418301701545715,,,,,
|
| 33 |
+
64,2.7214999198913574,0.01076149195432663,1.3819660125591327e-05,0.3346405327320099,,,,,
|
| 34 |
+
66,2.7116000652313232,0.010380392894148827,1.2507867722888477e-05,0.3450980484485626,,,,,
|
| 35 |
+
68,2.6923000812530518,0.010425001382827759,1.1232576980546582e-05,0.35555556416511536,,,,,
|
| 36 |
+
70,2.683199882507324,0.00925016961991787,9.999999747378752e-06,0.3660130798816681,,,,,
|
| 37 |
+
72,2.7093000411987305,0.01072422880679369,8.816142326395493e-06,0.3764705955982208,,,,,
|
| 38 |
+
74,2.6988000869750977,0.011063243262469769,7.686770914006047e-06,0.38692811131477356,,,,,
|
| 39 |
+
76,2.7125000953674316,0.01013101264834404,6.61738795315614e-06,0.3973856270313263,,,,,
|
| 40 |
+
78,2.6993000507354736,0.009940676391124725,5.613203938992228e-06,0.40784314274787903,,,,,
|
| 41 |
+
80,2.6861000061035156,0.01050259917974472,4.6791110435151495e-06,0.41830065846443176,,,,,
|
| 42 |
+
82,2.6828999519348145,0.009912634268403053,3.819659923465224e-06,0.4287581741809845,,,,,
|
| 43 |
+
84,2.699199914932251,0.010668900795280933,3.03903811982309e-06,0.43921568989753723,,,,,
|
| 44 |
+
86,2.698899984359741,0.009650414809584618,2.3410482299368596e-06,0.44967320561408997,,,,,
|
| 45 |
+
88,2.6884000301361084,0.01006452739238739,1.7290908544964623e-06,0.4601307213306427,,,,,
|
| 46 |
+
90,2.6928999423980713,0.010409764014184475,1.206147544507985e-06,0.47058823704719543,,,,,
|
| 47 |
+
92,2.714200019836426,0.009937116876244545,7.74766078848188e-07,0.48104575276374817,,,,,
|
| 48 |
+
94,2.672300100326538,0.009728306904435158,4.370479871340649e-07,0.4915032684803009,,,,,
|
| 49 |
+
96,2.7018001079559326,0.010098566301167011,1.9463863054625108e-07,0.501960813999176,,,,,
|
| 50 |
+
98,2.7123000621795654,0.009524320252239704,4.8718995060426096e-08,0.5124183297157288,,,,,
|
| 51 |
+
100,2.7028000354766846,0.009290286339819431,0.0,0.5228758454322815,788.0635986328125,64.96900177001953,0.12700000405311584,4.629706395531346e+17,2.739542245864868
|
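The final row of each pretraining log carries `train/train_runtime` and `train/train_samples_per_second`, which together give the number of samples consumed per optimizer step and a wall-clock comparison between the GPU and NPU runs. The sketch below does that arithmetic with the values copied from the step-100 rows of `gpu_pretrain.csv` and `npu_pretrain.csv`. The helper name is hypothetical, and the 100-step count comes from the README default.

```python
def samples_per_step(runtime_s: float, samples_per_second: float,
                     steps: int = 100) -> float:
    """Total samples processed divided by the number of optimizer steps."""
    return runtime_s * samples_per_second / steps


# Values taken from the step-100 rows of the two pretraining logs.
GPU_RUNTIME_S, GPU_SAMPLES_PER_S = 365.8839111328125, 139.93499755859375
NPU_RUNTIME_S, NPU_SAMPLES_PER_S = 788.0635986328125, 64.96900177001953

gpu_batch = samples_per_step(GPU_RUNTIME_S, GPU_SAMPLES_PER_S)
npu_batch = samples_per_step(NPU_RUNTIME_S, NPU_SAMPLES_PER_S)
runtime_ratio = NPU_RUNTIME_S / GPU_RUNTIME_S

if __name__ == "__main__":
    print(f"GPU samples/step: {gpu_batch:.1f}")
    print(f"NPU samples/step: {npu_batch:.1f}")
    print(f"NPU/GPU runtime ratio: {runtime_ratio:.2f}")
```

Both runs come out at roughly 512 samples per step. That would be consistent with 8 devices, `GRAD_ACCUM_STEPS=8`, and a per-device batch size of 8, although the per-device batch size is not visible in this diff and is only inferred here.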
example/npu_pretrain_loss.png
ADDED
|
example/npu_sft.csv
ADDED
|
@@ -0,0 +1,51 @@
|
|
| 1 |
+
step,train/loss,train/grad_norm,train/learning_rate,train/epoch,train/train_runtime,train/train_samples_per_second,train/train_steps_per_second,train/total_flos,train/train_loss
|
| 2 |
+
2,1.1491999626159668,0.6218180060386658,1.9999999949504854e-06,0.0004617871018126607,,,,,
|
| 3 |
+
4,1.0981999635696411,0.6825665235519409,3.999999989900971e-06,0.0009235742036253214,,,,,
|
| 4 |
+
6,1.1269999742507935,0.7838642001152039,6.000000212225132e-06,0.001385361305437982,,,,,
|
| 5 |
+
8,1.0542000532150269,0.8744276762008667,7.999999979801942e-06,0.0018471484072506428,,,,,
|
| 6 |
+
10,1.2441999912261963,0.7064258456230164,9.999999747378752e-06,0.0023089356254786253,,,,,
|
| 7 |
+
12,1.2927000522613525,0.6829814910888672,1.2000000424450263e-05,0.002770722610875964,,,,,
|
| 8 |
+
14,1.0844999551773071,0.5265647172927856,1.4000000192027073e-05,0.0032325098291039467,,,,,
|
| 9 |
+
16,1.0963000059127808,0.4373657703399658,1.5999999959603883e-05,0.0036942968145012856,,,,,
|
| 10 |
+
18,1.0615999698638916,0.46220508217811584,1.8000000636675395e-05,0.004156084265559912,,,,,
|
| 11 |
+
20,1.3325999975204468,0.7157824039459229,1.9999999494757503e-05,0.004617871250957251,,,,,
|
| 12 |
+
22,1.2070000171661377,0.5933427214622498,1.996917308133561e-05,0.0050796582363545895,,,,,
|
| 13 |
+
24,1.2044999599456787,0.5816172957420349,1.9876883015967906e-05,0.005541445221751928,,,,,
|
| 14 |
+
26,1.0740000009536743,0.4489712119102478,1.9723698642337695e-05,0.0060032326728105545,,,,,
|
| 15 |
+
28,1.1164000034332275,0.3696516752243042,1.9510565834934823e-05,0.006465019658207893,,,,,
|
| 16 |
+
30,1.045199990272522,0.4376335144042969,1.9238796085119247e-05,0.006926806643605232,,,,,
|
| 17 |
+
32,1.1247999668121338,0.4589230716228485,1.8910064682131633e-05,0.007388593629002571,,,,,
|
| 18 |
+
34,1.0688999891281128,0.3879022002220154,1.8526401618146338e-05,0.00785038061439991,,,,,
|
| 19 |
+
36,1.0292999744415283,0.4027869403362274,1.8090169760398567e-05,0.008312168531119823,,,,,
|
| 20 |
+
38,1.1052000522613525,0.37394437193870544,1.7604059394216165e-05,0.008773955516517162,,,,,
|
| 21 |
+
40,1.1557999849319458,0.3808683753013611,1.7071068214136176e-05,0.009235742501914501,,,,,
|
| 22 |
+
42,1.0232000350952148,0.4252733886241913,1.6494481315021403e-05,0.00969752948731184,,,,,
+44,1.0364999771118164,0.48068660497665405,1.5877853002166376e-05,0.010159316472709179,,,,,
+46,1.1340999603271484,0.37313926219940186,1.5224985872919206e-05,0.010621103458106518,,,,,
+48,1.0866999626159668,0.4175492823123932,1.453990535082994e-05,0.011082890443503857,,,,,
+50,1.1039999723434448,0.35443660616874695,1.3826834219798911e-05,0.01154467836022377,,,,,
+52,1.1480000019073486,0.39232146739959717,1.3090169886709191e-05,0.012006465345621109,,,,,
+54,1.1861000061035156,0.396918922662735,1.2334453458606731e-05,0.012468252331018448,,,,,
+56,1.0096999406814575,0.3885609209537506,1.1564344276848715e-05,0.012930039316415787,,,,,
+58,1.114799976348877,0.4421806335449219,1.0784590813273098e-05,0.013391826301813126,,,,,
+60,1.0795999765396118,0.36081990599632263,9.999999747378752e-06,0.013853613287210464,,,,,
+62,1.1764999628067017,0.4062329828739166,9.215408681484405e-06,0.014315400272607803,,,,,
+64,1.107200026512146,0.39982733130455017,8.435655217908788e-06,0.014777187258005142,,,,,
+66,1.1868000030517578,0.4688170254230499,7.665546036150772e-06,0.015238975174725056,,,,,
+68,1.0312999486923218,0.3301626741886139,6.909830062795663e-06,0.01570076122879982,,,,,
+70,1.1089999675750732,0.3377252221107483,6.173165729705943e-06,0.01616254821419716,,,,,
+72,1.0716999769210815,0.39666977524757385,5.460095053422265e-06,0.016624337062239647,,,,,
+74,1.1461999416351318,0.4125552177429199,4.7750145313329995e-06,0.017086124047636986,,,,,
+76,1.042199969291687,0.3825180232524872,4.1221474020858295e-06,0.017547911033034325,,,,,
+78,0.9157000184059143,0.4063441753387451,3.505519543978153e-06,0.018009698018431664,,,,,
+80,1.1110999584197998,0.35289037227630615,2.9289321901160292e-06,0.018471485003829002,,,,,
+82,1.167199969291687,0.33720290660858154,2.3959403279150138e-06,0.01893327198922634,,,,,
+84,1.2375999689102173,0.38099613785743713,1.909829961732612e-06,0.01939505897462368,,,,,
+86,1.2151999473571777,0.3848689794540405,1.4735983313585166e-06,0.01985684596002102,,,,,
+88,1.1628999710083008,0.40408074855804443,1.0899348126258701e-06,0.020318632945418358,,,,,
+90,1.1884000301361084,0.4015007019042969,7.612046601934708e-07,0.020780419930815697,,,,,
+92,1.152500033378601,0.38306349515914917,4.894348535344761e-07,0.021242206916213036,,,,,
+94,1.154099941253662,0.45273807644844055,2.7630079557638965e-07,0.021703993901610374,,,,,
+96,1.0618000030517578,0.35036078095436096,1.2311659247643547e-07,0.022165780887007713,,,,,
+98,1.0270999670028687,0.40208569169044495,3.0826662111849146e-08,0.022627567872405052,,,,,
+100,1.0285999774932861,0.38247284293174744,0.0,0.02308935672044754,728.7083129882812,2.196000099182129,0.13699999451637268,1862467846144.0,1.117748498916626
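The fourth column of the log rows above appears to be the learning rate (an assumption, since the CSV header is not part of this excerpt). Its values are consistent with a linear-warmup cosine schedule using the settings in example/run_sft.sh: peak 2e-5, 100 steps, warmup ratio 0.2. The following is a minimal sketch of that schedule, not the training script's actual code; the function name `cosine_lr` and its parameters are hypothetical.

```python
import math


def cosine_lr(step, peak_lr=2e-5, max_steps=100, warmup_steps=20):
    """Learning rate at `step` for linear warmup followed by cosine decay.

    Defaults mirror example/run_sft.sh (warmup_steps = 0.2 * 100).
    This is an illustrative reimplementation, not the trainer's code.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    # Half-cosine decay from peak_lr down to 0.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Compare a few steps against the fourth column of the log by eye.
    for step in (42, 60, 98, 100):
        print(step, cosine_lr(step))
```

Under these assumptions, step 60 sits halfway through the decay phase and comes out at about 1e-5, which lines up with the logged 9.999999747378752e-06; the small differences are what single-precision rounding in the logged values would produce.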
example/npu_sft_loss.png ADDED
example/run.sh CHANGED
@@ -1,6 +1,6 @@
 #!/bin/bash
 
-MODEL_PATH="/model/
+MODEL_PATH="/model/BitCPM4-CANN-1B-unquantized"
 DATA_PATH="/dataset/c4-pro/data/000_1_7.parquet"
 OUTPUT_DIR="./output"
 DS_CONFIG="./ds_config_z2.json"
@@ -11,7 +11,8 @@ GRAD_ACCUM_STEPS=8
 MAX_SEQ_LENGTH=1024
 
 export ASCEND_RT_VISIBLE_DEVICES=8,9,10,11,12,13,14,15
-
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export DS_SKIP_CUDA_CHECK=1
 torchrun --nproc_per_node=$NUM_GPUS train.py \
     --model_name_or_path $MODEL_PATH \
     --data_path $DATA_PATH \
@@ -19,7 +20,7 @@ torchrun --nproc_per_node=$NUM_GPUS train.py \
     --output_dir $OUTPUT_DIR \
     --per_device_train_batch_size $BATCH_SIZE_PER_GPU \
     --gradient_accumulation_steps $GRAD_ACCUM_STEPS \
-    --max_steps
+    --max_steps 100 \
     --learning_rate 4e-5 \
     --lr_scheduler_type cosine \
     --warmup_ratio 0.1 \
@@ -33,5 +34,5 @@ torchrun --nproc_per_node=$NUM_GPUS train.py \
     --seed 42 \
     --dataloader_num_workers 4 \
     --report_to tensorboard \
-    --logging_dir /data/tensorboard/ \
+    --logging_dir /data/tensorboard/pretrain \
     --gradient_checkpointing_kwargs '{"use_reentrant": false}'
example/run_sft.sh CHANGED
@@ -1,16 +1,18 @@
 #!/bin/bash
 
-MODEL_PATH="/model/
-DATA_PATH=""
+MODEL_PATH="/model/BitCPM4-CANN-1B-unquantized"
+DATA_PATH="/dataset/HuggingFaceH4_ultrachat_200k/data/train_sft-00000-of-00003-a3ecf92756993583.parquet"
 OUTPUT_DIR="./output_sft"
 DS_CONFIG="./ds_config.json"
 
 NUM_GPUS=8
 BATCH_SIZE_PER_GPU=2
 GRAD_ACCUM_STEPS=1
-MAX_SEQ_LENGTH=
+MAX_SEQ_LENGTH=8192
 
 export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export DS_SKIP_CUDA_CHECK=1
 
 torchrun --nproc_per_node=$NUM_GPUS train_sft.py \
     --model_name_or_path $MODEL_PATH \
@@ -19,10 +21,10 @@ torchrun --nproc_per_node=$NUM_GPUS train_sft.py \
     --output_dir $OUTPUT_DIR \
     --per_device_train_batch_size $BATCH_SIZE_PER_GPU \
     --gradient_accumulation_steps $GRAD_ACCUM_STEPS \
-    --
+    --max_steps 100 \
     --learning_rate 2e-5 \
     --lr_scheduler_type cosine \
-    --warmup_ratio 0.
+    --warmup_ratio 0.2 \
     --weight_decay 0.0 \
     --logging_steps 2 \
     --save_steps 500 \
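As a quick sanity check on the updated example/run_sft.sh settings, the sketch below derives the effective batch size, the number of warmup steps, and an upper bound on tokens per optimizer step. The class name `SftRunConfig` and its fields are hypothetical and only mirror the shell variables; the token figure assumes every sequence is filled to MAX_SEQ_LENGTH, which real chat data usually is not, and that train_sft.py consumes these variables in the usual way (not shown in this hunk).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SftRunConfig:
    """Hypothetical mirror of the variables set in example/run_sft.sh."""
    num_devices: int = 8            # NUM_GPUS
    per_device_batch_size: int = 2  # BATCH_SIZE_PER_GPU
    grad_accum_steps: int = 1       # GRAD_ACCUM_STEPS
    max_seq_length: int = 8192      # MAX_SEQ_LENGTH
    max_steps: int = 100            # --max_steps
    warmup_ratio: float = 0.2       # --warmup_ratio

    @property
    def sequences_per_step(self) -> int:
        # Sequences contributing to one optimizer update across all devices.
        return self.num_devices * self.per_device_batch_size * self.grad_accum_steps

    @property
    def max_tokens_per_step(self) -> int:
        # Upper bound only: assumes every sequence reaches max_seq_length.
        return self.sequences_per_step * self.max_seq_length

    @property
    def total_sequences(self) -> int:
        # Sequences consumed over the whole run.
        return self.sequences_per_step * self.max_steps

    @property
    def warmup_steps(self) -> int:
        # Steps spent in warmup, rounded up.
        return math.ceil(self.max_steps * self.warmup_ratio)


if __name__ == "__main__":
    cfg = SftRunConfig()
    print("sequences per step:", cfg.sequences_per_step)
    print("max tokens per step:", cfg.max_tokens_per_step)
    print("total sequences:", cfg.total_sequences)
    print("warmup steps:", cfg.warmup_steps)
```

With the values from the script, each optimizer step covers 16 sequences, so the 100-step run touches 1,600 sequences in total.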
|