kosmylo1992 committed
Commit c682304 · verified · 1 Parent(s): 7925275

Update README.md

Files changed (1): README.md +16 -8
README.md CHANGED

@@ -1,4 +1,4 @@
- # Command-R 35B — CPT (Continual Pretraining)
+ # Command-R 35B — CPT (Continual Pretraining with LoRA)
  
  **Model type:** Causal Language Model
  **Base model:** [CohereLabs/c4ai-command-r-v01](https://huggingface.co/CohereLabs/c4ai-command-r-v01)
@@ -9,7 +9,8 @@
  
  ## Overview
  
- `commandr-CPT` is a **continual-pretrained** version of Cohere's Command-R 35B model, trained to further improve domain adaptation and general reasoning abilities.
+ `commandr-CPT` is a **continual-pretrained** version of Cohere's Command-R 35B model, trained with LoRA adapters for efficient energy-domain adaptation.
+ The goal of CPT is to extend the model's general reasoning, factual grounding, and domain knowledge across science, governance, and energy-domain text.
  The continual pretraining was performed using Axolotl on the Leonardo EuroHPC system.
  
  ---
@@ -20,8 +21,9 @@ The continual pretraining was performed using Axolotl on the Leonardo EuroHPC sy
  **Adapter type:** LoRA
  **Precision:** bfloat16
  **Hardware:** 8 nodes × 2 × NVIDIA A100 64GB GPUs
- **Training duration:** 24 hours
  **Framework:** DeepSpeed ZeRO-1, Axolotl, PyTorch 2.5.1+cu121
+ **Runtime:** ~24 hours
+ **Checkpoints:** Saved every 1/5 of an epoch
  
  ---
  
@@ -43,13 +45,19 @@ Public energy domain text sources:
  | Sequence length | 2048 |
  | Micro batch size | 1 |
  | Gradient accumulation | 4 |
- | Learning rate | 2e-4 |
+ | Epochs | 1 |
+ | Max steps | 10000 |
+ | Learning rate | 0.0002 |
  | LR scheduler | cosine |
  | Optimizer | AdamW (8-bit) |
+ | Warmup steps | 10 |
+ | Weight decay | 0.0 |
  | LoRA rank (r) | 16 |
  | LoRA alpha | 32 |
  | LoRA dropout | 0.05 |
- | Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
- | Epochs | 1 |
- | Warmup steps | 10 |
- | Weight decay | 0.0 |
+ | LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
+ | Gradient checkpointing | |
+ | Flash attention | |
+ | Auto resume | |
+ | Loss watchdog threshold | 5.0 |
+ | Loss watchdog patience | 3 |
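
For quick use, here is a minimal sketch of loading the adapter described in the updated README on top of the base model with `transformers` and `peft`. The adapter repo id `kosmylo1992/commandr-CPT` is an assumption inferred from the model name in the README, not something stated in this commit:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "CohereLabs/c4ai-command-r-v01"
adapter_id = "kosmylo1992/commandr-CPT"  # assumed repo id; adjust to the actual adapter location

# Load the base model in bfloat16, matching the training precision listed above
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the CPT LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "Explain the role of demand response in modern power grids."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The LoRA hyperparameters in the table map onto a `peft.LoraConfig` roughly as follows; this is an illustrative reconstruction, not the exact config Axolotl generated during training:

```python
from peft import LoraConfig

# Illustrative LoraConfig mirroring the hyperparameter table (a reconstruction, not the training artifact)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```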