Update README.md

README.md CHANGED

@@ -1,4 +1,4 @@
-# Command-R 35B — CPT (Continual Pretraining)
 **Model type:** Causal Language Model
 **Base model:** [CohereLabs/c4ai-command-r-v01](https://huggingface.co/CohereLabs/c4ai-command-r-v01)
@@ -9,7 +9,8 @@
 ## Overview
-`commandr-CPT` is a **continual-pretrained** version of Cohere's Command-R 35B model, trained to further improve domain adaptation and general reasoning abilities.
 The continual pretraining was performed using Axolotl on the Leonardo EuroHPC system.
 ---
@@ -20,8 +21,9 @@ The continual pretraining was performed using Axolotl on the Leonardo EuroHPC sy
 **Adapter type:** LoRA
 **Precision:** bfloat16
 **Hardware:** 8 nodes × 2 × NVIDIA A100 64GB GPUs
-**Training duration:** 24 hours
 **Framework:** DeepSpeed ZeRO-1, Axolotl, PyTorch 2.5.1+cu121
 ---
@@ -43,13 +45,19 @@ Public energy domain text sources:
 | Sequence length | 2048 |
 | Micro batch size | 1 |
 | Gradient accumulation | 4 |
-| Learning rate | 2e-4 |
 | LR scheduler | cosine |
 | Optimizer | AdamW (8-bit) |
 | LoRA rank (r) | 16 |
 | LoRA alpha | 32 |
 | LoRA dropout | 0.05 |
-| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
-| Epochs | 1 |
-| Warmup steps | 10 |
-| Weight decay | 0.0 |

+# Command-R 35B — CPT (Continual Pretraining with LoRA)
 **Model type:** Causal Language Model
 **Base model:** [CohereLabs/c4ai-command-r-v01](https://huggingface.co/CohereLabs/c4ai-command-r-v01)
 ## Overview
+`commandr-CPT` is a **continual-pretrained** version of Cohere's Command-R 35B model, trained with LoRA adapters for efficient energy-domain adaptation.
+The goal of CPT is to extend the model’s general reasoning, factual grounding, and domain knowledge across science, governance, and energy-domain text.
 The continual pretraining was performed using Axolotl on the Leonardo EuroHPC system.
 ---
 **Adapter type:** LoRA
 **Precision:** bfloat16
 **Hardware:** 8 nodes × 2 × NVIDIA A100 64GB GPUs
 **Framework:** DeepSpeed ZeRO-1, Axolotl, PyTorch 2.5.1+cu121
+**Runtime:** ~24 hours
+**Checkpoints:** Saved every 1/5 of an epoch
 ---
 | Sequence length | 2048 |
 | Micro batch size | 1 |
 | Gradient accumulation | 4 |
+| Epochs | 1 |
+| Max steps | 10000 |
+| Learning rate | 0.0002 |
 | LR scheduler | cosine |
 | Optimizer | AdamW (8-bit) |
+| Warmup steps | 10 |
+| Weight decay | 0.0 |
 | LoRA rank (r) | 16 |
 | LoRA alpha | 32 |
 | LoRA dropout | 0.05 |
+| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
+| Gradient checkpointing | ✅ |
+| Flash attention | ✅ |
+| Auto resume | ✅ |
+| Loss watchdog threshold | 5.0 |
+| Loss watchdog patience | 3 |
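
A few quantities implied by the table above can be worked out directly. The sketch below is illustrative only: it assumes all 8 × 2 GPUs act as plain data-parallel ranks (consistent with ZeRO-1), that the adapter uses the standard LoRA scaling `alpha / r` rather than a rank-stabilized variant, and that the loss watchdog aborts after `patience` consecutive steps with loss above the threshold. The helper names and the `LossWatchdog` class are hypothetical and are not taken from Axolotl's code.

```python
# Values copied from the README tables; all helper names are illustrative.
NODES = 8
GPUS_PER_NODE = 2
MICRO_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 4
SEQUENCE_LEN = 2048
LORA_R = 16
LORA_ALPHA = 32


def effective_batch_size(nodes, gpus_per_node, micro_batch, grad_accum):
    """Sequences per optimizer step, assuming pure data parallelism."""
    return nodes * gpus_per_node * micro_batch * grad_accum


def lora_scale(alpha, rank):
    """Standard LoRA scaling applied to the low-rank update (alpha / r)."""
    return alpha / rank


class LossWatchdog:
    """Hypothetical sketch: trips after `patience` consecutive high-loss steps."""

    def __init__(self, threshold, patience):
        self.threshold = threshold
        self.patience = patience
        self.bad_steps = 0

    def update(self, loss):
        """Record one training loss; return True if training should abort."""
        if loss > self.threshold:
            self.bad_steps += 1
        else:
            self.bad_steps = 0  # any healthy step resets the streak
        return self.bad_steps >= self.patience


if __name__ == "__main__":
    seqs = effective_batch_size(
        NODES, GPUS_PER_NODE, MICRO_BATCH_SIZE, GRAD_ACCUM_STEPS
    )
    print("sequences per optimizer step:", seqs)              # 64
    print("max tokens per optimizer step:", seqs * SEQUENCE_LEN)  # 131072
    print("LoRA scale:", lora_scale(LORA_ALPHA, LORA_R))      # 2.0

    watchdog = LossWatchdog(threshold=5.0, patience=3)
    for step, loss in enumerate([2.1, 6.3, 7.0, 8.2]):
        if watchdog.update(loss):
            print("watchdog would abort at step", step)       # step 3
            break
```

Under these assumptions each optimizer step covers 64 sequences, or at most 131,072 tokens at the 2048-token sequence length; the actual token count per step depends on padding and packing settings that the README does not state.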