kosmylo1992 committed on
Commit 9f31690 · verified · 1 Parent(s): 3d41d05

Update README.md

Files changed (1)
  1. README.md +55 -3
README.md CHANGED
@@ -1,3 +1,55 @@
- ---
- license: apache-2.0
- ---
+ # Command-R 35B: CPT (Continual Pretraining)
+
+ **Model type:** Causal Language Model
+ **Base model:** [CohereLabs/c4ai-command-r-v01](https://huggingface.co/CohereLabs/c4ai-command-r-v01)
+ **License:** Apache 2.0
+ **Framework:** [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
+
+ ---
+
+ ## Overview
+
+ `commandr-CPT` is a **continually pretrained** version of Cohere's Command-R 35B model, trained on public energy-domain text to improve domain adaptation and general reasoning.
+ Continual pretraining was run with Axolotl on the Leonardo EuroHPC system.
+
+ ---
+
+ ## Training Setup
+
+ **Objective:** Language modeling (unsupervised continual pretraining)
+ **Adapter type:** LoRA
+ **Precision:** bfloat16
+ **Hardware:** 8 nodes × 2 NVIDIA A100 64 GB GPUs per node (16 GPUs total)
+ **Training duration:** 24 hours
+ **Framework:** DeepSpeed ZeRO-1, Axolotl, PyTorch 2.5.1+cu121
+
+ ---
+
+ ## Dataset
+
+ Public energy-domain text sources:
+
+ - `arxiv.jsonl`: scientific and technical papers
+ - `gov.jsonl`: public governmental documents
+ - `news.jsonl`: news articles
+ - `wiki.jsonl`: Wikipedia text
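
As a rough illustration of how files like the ones listed above could be inspected, the sketch below streams a JSONL file and yields one text string per record. The `text` field name and the helper `iter_texts` are assumptions made for this example; the card does not document the record schema.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_texts(path: str, field: str = "text") -> Iterator[str]:
    """Yield the string stored under `field` for each record in a JSONL file.

    Assumption: every non-blank line is a JSON object. Blank lines and
    records without a non-empty string under `field` are skipped.
    """
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if not isinstance(record, dict):
                continue
            value = record.get(field)
            if isinstance(value, str) and value:
                yield value
```

A call such as `sum(len(t) for t in iter_texts("wiki.jsonl"))` would then give a quick character count for one source, provided the file uses that field name.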
+
+ ---
+
+ ## Hyperparameters
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Sequence length | 2048 |
+ | Micro batch size | 1 |
+ | Gradient accumulation | 4 |
+ | Learning rate | 2e-4 |
+ | LR scheduler | cosine |
+ | Optimizer | AdamW (8-bit) |
+ | LoRA rank (r) | 16 |
+ | LoRA alpha | 32 |
+ | LoRA dropout | 0.05 |
+ | Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
+ | Epochs | 1 |
+ | Warmup steps | 10 |
+ | Weight decay | 0.0 |
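
The values above imply a few derived quantities that are easy to get wrong by hand. The sketch below works them out under stated assumptions: pure data parallelism with one model replica per GPU (consistent with ZeRO-1, but not confirmed by the card), the standard LoRA scaling of alpha / r, and one common form of linear warmup followed by cosine decay to zero. `RunConfig`, `lr_at`, and `total_steps` are hypothetical names for illustration, not Axolotl APIs, and the actual number of optimizer steps is not stated in the card.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    # Values copied from the Training Setup and Hyperparameters sections.
    nodes: int = 8
    gpus_per_node: int = 2
    micro_batch_size: int = 1
    grad_accum_steps: int = 4
    sequence_length: int = 2048
    lora_rank: int = 16
    lora_alpha: int = 32

    @property
    def world_size(self) -> int:
        return self.nodes * self.gpus_per_node

    @property
    def sequences_per_step(self) -> int:
        # Assumption: every GPU holds one data-parallel replica.
        return self.micro_batch_size * self.grad_accum_steps * self.world_size

    @property
    def max_tokens_per_step(self) -> int:
        # Upper bound: every sequence is filled to sequence_length.
        return self.sequences_per_step * self.sequence_length

    @property
    def lora_scaling(self) -> float:
        # Assumption: standard LoRA scaling (alpha / r), not rank-stabilized.
        return self.lora_alpha / self.lora_rank


def lr_at(step: int, total_steps: int, peak_lr: float = 2e-4,
          warmup_steps: int = 10) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero.

    `total_steps` is a hypothetical input; the card does not report it.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


cfg = RunConfig()
print(cfg.world_size, cfg.sequences_per_step,
      cfg.max_tokens_per_step, cfg.lora_scaling)
# 16 64 131072 2.0
```

Under these assumptions each optimizer step sees 64 sequences, or at most 131,072 tokens, and the LoRA update is scaled by 2.0. If the run used a different parallelism layout, the per-step figures change accordingly.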