kosmylo1992 committed · Commit 9d6f532 · verified · 1 Parent(s): 0eead3d

Update README.md
Files changed (1): README.md (+57, -3)
# Command-R 35B — CPT + SFT

**Model type:** Causal Language Model
**Base model:** [commandr-CPT](https://huggingface.co/kmylonas/commandr-CPT)
**License:** Apache 2.0
**Framework:** [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

---
## Overview

`commandr-CPT-SFT` applies both a **continual pretraining (CPT)** stage and a **supervised fine-tuning (SFT)** stage on top of Cohere’s Command-R 35B. The model benefits from extended domain coverage (CPT) and improved instruction following (SFT), yielding a more stable, general-purpose conversational model.
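Since the base is Command-R, the SFT checkpoint presumably expects Command-R's turn-token chat format. A minimal sketch of that prompt layout, assuming the fine-tune kept the base model's chat template (in practice, `tokenizer.apply_chat_template` from `transformers` builds this string for you):

```python
def build_prompt(user_message: str) -> str:
    """Wrap a single user turn in Command-R's chat-turn special tokens.

    Assumption: the SFT stage kept the base model's template; verify
    against the tokenizer's chat template before relying on this.
    """
    return (
        "<|START_OF_TURN_TOKEN|><|USER_TOKEN|>"
        + user_message
        + "<|END_OF_TURN_TOKEN|>"
        + "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
    )

prompt = build_prompt("Summarize the Apache 2.0 license in one sentence.")
```

The trailing `<|CHATBOT_TOKEN|>` leaves the model positioned to generate the assistant turn.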

---
## Training Setup

**Stage 1 (CPT):** Domain-adaptive continual pretraining
**Stage 2 (SFT):** Instruction fine-tuning
**Adapter type:** LoRA
**Precision:** bfloat16
**Hardware:** 8 nodes × 2 NVIDIA A100 64 GB GPUs (16 GPUs total)
**Framework:** DeepSpeed ZeRO-1, Axolotl, PyTorch 2.5.1+cu121
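The actual DeepSpeed file is not included here; a minimal ZeRO-1 configuration consistent with the bfloat16 precision and the batch settings listed under Hyperparameters might look like this (a sketch, not the training team's file):

```json
{
  "zero_optimization": { "stage": 1 },
  "bf16": { "enabled": true },
  "train_micro_batch_size_per_gpu": 2,
  "gradient_accumulation_steps": 2
}
```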

---
## Datasets

**CPT Stage:**
- `arxiv.jsonl`
- `gov.jsonl`
- `news.jsonl`
- `wiki.jsonl`

**SFT Stage:**
- `axolotl_deduplicated_synthetic_qa.jsonl`
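All of the files above are JSONL, one JSON object per line. Their exact schemas are not documented here; the sketch below shows the shapes Axolotl commonly consumes, raw `text` records for the CPT (completion) stage and question/answer records for the SFT stage. The field names and record contents are assumptions for illustration, not taken from the actual files:

```python
import json

# Hypothetical records; the real files' field names may differ.
cpt_record = {"text": "Attention lets a model weigh context tokens by relevance."}
sft_record = {"question": "What does CPT stand for?",
              "answer": "Continual pretraining."}

def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

lines = to_jsonl([cpt_record, sft_record]).splitlines()
parsed = [json.loads(line) for line in lines]
```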

---
## Hyperparameters

| Parameter | Value |
|-----------|-------|
| Sequence length | 2048 |
| Micro batch size | 2 |
| Gradient accumulation | 2 |
| Learning rate | 2e-4 |
| LR scheduler | cosine |
| Optimizer | AdamW (8-bit) |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 1 |
| Warmup steps | 10 |
| Weight decay | 0.0 |
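Put together, the values above correspond to an Axolotl config along these lines. This is a reconstruction, not the actual training file: the base-model id comes from the Base model link above, and the dataset and DeepSpeed paths are illustrative:

```yaml
base_model: kmylonas/commandr-CPT   # assumed repo id, per the Base model link
adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj

sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 2
num_epochs: 1

optimizer: adamw_bnb_8bit
learning_rate: 2e-4
lr_scheduler: cosine
warmup_steps: 10
weight_decay: 0.0

bf16: true
deepspeed: deepspeed_configs/zero1.json   # path is illustrative
```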