---
language:
  - en
license: apache-2.0
tags:
  - text-generation
  - causal-lm
  - continual-pretraining
  - lora
  - axolotl
  - deepspeed
  - transformers
  - commandr
  - cohere
  - eu-hpc
datasets:
  - arxiv
  - gov
  - news
  - wikipedia
metrics:
  - loss
library_name: transformers
framework: pytorch
base_model: CohereLabs/c4ai-command-r-v01
model_name: commandr-35b-cpt
pipeline_tag: text-generation
task_categories:
  - text-generation
model_type: AutoModelForCausalLM
inference:
  parameters:
    max_new_tokens: 512
    temperature: 0.7
    top_p: 0.9
trained_on:
  - Leonardo EuroHPC
description: >-
  Continual pretraining (CPT) of Cohere Command-R 35B using Axolotl and
  DeepSpeed ZeRO-1. The model was trained on scientific, governmental, news,
  and Wikipedia data with LoRA adapters to improve factual grounding and
  reasoning.
---

# Command-R 35B — CPT (Continual Pretraining with LoRA)

**Model type:** Causal Language Model  
**Base model:** [CohereLabs/c4ai-command-r-v01](https://huggingface.co/CohereLabs/c4ai-command-r-v01)  
**License:** Apache 2.0  
**Framework:** [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

---

## Overview

`commandr-35b-cpt` is a **continual-pretrained** version of Cohere's Command-R 35B model, trained with LoRA adapters for efficient energy-domain adaptation. 
The goal of CPT is to extend the model’s general reasoning, factual grounding, and domain knowledge across science, governance, and energy-domain text.

Training was performed on the **Leonardo EuroHPC** system using Axolotl with DeepSpeed ZeRO-1 optimization.

---

## Training Setup

**Objective:** Language modeling (unsupervised continual pretraining)  
**Adapter type:** LoRA  
**Precision:** bfloat16  
**Hardware:** 8 nodes × 2 NVIDIA A100 64 GB GPUs (16 GPUs total)  
**Framework:** DeepSpeed ZeRO-1, Axolotl, PyTorch 2.5.1+cu121  
**Runtime:** ~24 hours  
**Checkpoints:** Saved every 1/5 of an epoch

---

## Dataset

Public energy-domain text sources:

- `arxiv.jsonl` — scientific and technical papers  
- `gov.jsonl` — public governmental documents  
- `news.jsonl` — news articles  
- `wiki.jsonl` — Wikipedia text
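The shards above are plain JSONL. As a minimal sketch (assuming each line holds a single record with a `"text"` field, as in Axolotl's completion-style datasets — the actual field names are not stated in this card), they can be streamed like so:

```python
import json

def iter_documents(shard_paths):
    """Yield raw document text from JSONL shards.

    Assumes one JSON object per line with a "text" field; blank lines
    are skipped. Field name is an assumption, not confirmed by the card.
    """
    for path in shard_paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                record = json.loads(line)
                yield record["text"]

# Example usage:
# shards = ["arxiv.jsonl", "gov.jsonl", "news.jsonl", "wiki.jsonl"]
# n_docs = sum(1 for _ in iter_documents(shards))
```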

---

## Hyperparameters

| Parameter | Value |
|------------|-------|
| Sequence length | 2048 |
| Micro batch size | 1 |
| Gradient accumulation | 4 |
| Epochs | 1 |
| Max steps | 10000 |
| Learning rate | 0.0002 |
| LR scheduler | cosine |
| Optimizer | AdamW (8-bit) |
| Warmup steps | 10 |
| Weight decay | 0.0 |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Gradient checkpointing | ✅ |
| Flash attention | ✅ |
| Auto resume | ✅ |
| Loss watchdog threshold | 5.0 |
| Loss watchdog patience | 3 |
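The table above maps roughly onto an Axolotl configuration. The following is an illustrative sketch, not the actual training config: dataset paths and the DeepSpeed config filename are assumptions, and key names follow common Axolotl conventions.

```yaml
# Illustrative Axolotl config reconstructed from the hyperparameter table.
base_model: CohereLabs/c4ai-command-r-v01
adapter: lora

sequence_len: 2048
micro_batch_size: 1
gradient_accumulation_steps: 4
num_epochs: 1
max_steps: 10000

learning_rate: 0.0002
lr_scheduler: cosine
optimizer: adamw_bnb_8bit
warmup_steps: 10
weight_decay: 0.0

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj

bf16: true
gradient_checkpointing: true
flash_attention: true
auto_resume_from_checkpoints: true
loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

saves_per_epoch: 5              # "checkpoints every 1/5 of an epoch"
deepspeed: deepspeed_configs/zero1.json   # path is an assumption

special_tokens:
  pad_token: "<|end_of_text|>"
```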


## Tokenizer

**Tokenizer type:** `AutoTokenizer`  
**Special token:** `<|end_of_text|>` as `pad_token`
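For inference, the card's metadata suggests sampling with `max_new_tokens=512`, `temperature=0.7`, `top_p=0.9`. A minimal `transformers` sketch follows; the repo id `commandr-35b-cpt` is a placeholder (substitute the actual Hub path), and `do_sample=True` is inferred from the sampling parameters rather than stated in the card.

```python
# Generation parameters taken from the card's inference metadata;
# do_sample is an assumption implied by temperature/top_p.
GEN_KWARGS = {"max_new_tokens": 512, "temperature": 0.7, "top_p": 0.9, "do_sample": True}

def generate(prompt: str, model_id: str = "commandr-35b-cpt") -> str:
    """Load the model and return a sampled completion.

    model_id is a placeholder; heavy imports are deferred so this module
    can be read/tested without downloading the 35B checkpoint.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = "<|end_of_text|>"  # matches the card's tokenizer setup
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, **GEN_KWARGS)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example usage (requires GPU memory for a 35B model):
# print(generate("Summarise recent trends in offshore wind policy."))
```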