Update README.md

# Command-R 35B — CPT (Continual Pretraining)
**Model type:** Causal Language Model

**Base model:** [CohereLabs/c4ai-command-r-v01](https://huggingface.co/CohereLabs/c4ai-command-r-v01)

**License:** Apache 2.0

**Framework:** [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

---
## Overview

`commandr-CPT` is a **continually pretrained** version of Cohere's Command-R 35B model, trained to further improve energy-domain adaptation and general reasoning.
The continual pretraining was performed with Axolotl on the Leonardo EuroHPC system.
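If the LoRA adapter weights from this run are published, they can be loaded on top of the base model with `transformers` and `peft`. A minimal loading sketch, assuming the adapter lives in a Hugging Face repo (the `adapter_id` below is a placeholder, not a confirmed repo name):

```python
# Minimal loading sketch. ASSUMPTION: the LoRA adapter from this run is
# published on the Hub; "your-org/commandr-CPT" is a placeholder repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "CohereLabs/c4ai-command-r-v01"  # base model named in this card
adapter_id = "your-org/commandr-CPT"       # placeholder adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 training precision
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA adapter

inputs = tokenizer("The levelized cost of electricity is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```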
---
## Training Setup

**Objective:** Language modeling (unsupervised continual pretraining)

**Adapter type:** LoRA

**Precision:** bfloat16

**Hardware:** 8 nodes × 2 NVIDIA A100 64 GB GPUs (16 GPUs total)

**Training duration:** 24 hours

**Software stack:** Axolotl with DeepSpeed ZeRO-1 on PyTorch 2.5.1+cu121

---
## Dataset

Public energy-domain text sources, each stored as a JSON-lines shard (see the inspection sketch after the list):

- `arxiv.jsonl` — scientific and technical papers
- `gov.jsonl` — public governmental documents
- `news.jsonl` — news articles
- `wiki.jsonl` — Wikipedia text
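A minimal inspection sketch for one shard, assuming (as is common for completion-style pretraining corpora, though not confirmed by this card) that each record stores its raw text under a `text` key:

```python
# Peek at a pretraining shard. ASSUMPTION: records keep raw text under a
# "text" key, the usual layout for completion-style pretraining data.
import json

def iter_texts(path: str):
    """Yield the raw text of each record in a .jsonl shard."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)["text"]

# Print the first few records of one shard.
for i, text in enumerate(iter_texts("arxiv.jsonl")):
    print(f"record {i}: {len(text)} chars")
    print(text[:200])
    if i == 2:
        break
```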
---
## Hyperparameters

| Parameter | Value |
|-----------|-------|
| Sequence length | 2048 |
| Micro batch size | 1 |
| Gradient accumulation | 4 |
| Learning rate | 2e-4 |
| LR scheduler | cosine |
| Optimizer | AdamW (8-bit) |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 1 |
| Warmup steps | 10 |
| Weight decay | 0.0 |
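For reference, the LoRA rows of the table map directly onto a `peft.LoraConfig`. A minimal sketch of the equivalent configuration (Axolotl builds this internally from its YAML config; it is shown here only to make the settings concrete):

```python
# PEFT-equivalent view of the LoRA hyperparameters listed above.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                 # LoRA rank
    lora_alpha=32,        # scaling factor: alpha / r = 2.0
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```

With a micro batch size of 1, gradient accumulation of 4, and 16 GPUs, the effective global batch size works out to 1 × 4 × 16 = 64 sequences of up to 2048 tokens per optimizer step.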
|