Update README.md

Files changed (1)

README.md +57 -3

README.md CHANGED

@@ -1,3 +1,57 @@
----
-license: apache-2.0
----

+# Command-R 35B — CPT + SFT
+**Model type:** Causal Language Model
+**Base model:** [commandr-CPT](https://huggingface.co/kmylonas/commandr-CPT)
+**License:** Apache 2.0
+**Framework:** [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
+---
+## Overview
+`commandr-CPT-SFT` applies two training phases on top of Cohere’s Command-R 35B: **continual pretraining (CPT)** followed by **supervised fine-tuning (SFT)**.
+The CPT phase extends domain coverage and the SFT phase improves instruction following, with the aim of a more stable, general-purpose conversational model.
+---
+## Training Setup
+**Stage 1 (CPT):** Domain-adaptive continual pretraining
+**Stage 2 (SFT):** Instruction fine-tuning
+**Adapter type:** LoRA
+**Precision:** bfloat16
+**Hardware:** 8 nodes × 2 NVIDIA A100 64GB GPUs per node (16 GPUs total)
+**Software stack:** DeepSpeed ZeRO-1, Axolotl, PyTorch 2.5.1+cu121
+---
+## Datasets
+**CPT Stage:**
+- `arxiv.jsonl`
+- `gov.jsonl`
+- `news.jsonl`
+- `wiki.jsonl`
+**SFT Stage:**
+- `axolotl_deduplicated_synthetic_qa.jsonl`
+---
+## Hyperparameters
+| Parameter | Value |
+|------------|-------|
+| Sequence length | 2048 |
+| Micro batch size | 2 |
+| Gradient accumulation | 2 |
+| Learning rate | 2e-4 |
+| LR scheduler | cosine |
+| Optimizer | AdamW (8-bit) |
+| LoRA rank (r) | 16 |
+| LoRA alpha | 32 |
+| LoRA dropout | 0.05 |
+| Target modules | q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj |
+| Epochs | 1 |
+| Warmup steps | 10 |
+| Weight decay | 0.0 |
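
To make the table easier to interpret, here is a rough sketch of the quantities it implies when combined with the hardware line in Training Setup. It assumes, without confirmation from the README, that all 16 GPUs act as data-parallel ranks (the usual arrangement with DeepSpeed ZeRO-1) and that the adapter uses the standard LoRA scaling of alpha divided by rank. The README does not say whether these values apply to the CPT stage, the SFT stage, or both. The function name `effective_batch` is hypothetical and only for illustration.

```python
# Values copied from the Hyperparameters table.
MICRO_BATCH_SIZE = 2      # sequences per GPU per forward/backward pass
GRAD_ACCUM_STEPS = 2      # passes accumulated before each optimizer step
SEQUENCE_LEN = 2048       # maximum tokens per sequence
LORA_R = 16
LORA_ALPHA = 32

# Values copied from the Training Setup section: 8 nodes x 2 GPUs.
NUM_NODES = 8
GPUS_PER_NODE = 2


def effective_batch(micro_batch: int, grad_accum: int, world_size: int) -> int:
    """Sequences consumed per optimizer step.

    Assumption: every GPU is a data-parallel rank, so each one
    processes its own micro batches independently.
    """
    return micro_batch * grad_accum * world_size


world_size = NUM_NODES * GPUS_PER_NODE
sequences_per_step = effective_batch(MICRO_BATCH_SIZE, GRAD_ACCUM_STEPS, world_size)

# Upper bound only: sequences shorter than SEQUENCE_LEN contribute fewer tokens.
max_tokens_per_step = sequences_per_step * SEQUENCE_LEN

# Assumption: standard LoRA scaling (alpha / r), not a rank-stabilized variant.
lora_scale = LORA_ALPHA / LORA_R

print(world_size, sequences_per_step, max_tokens_per_step, lora_scale)
# prints: 16 64 131072 2.0
```

Under these assumptions each optimizer step sees 64 sequences, or at most 131,072 tokens, and the LoRA update is scaled by a factor of 2.0 before being added to the frozen weights.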