commandr-35b-sft / README.md
kosmylo1992's picture
Update README.md
d881dad verified
---
{
"language": ["en"],
"license": "apache-2.0",
"tags": [
"text-generation",
"causal-lm",
"instruction-tuning",
"supervised-fine-tuning",
"synthetic-qa",
"lora",
"axolotl",
"deepspeed",
"transformers",
"commandr",
"cohere",
"eu-hpc"
],
"datasets": [
"axolotl_deduplicated_synthetic_qa"
],
"metrics": [
"loss"
],
"library_name": "transformers",
"framework": "pytorch",
"base_model": "CohereLabs/c4ai-command-r-v01",
"model_name": "commandr-35b-sft",
"pipeline_tag": "text-generation",
"task_categories": ["text-generation", "instruction-following"],
"model_type": "AutoModelForCausalLM",
"inference": {
"parameters": {
"max_new_tokens": 512,
"temperature": 0.7,
"top_p": 0.9
}
},
"trained_on": [
"Leonardo EuroHPC"
],
"description": "Supervised fine-tuning (SFT) of Cohere Command-R 35B on the synthetic QA dataset using LoRA and Axolotl. The model improves conversational reasoning and instruction-following capabilities."
}
---
# Command-R 35B — SFT (Supervised Fine-Tuning on Synthetic QA)
**Model type:** Causal Language Model
**Base model:** [CohereLabs/c4ai-command-r-v01](https://huggingface.co/CohereLabs/c4ai-command-r-v01)
**License:** Apache 2.0
**Framework:** [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
---
## Overview
`commandr-35b-sft` is a **supervised fine-tuned** variant of Cohere’s Command-R 35B model.
Fine-tuning was performed on a high-quality instruction-following dataset using LoRA adapters, enabling improved conversational reasoning and question answering.
Training was conducted on the **Leonardo EuroHPC** system.
---
## Training Setup
**Objective:** Supervised fine-tuning (instruction following)
**Adapter type:** LoRA
**Precision:** bfloat16
**Hardware:** 8 nodes × 2 × NVIDIA A100 64GB GPUs
**Framework:** DeepSpeed ZeRO-1, Axolotl, PyTorch 2.5.1+cu121
**Runtime:** ~6 hours
**Dataset split:** 70% train / 30% validation
---
## Dataset
**Name:** `axolotl_deduplicated_synthetic_qa.jsonl`
**Type:** Instruction-following synthetic QA dataset
Each sample follows a QA/chat format used in the `alpaca_chat.load_qa` schema.
---
## Hyperparameters
| Parameter | Value |
|------------|-------|
| Sequence length | 2048 |
| Micro batch size | 1 |
| Gradient accumulation | 2 |
| Epochs | 1 |
| Learning rate | 0.0001 |
| LR scheduler | cosine |
| Optimizer | AdamW (8-bit) |
| Warmup steps | 20 |
| Weight decay | 0.0 |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Gradient checkpointing | ✅ |
| Flash attention | ✅ |
| Auto resume | ✅ |
| Loss watchdog threshold | 8.0 |
| Loss watchdog patience | 20 |
---
## Tokenizer
**Tokenizer type:** `AutoTokenizer`
**Special token:** `<|end_of_text|>` as `pad_token`