---
language:
  - en
license: apache-2.0
tags:
  - text-generation
  - causal-lm
  - instruction-tuning
  - supervised-fine-tuning
  - synthetic-qa
  - lora
  - axolotl
  - deepspeed
  - transformers
  - commandr
  - cohere
  - eu-hpc
datasets:
  - axolotl_deduplicated_synthetic_qa
metrics:
  - loss
library_name: transformers
framework: pytorch
base_model: CohereLabs/c4ai-command-r-v01
model_name: commandr-35b-sft
pipeline_tag: text-generation
task_categories:
  - text-generation
  - instruction-following
model_type: AutoModelForCausalLM
inference:
  parameters:
    max_new_tokens: 512
    temperature: 0.7
    top_p: 0.9
trained_on:
  - Leonardo EuroHPC
description: >-
  Supervised fine-tuning (SFT) of Cohere Command-R 35B on the synthetic QA
  dataset using LoRA and Axolotl. The model improves conversational reasoning
  and instruction-following capabilities.
---
|
|
|
|
|
# Command-R 35B — SFT (Supervised Fine-Tuning on Synthetic QA)

**Model type:** Causal Language Model  
**Base model:** [CohereLabs/c4ai-command-r-v01](https://huggingface.co/CohereLabs/c4ai-command-r-v01)  
**License:** Apache 2.0  
**Framework:** [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
|
|
|
|
|
---

## Overview

`commandr-35b-sft` is a **supervised fine-tuned** variant of Cohere’s Command-R 35B model.
Fine-tuning was performed with LoRA adapters on a high-quality synthetic instruction-following dataset, improving conversational reasoning and question answering.

Training was conducted on the **Leonardo EuroHPC** system.

---
|
|
|
|
|
## Training Setup

**Objective:** Supervised fine-tuning (instruction following)  
**Adapter type:** LoRA  
**Precision:** bfloat16  
**Hardware:** 8 nodes × 2 NVIDIA A100 64 GB GPUs (16 GPUs total)  
**Framework:** DeepSpeed ZeRO-1, Axolotl, PyTorch 2.5.1+cu121  
**Runtime:** ~6 hours  
**Dataset split:** 70% train / 30% validation
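As a quick sanity check, the effective global batch size follows from the hardware above combined with the micro-batch size and gradient-accumulation steps listed under Hyperparameters. The sketch below assumes all 16 GPUs act as data-parallel workers, which is how DeepSpeed ZeRO-1 behaves (it shards optimizer state only, not the batch dimension):

```python
# Effective global batch size under DeepSpeed ZeRO-1 data parallelism.
# Assumption: every GPU is a data-parallel worker, so the data-parallel
# world size equals the total GPU count.
nodes = 8
gpus_per_node = 2
micro_batch_size = 1        # per-GPU batch, from the Hyperparameters table
gradient_accumulation = 2   # from the Hyperparameters table

world_size = nodes * gpus_per_node
global_batch_size = micro_batch_size * gradient_accumulation * world_size
print(world_size, global_batch_size)  # 16 GPUs, 32 sequences per optimizer step
```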
|
|
|
|
|
--- |
|
|
|
|
|
## Dataset

**Name:** `axolotl_deduplicated_synthetic_qa.jsonl`  
**Type:** Instruction-following synthetic QA dataset

Each sample follows the QA/chat format consumed by Axolotl’s `alpaca_chat.load_qa` prompt strategy.
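For illustration, a record in the question/answer JSONL shape might look like the example below. The field names (`question`, `answer`) and the sample text are assumptions for illustration only; the exact schema consumed by `alpaca_chat.load_qa` depends on the Axolotl version, and the real dataset's content is not shown in this card.

```python
import json

# Hypothetical sample record -- illustrates the JSONL QA shape only,
# not actual dataset content.
sample_line = (
    '{"question": "What is supervised fine-tuning?", '
    '"answer": "Training a pretrained model on labeled input-output pairs."}'
)

record = json.loads(sample_line)
# Each line of the .jsonl file is one such JSON object.
assert {"question", "answer"} <= record.keys()
print(record["question"])
```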
|
|
|
|
|
--- |
|
|
|
|
|
## Hyperparameters

| Parameter | Value |
|-----------|-------|
| Sequence length | 2048 |
| Micro batch size | 1 |
| Gradient accumulation | 2 |
| Epochs | 1 |
| Learning rate | 0.0001 |
| LR scheduler | cosine |
| Optimizer | AdamW (8-bit) |
| Warmup steps | 20 |
| Weight decay | 0.0 |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Gradient checkpointing | ✅ |
| Flash attention | ✅ |
| Auto resume | ✅ |
| Loss watchdog threshold | 8.0 |
| Loss watchdog patience | 20 |
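The settings in this card map onto an Axolotl config roughly as follows. This is a reconstructed sketch from the values listed above, not the actual training config; key names may differ slightly across Axolotl versions, and the DeepSpeed config path is a placeholder.

```yaml
# Reconstructed sketch from the values in this card -- not the original config.
base_model: CohereLabs/c4ai-command-r-v01
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

datasets:
  - path: axolotl_deduplicated_synthetic_qa.jsonl
    type: alpaca_chat.load_qa
val_set_size: 0.3

sequence_len: 2048
micro_batch_size: 1
gradient_accumulation_steps: 2
num_epochs: 1
learning_rate: 0.0001
lr_scheduler: cosine
optimizer: adamw_bnb_8bit
warmup_steps: 20
weight_decay: 0.0

adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj

bf16: true
gradient_checkpointing: true
flash_attention: true
auto_resume_from_checkpoints: true
loss_watchdog_threshold: 8.0
loss_watchdog_patience: 20
deepspeed: deepspeed_configs/zero1.json  # placeholder path
```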
|
|
|
|
|
--- |
|
|
|
|
|
## Tokenizer

**Tokenizer type:** `AutoTokenizer`  
**Special token:** `<|end_of_text|>` used as `pad_token`
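For completeness, loading and generation might look like the sketch below, reusing the inference parameters from the card metadata. The repository id `commandr-35b-sft` is a placeholder for wherever the merged weights are published; generation is wrapped in a function because the 35B checkpoint is too large to load casually.

```python
# Generation parameters taken from this card's metadata.
GENERATION_KWARGS = {
    "max_new_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.9,
    "do_sample": True,  # sampling is implied by temperature/top_p
}


def generate(prompt: str, model_id: str = "commandr-35b-sft") -> str:
    """Load the model and generate a completion.

    `model_id` is a placeholder -- substitute the actual Hub repository.
    Imports are deferred so that defining this function does not require
    torch/transformers (or the 35B weights) to be present.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # The card pins <|end_of_text|> as the pad token.
    tokenizer.pad_token = "<|end_of_text|>"

    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, **GENERATION_KWARGS)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```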