ubitech-edg
/

commandr-35b-sft

Text Generation

instruction-tuning

supervised-fine-tuning

text-generation-inference

Model card Files Files and versions

commandr-35b-sft / README.md

kosmylo1992's picture

Update README.md

d881dad verified 4 months ago

|

history blame contribute delete

2.96 kB

	---
	{
	"language": ["en"],
	"license": "apache-2.0",
	"tags": [
	"text-generation",
	"causal-lm",
	"instruction-tuning",
	"supervised-fine-tuning",
	"synthetic-qa",
	"lora",
	"axolotl",
	"deepspeed",
	"transformers",
	"commandr",
	"cohere",
	"eu-hpc"
	],
	"datasets": [
	"axolotl_deduplicated_synthetic_qa"
	],
	"metrics": [
	"loss"
	],
	"library_name": "transformers",
	"framework": "pytorch",
	"base_model": "CohereLabs/c4ai-command-r-v01",
	"model_name": "commandr-35b-sft",
	"pipeline_tag": "text-generation",
	"task_categories": ["text-generation", "instruction-following"],
	"model_type": "AutoModelForCausalLM",
	"inference": {
	"parameters": {
	"max_new_tokens": 512,
	"temperature": 0.7,
	"top_p": 0.9
	}
	},
	"trained_on": [
	"Leonardo EuroHPC"
	],
	"description": "Supervised fine-tuning (SFT) of Cohere Command-R 35B on the synthetic QA dataset using LoRA and Axolotl. The model improves conversational reasoning and instruction-following capabilities."
	}
	---

	# Command-R 35B — SFT (Supervised Fine-Tuning on Synthetic QA)

	Model type: Causal Language Model
	Base model: [CohereLabs/c4ai-command-r-v01](https://huggingface.co/CohereLabs/c4ai-command-r-v01)
	License: Apache 2.0
	Framework: [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

	---

	## Overview

	`commandr-35b-sft` is a supervised fine-tuned variant of Cohere’s Command-R 35B model.
	Fine-tuning was performed on a high-quality instruction-following dataset using LoRA adapters, enabling improved conversational reasoning and question answering.

	Training was conducted on the Leonardo EuroHPC system.

	---

	## Training Setup

	Objective: Supervised fine-tuning (instruction following)
	Adapter type: LoRA
	Precision: bfloat16
	Hardware: 8 nodes × 2 × NVIDIA A100 64GB GPUs
	Framework: DeepSpeed ZeRO-1, Axolotl, PyTorch 2.5.1+cu121
	Runtime: ~6 hours
	Dataset split: 70% train / 30% validation

	---

	## Dataset

	Name: `axolotl_deduplicated_synthetic_qa.jsonl`
	Type: Instruction-following synthetic QA dataset

	Each sample follows a QA/chat format used in the `alpaca_chat.load_qa` schema.

	---

	## Hyperparameters

	\| Parameter \| Value \|
	\|------------\|-------\|
	\| Sequence length \| 2048 \|
	\| Micro batch size \| 1 \|
	\| Gradient accumulation \| 2 \|
	\| Epochs \| 1 \|
	\| Learning rate \| 0.0001 \|
	\| LR scheduler \| cosine \|
	\| Optimizer \| AdamW (8-bit) \|
	\| Warmup steps \| 20 \|
	\| Weight decay \| 0.0 \|
	\| LoRA rank (r) \| 16 \|
	\| LoRA alpha \| 32 \|
	\| LoRA dropout \| 0.05 \|
	\| LoRA target modules \| q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj \|
	\| Gradient checkpointing \| ✅ \|
	\| Flash attention \| ✅ \|
	\| Auto resume \| ✅ \|
	\| Loss watchdog threshold \| 8.0 \|
	\| Loss watchdog patience \| 20 \|

	---

	## Tokenizer

	Tokenizer type: `AutoTokenizer`
	Special token: `<\|end_of_text\|>` as `pad_token`