README.md · tinyllms/qwen2.5-7b-instruct-sft-loo-domain-knowledge at main

qwen2.5-7b-instruct-sft-loo-domain-knowledge / README.md

psr-ai

Create README.md

81a7c58 verified 3 months ago

preview code

raw

history blame contribute delete

2.52 kB

	---
	datasets:
	- tinyllms/game24-trajectories
	- tinyllms/aime-1983-2023-trajectories
	base_model:
	- Qwen/Qwen2.5-7B-Instruct
	tags:
	- leave-one-out
	- loo-domain-knowledge
	- max_seq_length=16384
	- lr=2e-5
	- batch_size=1
	- grad_accum=16
	- epochs=1
	- qlora
	- quantize=4bit_nf4
	- lora_rank=64
	- lora_alpha=128
	- lora_dropout=0.05
	- completion_only_loss
	- eval_size=0.1
	- cosine_schedule
	- warmup=0.05
	- bf16
	- ddp_workers=2
	- ray_job=raysubmit_A55M5NnZckrXmfWN
	---

	# Qwen2.5-7B-Instruct SFT — LOO Domain Knowledge

	Fine-tuned from Qwen/Qwen2.5-7B-Instruct using QLoRA (4-bit NF4 quantization + LoRA adapters, merged before upload).

	This is the SFT stage of a leave-one-out (LOO) experiment: the model is trained on Game24 and AIME trajectories, deliberately excluding domain knowledge (GPQA) data. The held-out domain is later used to measure cross-domain transfer.

	## Training Configuration

	- Learning rate: 2e-5 (cosine schedule, 5% warmup)
	- Batch size: 1 per device, gradient accumulation 16 (effective batch size 32 with 2 workers)
	- Epochs: 1
	- Max sequence length: 16384
	- Precision: bf16
	- Weight decay: 0.01

	## QLoRA

	- Quantization: 4-bit NF4 with double quantization
	- LoRA rank: 64
	- LoRA alpha: 128
	- LoRA dropout: 0.05
	- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

	## Loss

	- completion_only_loss: prompt tokens are masked; loss is computed only on assistant completion tokens
	- Dataset is converted from `messages` to `prompt`/`completion` format before training

	## Datasets

	Trained on two datasets (domain knowledge held out):

	\| Dataset \| Domain \|
	\|---------\|--------\|
	\| `tinyllms/game24-trajectories` \| Game of 24 — arithmetic reasoning \|
	\| `tinyllms/aime-1983-2023-trajectories` \| AIME — competition math \|

	Examples exceeding `max_seq_len` are filtered out. A 10% holdout is used for evaluation (eval runs every 10 steps).

	## Leave-One-Out Design

	\| Domain \| Role \|
	\|--------\|------\|
	\| Game24 \| Train \|
	\| AIME \| Train \|
	\| Domain Knowledge (GPQA) \| Held out \|

	The GRPO stage follows using `tinyllms/qwen2.5-7b-instruct-grpo-loo-domain-knowledge`, trained on the same two datasets. Transfer is measured by evaluating on GPQA Diamond.

	## Infrastructure

	- GPU: 2x NVIDIA H100 80GB (DDP)
	- Framework: TRL 0.29 + Ray Train
	- Tracking: [Weights & Biases](https://wandb.ai/psr-labs/pocket-sheet-sft/runs/pzs50igz) (project: `pocket-sheet-sft`)
	- Ray Job ID: raysubmit_A55M5NnZckrXmfWN