UltraThinking-LLM-Training / docs /training_deepspeed.md

Upload folder using huggingface_hub

54c5666 verified 4 months ago

795 Bytes

	# DeepSpeed Training

	## Prereqs
	- Install: `pip install deepspeed`
	- GPU(s) with NVLink/PCIe and recent CUDA drivers

	## Quick start (ZeRO-2)
	```bash
	# Example config provided
	deepspeed --num_gpus 2 train_ultrathink.py \
	--deepspeed deepspeed_config_zero2.json \
	--dataset dummy --train_samples 4000 --val_samples 400 \
	--vocab_size 50257 --hidden_size 384 --num_layers 4 --num_heads 6 --num_kv_heads 6 \
	--intermediate_size 1536 --max_seq_length 512 \
	--batch_size 1 --gradient_accumulation_steps 64 \
	--learning_rate 5e-5 --use_amp --gradient_checkpointing
	```

	## Notes
	- Adjust `train_batch_size` and `gradient_accumulation_steps` in the JSON.
	- For ZeRO-3, set `"stage": 3` and ensure the system has fast storage for optimizer/param state offload if used.