
# DeepSpeed Training

## Prerequisites

- Install DeepSpeed: `pip install deepspeed`
- One or more CUDA-capable GPUs (NVLink or PCIe) and recent CUDA drivers

## Quick start (ZeRO-2)

```bash
# Example config provided
deepspeed --num_gpus 2 train_ultrathink.py \
  --deepspeed deepspeed_config_zero2.json \
  --dataset dummy --train_samples 4000 --val_samples 400 \
  --vocab_size 50257 --hidden_size 384 --num_layers 4 --num_heads 6 --num_kv_heads 6 \
  --intermediate_size 1536 --max_seq_length 512 \
  --batch_size 1 --gradient_accumulation_steps 64 \
  --learning_rate 5e-5 --use_amp --gradient_checkpointing
```
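The command references `deepspeed_config_zero2.json` from the repo. As a rough sketch, a ZeRO-2 config matching the sizes above typically looks like the following (the keys are standard DeepSpeed config fields; the exact file shipped with the repo may set different values or additional options):

```json
{
  "train_batch_size": 128,
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 64,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```

Here `train_batch_size` (128) equals 1 micro-batch per GPU × 64 accumulation steps × 2 GPUs, matching the CLI flags in the command above.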

## Notes

- Adjust `train_batch_size` and `gradient_accumulation_steps` in the JSON; DeepSpeed requires `train_batch_size` to equal `train_micro_batch_size_per_gpu` × `gradient_accumulation_steps` × number of GPUs, so keep the JSON consistent with the CLI flags.
- For ZeRO-3, set `"stage": 3` under `zero_optimization`, and if you enable optimizer/parameter state offload, make sure the system has fast storage (e.g. NVMe) to hold it.
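The batch-size values in the JSON and on the command line must satisfy a simple identity; a quick sanity check with the numbers from the quick-start command above:

```python
# DeepSpeed enforces this identity between the JSON config and the launch setup:
#   train_batch_size = train_micro_batch_size_per_gpu
#                      * gradient_accumulation_steps
#                      * number of GPUs (world size)
# Values below mirror the quick-start command:
#   --num_gpus 2 --batch_size 1 --gradient_accumulation_steps 64
micro_batch_per_gpu = 1
grad_accum_steps = 64
world_size = 2

train_batch_size = micro_batch_per_gpu * grad_accum_steps * world_size
assert train_batch_size == 128  # the value the JSON's train_batch_size must match
print(train_batch_size)
```

If the three values disagree with `train_batch_size` in the JSON, DeepSpeed raises a configuration error at startup, so it is worth checking this arithmetic before launching a long run.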