# BioRLHF Training on Cayuga HPC

**Cluster:** Cornell Cayuga HPC

**Target:** GPU training with Mistral-7B + LoRA (SFT, DPO, GRPO)

---

## Quick Start

```bash
# 1. SSH to Cayuga
ssh jak4013@cayuga-login1

# 2. Submit a GRPO training job
bash -l -c 'sbatch scripts/run_grpo_full.sh'

# 3. Monitor
squeue -u $USER
tail -f logs/grpo_full_*.log
```

---

## Step 1: Transfer Files to HPC

From your local Mac:

```bash
rsync -avz --progress \
  /Users/jak4013/Dropbox/Bioinformatics/Claude/BioRLHF/biorlhf/ \
  jak4013@cayuga-login1:/athena/cayuga_0003/scratch/users/jak4013/otsuka/training/BioRLHF/
```

---

## Step 2: Set Up Conda Environment (First Time Only)

```bash
# SSH to Cayuga
ssh jak4013@cayuga-login1

# Source conda (non-interactive shells require explicit sourcing)
. /home/fs01/jak4013/miniconda3/miniconda3/etc/profile.d/conda.sh

# Create environment
conda create -n biorlhf python=3.10 -y
conda activate biorlhf

# Install PyTorch with CUDA support
conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia -y

# Install training dependencies (quote version specs so '>' is not
# treated as shell redirection)
pip install "transformers>=4.36.0" "peft>=0.6.0" "trl>=0.14.0"
pip install "bitsandbytes>=0.41.0" "accelerate>=0.24.0" "datasets>=2.14.0"
pip install wandb scipy scikit-learn sentencepiece jsonlines

# Verify GPU access (on a GPU node)
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"
```

---

## Step 3: Training Options

### Option A: GRPO Training (Recommended)

GRPO with verifier-based multi-reward training from an SFT checkpoint:

```bash
# Submit via SLURM (use a login shell for the correct sbatch version)
bash -l -c 'sbatch scripts/run_grpo_full.sh'
```

**Key config** (`configs/grpo_full_v2.json`):
- G=16 generations per prompt
- V1-V4 verifiers with weights [0.35, 0.30, 0.15, 0.20]
- beta=0.02, 2 iterations per batch
- ~48h on A40

### Option B: SFT Training

```bash
# Interactive session
srun -p scu-gpu --gres=gpu:1 --mem=48G -c 8 --time=4:00:00 --account=cayuga_0003 --pty bash

# Activate environment
. /home/fs01/jak4013/miniconda3/miniconda3/etc/profile.d/conda.sh
conda activate biorlhf

# Run SFT
cd /athena/cayuga_0003/scratch/users/jak4013/otsuka/training/BioRLHF
biorlhf-train \
  --model mistralai/Mistral-7B-v0.3 \
  --dataset data/kmp_sft_final.json \
  --output ./my_sft_model
```

### Option C: Interactive GPU Session

```bash
# Request GPU
srun -p scu-gpu --gres=gpu:1 --mem=48G -c 8 --time=4:00:00 --account=cayuga_0003 --pty bash

# Activate environment
. /home/fs01/jak4013/miniconda3/miniconda3/etc/profile.d/conda.sh
conda activate biorlhf

# Navigate and run
cd /athena/cayuga_0003/scratch/users/jak4013/otsuka/training/BioRLHF
biorlhf-grpo --config configs/grpo_full_v2.json
```

---

## Step 4: Monitor Training

```bash
# Check job status
squeue -u $USER

# Tail logs
tail -f logs/grpo_full_*.log

# GPU usage (on the compute node)
nvidia-smi

# WandB dashboard
# https://wandb.ai/jangkeun-weill-cornell-medicine/biogrpo
```

---

## Environment Details

| Component | Version |
|--------------|-------------|
| Python | 3.10 |
| PyTorch | 2.5.1+cu121 |
| Transformers | 4.57.3 |
| TRL | 0.26.2 |
| PEFT | 0.18.0 |

---

## GPU Options on Cayuga

| GPU | VRAM | Best For | SLURM Flag |
|------|------|----------------------------------|--------------------|
| A40 | 48GB | Standard GRPO/SFT with QLoRA | `--gres=gpu:1` |
| A100 | 80GB | Larger batches, faster training | `--gres=gpu:a100:1` |

---

## Important Notes

### SLURM Version

The default `sbatch` at `/usr/bin/sbatch` is outdated (v22.05.2). Use `bash -l -c 'sbatch ...'` so the module-loaded version (slurm/25.05.0) is used instead.

### Conda in Non-Interactive Shells

`source ~/.bashrc` does not work in non-interactive SSH sessions. Always source conda directly:

```bash
. /home/fs01/jak4013/miniconda3/miniconda3/etc/profile.d/conda.sh
conda activate biorlhf
```

### SFT Checkpoint Symlink

The SFT model adapter is stored at:

```
/athena/cayuga_0003/scratch/users/jak4013/otsuka/training/biorlhf/kmp_sft_model_final
```

GRPO scripts auto-symlink this into the working directory.
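The auto-symlink step amounts to something like the sketch below. `mktemp -d` placeholders stand in for the real cluster paths listed above so the snippet is runnable anywhere; the actual scripts use the checkpoint and working-directory paths from this guide.

```shell
# Sketch of the auto-symlink step (placeholder paths, not the real ones).
SFT_SRC="$(mktemp -d)"   # stands in for the SFT checkpoint directory
WORKDIR="$(mktemp -d)"   # stands in for the BioRLHF working directory

# -n replaces a stale symlink in place instead of descending into it
ln -sfn "$SFT_SRC" "$WORKDIR/kmp_sft_model_final"

readlink "$WORKDIR/kmp_sft_model_final"   # prints the checkpoint path
```

Using `ln -sfn` makes the step idempotent: rerunning a GRPO script after the checkpoint moves simply repoints the link.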
### Batch Size with G=16

Both `per_device_eval_batch_size` and `generation_batch_size` must be divisible by `num_generations`. The TRL parameter is `generation_batch_size`, NOT `per_device_generation_batch_size`.

### Eval Performance

GRPOTrainer's eval loop generates completions sequentially (~3 min/sample). With 107 eval samples, each eval pass takes ~5.3h. Set `eval_steps=9999` to skip in-training eval; run post-hoc evaluation instead.

---

## Troubleshooting

### "CUDA out of memory"

Reduce the per-device batch size and raise `gradient_accumulation_steps` to keep the effective batch size, e.g. in the config JSON:

```json
{
  "batch_size": 1,
  "gradient_accumulation_steps": 16
}
```

### "No GPU available"

```bash
nvidia-smi        # Check GPU allocation
squeue -u $USER   # Verify you're on a GPU node
```

### LoRA adapter loading fails

The SFT checkpoint is a LoRA adapter, not a full model. Load the base model first, then attach the adapter:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.3")
model = PeftModel.from_pretrained(base, "path/to/kmp_sft_model_final")
model = model.merge_and_unload()  # Merge for GRPO training
```

---

## Key Paths

| Path | Description |
|------|-------------|
| `/athena/cayuga_0003/scratch/users/jak4013/otsuka/training/BioRLHF/` | Working directory |
| `/athena/cayuga_0003/scratch/users/jak4013/otsuka/training/biorlhf/kmp_sft_model_final` | SFT checkpoint |
| `/athena/cayuga_0003/scratch/users/jak4013/otsuka/data/` | Data directory |
| `/home/fs01/jak4013/miniconda3/miniconda3/etc/profile.d/conda.sh` | Conda init script |
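The divisibility constraint from the "Batch Size with G=16" note is easy to get wrong and only fails at launch time, so it can be worth checking before submitting a job. Below is a minimal sketch of such a pre-submit check; `check_grpo_batch_config` is a hypothetical helper, not part of the BioRLHF repo or TRL.

```python
# Hypothetical pre-submit sanity check: both batch-size fields must be
# divisible by num_generations (G), per the note above.
def check_grpo_batch_config(cfg: dict) -> None:
    g = cfg["num_generations"]
    for key in ("per_device_eval_batch_size", "generation_batch_size"):
        if cfg[key] % g != 0:
            raise ValueError(
                f"{key}={cfg[key]} is not divisible by num_generations={g}"
            )

# Valid with G=16: 16 and 32 are both divisible by 16
check_grpo_batch_config(
    {
        "num_generations": 16,
        "per_device_eval_batch_size": 16,
        "generation_batch_size": 32,
    }
)
```

Running this against `configs/grpo_full_v2.json` (via `json.load`) before `sbatch` catches the error locally instead of hours into a queued job.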