Fix OOM: prev LoRA on CPU + no_grad entire prev contribution (no saved tensors for backward) 3aef764 natmin322 committed on Mar 13
Fix deprecation warnings: torch.load weights_only, cupy fromDlpack, as_target_tokenizer 3c3ef28 natmin322 committed on Mar 13
fix: re-initialize trans_input/prompt_key after from_pretrained, add epsilon to norm, use_reentrant=False d6a9f4f natmin322 committed on Mar 13
fix: restore _set_gradient_checkpointing + enable_input_require_grads for gradient checkpointing 2b87f4b natmin322 committed on Mar 12
fix: use default use_reentrant=True for gradient checkpointing (model expects it) 008c76c natmin322 committed on Mar 12
fix: add FP16 safety check, dataset script check, LoRA sanity check dfdd675 natmin322 committed on Mar 12
fix: LoRA reinit, gradient checkpointing use_reentrant=False, checkpoint existence check, collator padding alignment 5299479 natmin322 committed on Mar 12
revert: discard all 2-GPU DataParallel/DDP changes, back to f51f791 8644d30 natmin322 committed on Mar 10
fix: SVD jitter + trans_input.pt exists guard in both root and improve c1277de natmin322 committed on Mar 9