Fix OOM: prev LoRA on CPU + no_grad entire prev contribution (no saved tensors for backward) 3aef764 natmin322 committed on Mar 13
Fix OOM: agg_lora_states accumulate weighted sum incrementally instead of O(N) cat 62e82d3 natmin322 committed on Mar 13
Fix deprecation warnings: torch.load weights_only, cupy fromDlpack, as_target_tokenizer 3c3ef28 natmin322 committed on Mar 13
fix: re-initialize trans_input/prompt_key after from_pretrained, add epsilon to norm, use_reentrant=False d6a9f4f natmin322 committed on Mar 13
fix: pass attention_mask directly to model.generate(), not via GenerationConfig 915a112 natmin322 committed on Mar 12
fix: override _save to disable safetensors for T5 shared embedding weights bb4c9d9 natmin322 committed on Mar 12
fix: denumpify_detensorize moved to trainer_utils in transformers 4.40+ a57a027 natmin322 committed on Mar 12
fix: add explicit trainer_pt_utils imports (nested_truncate etc.) removed from trainer.* in 4.40+ e4e078c natmin322 committed on Mar 12
fix: comprehensive transformers 4.40 compat across all trainer files 2e720a9 natmin322 committed on Mar 12
fix: synced_gpus must be passed to generate(), not GenerationConfig a92aa8a natmin322 committed on Mar 12
fix: restore _set_gradient_checkpointing + enable_input_require_grads for gradient checkpointing 2b87f4b natmin322 committed on Mar 12
fix: use default use_reentrant=True for gradient checkpointing (model expects it) 008c76c natmin322 committed on Mar 12
fix: revert eval_strategy back to evaluation_strategy (4.40.2 uses old name) 2f6ffef natmin322 committed on Mar 12
fix: rename --evaluation_strategy to --eval_strategy (transformers 4.40+) fb265db natmin322 committed on Mar 12
fix: add Optional/Any import to cl_collator, ipdb try/except in score.py 467dac1 natmin322 committed on Mar 12
fix: add FP16 safety check, dataset script check, LoRA sanity check dfdd675 natmin322 committed on Mar 12
fix: LoRA reinit, gradient checkpointing use_reentrant=False, checkpoint existence check, collator padding alignment 5299479 natmin322 committed on Mar 12
fix: _maybe_log_save_evaluate add grad_norm param for transformers 4.40.2 67281b3 natmin322 committed on Mar 10
fix: gradient checkpointing compat for transformers 4.40.2 (OOM fix) 7bd8512 natmin322 committed on Mar 10
fix: robust SVD fallback with direct regularization instead of pinv f9548d4 natmin322 committed on Mar 10
fix: replace self.use_apex and self.fsdp with getattr for compat 3f132e5 natmin322 committed on Mar 10
fix: add explicit imports for DistributedSampler, DataLoader, IterableDatasetShard, ipdb fd8e73c natmin322 committed on Mar 10
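
Commit 62e82d3 above describes replacing an O(N) concatenation with an incrementally accumulated weighted sum. The following is only a rough sketch of that general idea, not the repository's code: the function name `aggregate_weighted`, the generator `iter_states`, and the dict-of-parameters layout are all hypothetical. It assumes each state maps parameter names to values that support `+` and scalar `*` (plain floats here; arrays or tensors would behave the same way).

```python
from typing import Dict, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")  # any value supporting `+` and scalar `*` (float, array, tensor)


def aggregate_weighted(states: Iterable[Tuple[float, Dict[str, T]]]) -> Dict[str, T]:
    """Return sum_i(w_i * state_i), keeping only one running total alive.

    Hypothetical helper: instead of collecting all N states and reducing
    them at the end, each state is folded into the total as it arrives.
    """
    total: Dict[str, T] = {}
    for weight, state in states:
        for name, value in state.items():
            contribution = value * weight
            if name in total:
                total[name] = total[name] + contribution
            else:
                total[name] = contribution
    return total


def iter_states() -> Iterator[Tuple[float, Dict[str, float]]]:
    # Hypothetical source of (weight, state) pairs. A generator lets each
    # state be released as soon as it has been added to the total.
    yield 0.5, {"lora_A": 2.0, "lora_B": 4.0}
    yield 0.25, {"lora_A": 4.0, "lora_B": 8.0}
    yield 0.25, {"lora_A": 8.0, "lora_B": 0.0}


result = aggregate_weighted(iter_states())
print(result)
```

Peak memory in this sketch stays at roughly one state plus the running total, rather than growing with the number of states being aggregated.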