Commit History

fix FT score
b9eaa7b

natmin322 committed on

update OOM fix
6152366

natmin322 committed on

Fix OOM: prev LoRA on CPU + no_grad entire prev contribution (no saved tensors for backward)
3aef764

natmin322 committed on

Fix OOM: agg_lora_states accumulate weighted sum incrementally instead of O(N) cat
62e82d3

natmin322 committed on

Fix deprecation warnings: torch.load weights_only, cupy fromDlpack, as_target_tokenizer
3c3ef28

natmin322 committed on

fix: re-initialize trans_input/prompt_key after from_pretrained, add epsilon to norm, use_reentrant=False
d6a9f4f

natmin322 committed on

fix: pass attention_mask directly to model.generate(), not via GenerationConfig
915a112

natmin322 committed on

fix: override _save to disable safetensors for T5 shared embedding weights
bb4c9d9

natmin322 committed on

fix: denumpify_detensorize moved to trainer_utils in transformers 4.40+
a57a027

natmin322 committed on

fix: add explicit trainer_pt_utils imports (nested_truncate etc.) removed from trainer.* in 4.40+
e4e078c

natmin322 committed on

fix: comprehensive transformers 4.40 compat across all trainer files
2e720a9

natmin322 committed on

fix: replace removed _pad_across_processes with accelerator API
164f658

natmin322 committed on

fix: synced_gpus must be passed to generate(), not GenerationConfig
a92aa8a

natmin322 committed on

fix: LoRALayer.forward reshape incompatible with T5-small
b50bff4

natmin322 committed on

fix: restore _set_gradient_checkpointing + enable_input_require_grads for gradient checkpointing
2b87f4b

natmin322 committed on

fix: use default use_reentrant=True for gradient checkpointing (model expects it)
008c76c

natmin322 committed on

fix: add typing imports to all trainer files
6331426

natmin322 committed on

fix: add Union to typing imports in cl_collator
acd898c

natmin322 committed on

fix: add Optional/Any import to cl_collator, ipdb try/except in score.py
467dac1

natmin322 committed on

fix: add FP16 safety check, dataset script check, LoRA sanity check
dfdd675

natmin322 committed on

fix: LoRA reinit, gradient checkpointing use_reentrant=False, checkpoint existence check, collator padding alignment
5299479

natmin322 committed on

add root_gainlora to repo for testing
e2bef95

natmin322 committed on

reduce rubish
5d89844

natmin322 committed on

fix: _maybe_log_save_evaluate add grad_norm param for transformers 4.40.2
67281b3

natmin322 committed on

fix: do_grad_scaling compat for transformers 4.40.2
642dc3d

natmin322 committed on

fix: gradient checkpointing compat for transformers 4.40.2 (OOM fix)
7bd8512

natmin322 committed on

fix: sanitize NaN/Inf values before SVD computation
13c81ff

natmin322 committed on

fix: add compat shim for is_torch_tpu_available
74a4551

natmin322 committed on

fix: SVD CPU fallback for ill-conditioned matrices
3a19926

natmin322 committed on

fix: use gesvd driver for robust SVD + larger regularization
c2c5218

natmin322 committed on

fix: robust SVD fallback with direct regularization instead of pinv
f9548d4

natmin322 committed on

fix: replace self.use_apex and self.fsdp with getattr for compat
3f132e5

natmin322 committed on

fix: compat shim for ShardedDDPOption and sharded_ddp attribute
6f52dc6

natmin322 committed on

fix: replace deprecated torch.pinv with torch.linalg.pinv
d7fe0d8

natmin322 committed on

fix: add explicit imports for DistributedSampler, DataLoader, IterableDatasetShard, ipdb
fd8e73c

natmin322 committed on

fix: trust_remote_code and abs paths for load_dataset
5e730b3

natmin322 committed on

clean code
b8c0787

natmin322 committed on

revert: discard all 2-GPU DataParallel/DDP changes, back to f51f791
8644d30

natmin322 committed on

DDP 2 GPU
aa4ad72

natmin322 committed on

fix: robust SVD with try-except+pinv fallback + trans_input.pt exists guard
ff3da70

natmin322 committed on

fix: SVD jitter + trans_input.pt exists guard in both root and improve
c1277de

natmin322 committed on

setup root
afd6439

natmin322 committed on

fix: port all compat fixes from improve to root trainer (ShardedDDPOption, ipdb, nested_truncate, gradient_checkpointing use_reentrant)
327a1fc

natmin322 committed on

test root
276234e

natmin322 committed on

new change
92ad19e

natmin322 committed on