Continual / root_gainlora

Commit History

fix FT score
b9eaa7b

natmin322 committed on

fix FT score
b58da25

natmin322 committed on

update OOM fix
6152366

natmin322 committed on

Fix OOM: prev LoRA on CPU + no_grad entire prev contribution (no saved tensors for backward)
3aef764

natmin322 committed on

Fix OOM: agg_lora_states accumulate weighted sum incrementally instead of O(N) cat
62e82d3

natmin322 committed on

Fix deprecation warnings: torch.load weights_only, cupy fromDlpack, as_target_tokenizer
3c3ef28

natmin322 committed on

fix: re-initialize trans_input/prompt_key after from_pretrained, add epsilon to norm, use_reentrant=False
d6a9f4f

natmin322 committed on

fix: pass attention_mask directly to model.generate(), not via GenerationConfig
915a112

natmin322 committed on

fix: override _save to disable safetensors for T5 shared embedding weights
bb4c9d9

natmin322 committed on

fix: denumpify_detensorize moved to trainer_utils in transformers 4.40+
a57a027

natmin322 committed on

fix: add explicit trainer_pt_utils imports (nested_truncate etc.) removed from trainer.* in 4.40+
e4e078c

natmin322 committed on

fix: comprehensive transformers 4.40 compat across all trainer files
2e720a9

natmin322 committed on

fix: replace removed _pad_across_processes with accelerator API
164f658

natmin322 committed on

fix: synced_gpus must be passed to generate(), not GenerationConfig
a92aa8a

natmin322 committed on

fix: LoRALayer.forward reshape incompatible with T5-small
b50bff4

natmin322 committed on

fix: preserve task_config_dir in T5_small scripts
8b246a8

natmin322 committed on

feat: add T5_small benchmark scripts for 4 comparison scenarios
e84f283

natmin322 committed on

fix: restore _set_gradient_checkpointing + enable_input_require_grads for gradient checkpointing
2b87f4b

natmin322 committed on

fix: use default use_reentrant=True for gradient checkpointing (model expects it)
008c76c

natmin322 committed on

fix: enable --gradient_checkpointing to fit T4 GPU memory
4791c61

natmin322 committed on

fix: revert eval_strategy back to evaluation_strategy (4.40.2 uses old name)
2f6ffef

natmin322 committed on

fix: rename --evaluation_strategy to --eval_strategy (transformers 4.40+)
fb265db

natmin322 committed on

fix: add typing imports to all trainer files
6331426

natmin322 committed on

fix: add Union to typing imports in cl_collator
acd898c

natmin322 committed on

fix: add Optional/Any import to cl_collator, ipdb try/except in score.py
467dac1

natmin322 committed on

fix: add FP16 safety check, dataset script check, LoRA sanity check
dfdd675

natmin322 committed on

fix: LoRA reinit, gradient checkpointing use_reentrant=False, checkpoint existence check, collator padding alignment
5299479

natmin322 committed on

fix: restore notebook cells
6a339c3

natmin322 committed on

add root_gainlora to repo for testing
e2bef95

natmin322 committed on

reduce rubbish
5d89844

natmin322 committed on

fix bug root
816d960

natmin322 committed on

fix: _maybe_log_save_evaluate add grad_norm param for transformers 4.40.2
67281b3

natmin322 committed on

fix: do_grad_scaling compat for transformers 4.40.2
642dc3d

natmin322 committed on

fix: gradient checkpointing compat for transformers 4.40.2 (OOM fix)
7bd8512

natmin322 committed on

fix: sanitize NaN/Inf values before SVD computation
13c81ff

natmin322 committed on

fix: add compat shim for is_torch_tpu_available
74a4551

natmin322 committed on

fix: SVD CPU fallback for ill-conditioned matrices
3a19926

natmin322 committed on

fix: use gesvd driver for robust SVD + larger regularization
c2c5218

natmin322 committed on

fix: robust SVD fallback with direct regularization instead of pinv
f9548d4

natmin322 committed on

fix: replace self.use_apex and self.fsdp with getattr for compat
3f132e5

natmin322 committed on

fix: compat shim for ShardedDDPOption and sharded_ddp attribute
6f52dc6

natmin322 committed on

fix: replace deprecated torch.pinv with torch.linalg.pinv
d7fe0d8

natmin322 committed on

fix: add explicit imports for DistributedSampler, DataLoader, IterableDatasetShard, ipdb
fd8e73c

natmin322 committed on

fix: trust_remote_code and abs paths for load_dataset
5e730b3

natmin322 committed on

clean code
b8c0787

natmin322 committed on