Fix: add assistant_only_loss=False to prevent all labels being masked to -100 76bf694 verified ssdataanalysis committed 14 days ago
Add adapter resume support: loads previous adapter from hub if available, 48h timeout ready 12dd270 verified ssdataanalysis committed 14 days ago
Optimal config: high-quality Hebrew data, constant LR, packing disabled, lora_dropout=0.1 b88a92b verified ssdataanalysis committed 14 days ago
Switch to packing=True, batch=4, grad_acc=4, step-based checkpoints for speed 38eb889 verified ssdataanalysis committed 14 days ago
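
The assistant_only_loss fix in commit 76bf694 can be illustrated with a minimal sketch in plain Python. It is not the repository's training code. The helper `build_labels`, the token ids, and the all-zero assistant mask are hypothetical, and the sketch assumes the failure came from a mask in which no token was marked as assistant output, which the log does not confirm.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def build_labels(input_ids, assistant_mask, assistant_only_loss):
    """Return per-token labels; positions excluded from the loss get IGNORE_INDEX."""
    if not assistant_only_loss:
        # Train on every token: labels are simply a copy of the inputs.
        return list(input_ids)
    # Keep only tokens flagged as assistant output; mask out everything else.
    return [tok if keep else IGNORE_INDEX
            for tok, keep in zip(input_ids, assistant_mask)]


input_ids = [101, 7, 8, 9, 102]  # hypothetical token ids
empty_mask = [0, 0, 0, 0, 0]     # assumed failure case: no token marked as assistant

masked = build_labels(input_ids, empty_mask, assistant_only_loss=True)
unmasked = build_labels(input_ids, empty_mask, assistant_only_loss=False)

print(masked)    # every label is IGNORE_INDEX, so no token contributes to the loss
print(unmasked)  # labels equal the input ids, so every token contributes
```

Under that assumption, leaving the flag on would give the trainer nothing to learn from, which matches the symptom named in the commit message.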