This is a test based on the LoRA 65B model, used for the MIT NLP class final project. The workflow consists of three steps:

  • Calculate and accumulate gradients.
  • Determine the appropriate rank through gradient computation.
  • Perform LoRA fine-tuning.
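The three steps above can be sketched on a single weight matrix. This is a hypothetical illustration, not the repo's actual code: the 90% spectral-energy rank rule and the synthetic gradients are assumptions made for the example; LoRA's convention of initializing `B` to zero is from the original LoRA paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))  # a frozen pretrained weight

# Step 1: accumulate gradients over several (synthetic) batches.
grad_acc = np.zeros_like(W)
for _ in range(4):
    grad_acc += rng.standard_normal(W.shape)  # stand-in for dL/dW

# Step 2: pick a rank from the accumulated gradient's spectrum,
# e.g. the smallest r capturing 90% of the squared singular values
# (the threshold is an assumption for this sketch).
s = np.linalg.svd(grad_acc, compute_uv=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.90)) + 1

# Step 3: LoRA-style adaptation — W stays frozen; only the
# low-rank factors B (m x r) and A (r x n) would be trained.
A = rng.standard_normal((r, W.shape[1])) * 0.01
B = np.zeros((W.shape[0], r))   # B starts at zero, as in LoRA
W_adapted = W + B @ A           # forward pass uses W + BA
```

Because `B` is initialized to zero, `W_adapted` equals `W` before any training, so adding the adapter does not change the model's initial behavior.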

LoRA fine-tuning

For 24 GB VRAM on the GPT2_SM model (original version of LoRA):

python main.py --train_batch_size 8 --valid_batch_size 8 --grad_acc 1 --model_card gpt2.SM --init_checkpoint pretrained_checkpoints/gpt2-pytorch_model.bin --work_dir alpha_sm --index 0

For 24 GB VRAM on the GPT2_SM model (our version of LoRA):

python main.py --train_batch_size 8 --valid_batch_size 8 --grad_acc 1 --model_card gpt2.SM --init_checkpoint pretrained_checkpoints/gpt2-pytorch_model.bin --work_dir alpha_sm --index 1
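The `--grad_acc` flag trades memory for effective batch size. Assuming the usual convention (an assumption about this repo, not confirmed by its code) that gradients from `grad_acc` micro-batches are summed before each optimizer step:

```python
# Assumed convention: the effective batch size per optimizer step is
# the per-device batch size times the number of accumulation steps.
train_batch_size = 8   # from the commands above
grad_acc = 1
effective_batch_size = train_batch_size * grad_acc
print(effective_batch_size)  # → 8
```

Raising `--grad_acc` while lowering `--train_batch_size` keeps the effective batch constant while reducing peak VRAM use.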

license: mit
