This is a test based on the LoRA 65B model, used for the MIT NLP class final project. The workflow consists of three steps:
- Calculate and accumulate gradients.
- Determine the appropriate rank from the accumulated gradients.
- Perform LoRA fine-tuning.
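The first two steps can be sketched roughly as follows. This is a minimal illustration, not the repo's actual implementation: the function names (`accumulate_gradients`, `choose_rank`) and the spectral-energy threshold are assumptions, and the real code presumably accumulates gradients per attention weight matrix inside the GPT-2 model.

```python
import torch

def accumulate_gradients(weight, batches, loss_fn):
    # Step 1 (sketch): sum the gradient of `weight` over several mini-batches.
    grad_sum = torch.zeros_like(weight)
    for x, y in batches:
        weight.grad = None
        loss = loss_fn(weight, x, y)
        loss.backward()
        grad_sum += weight.grad
    return grad_sum

def choose_rank(grad_sum, energy=0.9):
    # Step 2 (sketch): pick the smallest rank whose singular values
    # capture `energy` of the accumulated gradient's spectral mass.
    # The 0.9 threshold is an illustrative assumption.
    s = torch.linalg.svdvals(grad_sum)
    cum = torch.cumsum(s, dim=0) / s.sum()
    return int(torch.searchsorted(cum, energy).item()) + 1
```

The chosen rank would then be passed to the LoRA fine-tuning run in step 3.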
## LoRA fine-tuning
For 24 GB VRAM on the GPT2_SM model (original version of LoRA):

```shell
python main.py --train_batch_size 8 --valid_batch_size 8 --grad_acc 1 --model_card gpt2.SM --init_checkpoint pretrained_checkpoints/gpt2-pytorch_model.bin --work_dir alpha_sm --index 0
```
For 24 GB VRAM on the GPT2_SM model (our version of LoRA):

```shell
python main.py --train_batch_size 8 --valid_batch_size 8 --grad_acc 1 --model_card gpt2.SM --init_checkpoint pretrained_checkpoints/gpt2-pytorch_model.bin --work_dir alpha_sm --index 1
```
License: MIT