---
license: mit
---
This is a test based on the LoRA 65B model, used for the MIT NLP class final project. The process consists of three steps:
- Calculate and accumulate gradients
- Determine the appropriate rank through gradient computation
- Perform LoRA fine-tuning.
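The repository's actual scripts are not shown here, but the three steps above can be sketched in a minimal, self-contained way. Everything below is a hypothetical illustration: the gradients are random stand-ins, the 90% energy threshold for rank selection is an assumed criterion, and `alpha` mirrors the usual LoRA scaling hyperparameter.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 16, 16

# Hypothetical frozen weight matrix of one layer.
W = rng.standard_normal((d_out, d_in))

# Step 1: calculate and accumulate gradients over several batches.
G = np.zeros_like(W)
for _ in range(8):
    G += rng.standard_normal((d_out, d_in))  # stand-in for dL/dW

# Step 2: determine a rank from the accumulated gradient's spectrum,
# here the smallest r capturing 90% of the singular-value energy
# (an assumed criterion for illustration).
s = np.linalg.svd(G, compute_uv=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.90) + 1)

# Step 3: LoRA fine-tuning reparameterizes the update as
# W + (alpha / r) * B @ A, training only A (r x d_in) and B (d_out x r).
alpha = 16.0
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))  # B starts at zero, so the initial delta is zero
W_eff = W + (alpha / r) * B @ A

assert np.allclose(W_eff, W)  # no change before any training step
```

Because `B` is initialized to zero, the effective weight equals the pretrained weight at the start of fine-tuning, which matches the standard LoRA initialization.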
LoRA fine-tuning
For 24 GB VRAM on the GPT2_SM model (original version of LoRA):
```shell
python main.py --train_batch_size 8 --valid_batch_size 8 --grad_acc 1 --model_card gpt2.SM --init_checkpoint pretrained_checkpoints/gpt2-pytorch_model.bin --work_dir alpha_sm --index 0
```
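The `--grad_acc` flag in the command above controls gradient accumulation. Assuming the usual convention (each optimizer step sums gradients over `grad_acc` micro-batches), the effective batch size is the product of the two flags, so raising `--grad_acc` trades extra steps per update for lower VRAM use at a fixed effective batch size:

```python
# Assumed relationship between the flags (hypothetical, matching the
# common gradient-accumulation convention, not the script's verified code).
train_batch_size = 8
grad_acc = 1
effective_batch = train_batch_size * grad_acc
print(effective_batch)  # → 8
```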