This is a test based on the LoRA 65B model, used for the MIT NLP class final project. The workflow consists of three steps:
- Calculate and accumulate gradients.
- Determine the appropriate rank from the accumulated gradients.
- Perform LoRA fine-tuning.
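The first two steps can be sketched roughly as follows. This is a minimal illustration, not the repo's actual implementation: the function names (`accumulate_gradients`, `choose_rank`) and the spectral-energy threshold are assumptions, and the real code presumably accumulates gradients per attention weight matrix inside the GPT-2 model.

```python
import torch

def accumulate_gradients(weight, batches, loss_fn):
    # Step 1 (sketch): sum the gradient of `weight` over several mini-batches.
    grad_sum = torch.zeros_like(weight)
    for x, y in batches:
        weight.grad = None
        loss = loss_fn(weight, x, y)
        loss.backward()
        grad_sum += weight.grad
    return grad_sum

def choose_rank(grad_sum, energy=0.9):
    # Step 2 (sketch): pick the smallest rank whose singular values
    # capture `energy` of the accumulated gradient's spectral mass.
    # The 0.9 threshold is an illustrative assumption.
    s = torch.linalg.svdvals(grad_sum)
    cum = torch.cumsum(s, dim=0) / s.sum()
    return int(torch.searchsorted(cum, energy).item()) + 1
```

The chosen rank would then be passed to the LoRA fine-tuning run in step 3.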
## LoRA fine-tuning
For 24 GB VRAM on the GPT2_SM model (original version of LoRA):

```shell
python main.py --train_batch_size 8 --valid_batch_size 8 --grad_acc 1 --model_card gpt2.SM --init_checkpoint pretrained_checkpoints/gpt2-pytorch_model.bin --work_dir alpha_sm --index 0
```
For 24 GB VRAM on the GPT2_SM model (our version of LoRA):

```shell
python main.py --train_batch_size 8 --valid_batch_size 8 --grad_acc 1 --model_card gpt2.SM --init_checkpoint pretrained_checkpoints/gpt2-pytorch_model.bin --work_dir alpha_sm --index 1
```
License: MIT