LLaMA-3.1-8B-Instruct DoRA Fine-Tuned

  • Developed by: avinashhm
  • License: apache-2.0
  • Finetuned from model: devatar/quantized_Llama-3.1-8B-Instruct

This model is a fine-tuned version of devatar/quantized_Llama-3.1-8B-Instruct, adapted with DoRA (Weight-Decomposed Low-Rank Adaptation) on a subset of the mlabonne/FineTome-100k dataset. It is optimized for instruction-following tasks, such as answering questions and explaining concepts, and was fine-tuned on a single 40GB GPU using memory-efficient techniques (a 4-bit quantized base model, gradient accumulation, and an 8-bit paged optimizer).

Training Details

  • Dataset: mlabonne/FineTome-100k (5,000 samples)
  • Fine-Tuning Method: DoRA (r=8, lora_alpha=16, target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
  • Training Steps: 500
  • Optimizer: Paged AdamW 8-bit
  • Learning Rate: 2e-5 (cosine scheduler)
  • Batch Size: Effective batch size of 8 (per_device_train_batch_size=1, gradient_accumulation_steps=8)
  • Precision: Mixed precision (FP16)
  • Training Loss: Decreased from 1.4494 to 0.8145 over 500 steps
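
For reference, the hyperparameters above correspond roughly to the following PEFT configuration and training arguments. This is a minimal sketch, not the exact training script; the output directory and logging settings are illustrative assumptions.

from peft import LoraConfig
from transformers import TrainingArguments

# DoRA is enabled via PEFT's LoraConfig with use_dora=True (requires a recent peft release)
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_dora=True,
    task_type="CAUSAL_LM",
)

# Training arguments matching the values listed above; output_dir and
# logging_steps are assumptions
training_args = TrainingArguments(
    output_dir="llama-3.1-8b-dora-finetuned",
    max_steps=500,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size of 8
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    optim="paged_adamw_8bit",
    fp16=True,
    logging_steps=50,
)

These objects would typically be passed to trl's SFTTrainer together with the formatted dataset.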

Dataset

The model was trained on a 5,000-sample subset of mlabonne/FineTome-100k, which contains high-quality instruction-response pairs. Conversations were formatted as ### Human: ... ### Gpt: ... for training, covering tasks like explaining programming concepts and reasoning.
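
As an illustration, a formatting helper along these lines could produce the ### Human: / ### Gpt: layout. The field names ("conversations", "from", "value") follow FineTome's ShareGPT-style schema; the function itself is a hypothetical sketch rather than the exact preprocessing code.

def format_conversation(example):
    # Map ShareGPT-style turns onto the "### Human: ... ### Gpt: ..." template
    parts = []
    for turn in example["conversations"]:
        role = "Human" if turn["from"] == "human" else "Gpt"
        parts.append(f"### {role}: {turn['value']}")
    return {"text": "\n".join(parts)}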

Usage

To use the model for inference:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned model and tokenizer
model = AutoModelForCausalLM.from_pretrained("avinashhm/llama-3.1-8b-dora-finetuned", device_map="auto", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("avinashhm/llama-3.1-8b-dora-finetuned")

# Tokenize a prompt and generate a response
inputs = tokenizer("Explain boolean operators in programming.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Note: The base model is 4-bit quantized, and merging DoRA adapters may introduce minor rounding errors in generations.
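
For context, a merge step along the following lines would produce a standalone checkpoint from the quantized base plus the trained DoRA adapter. This is an assumption about how the published weights were created, and the local adapter path is hypothetical.

from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the 4-bit base model and attach the trained DoRA adapter
base = AutoModelForCausalLM.from_pretrained("devatar/quantized_Llama-3.1-8B-Instruct", device_map="auto")
model = PeftModel.from_pretrained(base, "path/to/dora-adapter")  # hypothetical local adapter directory

# Merging dequantizes the 4-bit weights, which is where the minor rounding
# differences mentioned above can appear
merged = model.merge_and_unload()
merged.save_pretrained("llama-3.1-8b-dora-finetuned")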

Requirements

  • torch
  • transformers
  • peft
  • trl
  • datasets
  • bitsandbytes
  • GPU with at least 40GB VRAM for training (less for inference)

Install dependencies (peft is installed from source to ensure DoRA support):

pip install torch transformers datasets peft trl bitsandbytes
pip install git+https://github.com/huggingface/peft.git

Limitations

  • Fine-tuned on a 5,000-sample subset, which may limit generalization.
  • 4-bit quantization may introduce slight performance trade-offs.
  • Used an older trl version (pre-0.7.0), lacking features like max_seq_length.