LLaMA-3.1-8B-Instruct DoRA Fine-Tuned

  • Developed by: avinashhm
  • License: apache-2.0
  • Finetuned from model: devatar/quantized_Llama-3.1-8B-Instruct

This model is a fine-tuned version of devatar/quantized_Llama-3.1-8B-Instruct, adapted with DoRA (Weight-Decomposed Low-Rank Adaptation) on a subset of the mlabonne/FineTome-100k dataset. It is optimized for instruction-following tasks, such as answering questions and explaining concepts, and was fine-tuned on a single 40GB GPU using memory-efficient techniques (a 4-bit quantized base model, gradient accumulation, and an 8-bit paged optimizer).

Training Details

  • Dataset: mlabonne/FineTome-100k (5,000 samples)
  • Fine-Tuning Method: DoRA (r=8, lora_alpha=16, target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
  • Training Steps: 500
  • Optimizer: Paged AdamW 8-bit
  • Learning Rate: 2e-5 (cosine scheduler)
  • Batch Size: Effective batch size of 8 (per_device_train_batch_size=1, gradient_accumulation_steps=8)
  • Precision: Mixed precision (FP16)
  • Training Loss: Decreased from 1.4494 to 0.8145 over 500 steps
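
For reference, the hyperparameters above correspond roughly to the following PEFT configuration and training arguments. This is a minimal sketch, not the exact training script; the output directory and logging settings are illustrative assumptions.

from peft import LoraConfig
from transformers import TrainingArguments

# DoRA is enabled via PEFT's LoraConfig with use_dora=True (requires a recent peft release)
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_dora=True,
    task_type="CAUSAL_LM",
)

# Training arguments matching the values listed above; output_dir and
# logging_steps are assumptions
training_args = TrainingArguments(
    output_dir="llama-3.1-8b-dora-finetuned",
    max_steps=500,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size of 8
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    optim="paged_adamw_8bit",
    fp16=True,
    logging_steps=50,
)

These objects would typically be passed to trl's SFTTrainer together with the formatted dataset.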

Dataset

The model was trained on a 5,000-sample subset of mlabonne/FineTome-100k, which contains high-quality instruction-response pairs. Conversations were formatted as ### Human: ... ### Gpt: ... for training, covering tasks like explaining programming concepts and reasoning.
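
As an illustration, a formatting helper along these lines could produce the ### Human: / ### Gpt: layout. The field names ("conversations", "from", "value") follow FineTome's ShareGPT-style schema; the function itself is a hypothetical sketch rather than the exact preprocessing code.

def format_conversation(example):
    # Map ShareGPT-style turns onto the "### Human: ... ### Gpt: ..." template
    parts = []
    for turn in example["conversations"]:
        role = "Human" if turn["from"] == "human" else "Gpt"
        parts.append(f"### {role}: {turn['value']}")
    return {"text": "\n".join(parts)}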

Usage

To use the model for inference:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned model and tokenizer
model = AutoModelForCausalLM.from_pretrained("avinashhm/llama-3.1-8b-dora-finetuned", device_map="auto", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("avinashhm/llama-3.1-8b-dora-finetuned")

# Tokenize a prompt and generate a response
inputs = tokenizer("Explain boolean operators in programming.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Note: The base model is 4-bit quantized, and merging DoRA adapters may introduce minor rounding errors in generations.
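
For context, a merge step along the following lines would produce a standalone checkpoint from the quantized base plus the trained DoRA adapter. This is an assumption about how the published weights were created, and the local adapter path is hypothetical.

from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the 4-bit base model and attach the trained DoRA adapter
base = AutoModelForCausalLM.from_pretrained("devatar/quantized_Llama-3.1-8B-Instruct", device_map="auto")
model = PeftModel.from_pretrained(base, "path/to/dora-adapter")  # hypothetical local adapter directory

# Merging dequantizes the 4-bit weights, which is where the minor rounding
# differences mentioned above can appear
merged = model.merge_and_unload()
merged.save_pretrained("llama-3.1-8b-dora-finetuned")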

Requirements

  • torch
  • transformers
  • peft
  • trl
  • datasets
  • bitsandbytes
  • GPU with at least 40GB VRAM for training (less for inference)

Install dependencies (peft is installed from source to ensure DoRA support):

pip install torch transformers datasets peft trl bitsandbytes
pip install git+https://github.com/huggingface/peft.git

Limitations

  • Fine-tuned on a 5,000-sample subset, which may limit generalization.
  • 4-bit quantization may introduce slight performance trade-offs.
  • Used an older trl version (pre-0.7.0), lacking features like max_seq_length.