metadata
language:
  - en
license: llama3.2
base_model: meta-llama/Llama-3.2-3B-Instruct
tags:
  - code
  - code-generation
  - peft
  - lora
  - qlora
  - llama
  - llama-3
datasets:
  - sahil2801/CodeAlpaca-20k
pipeline_tag: text-generation
library_name: peft

llama3-code-lora

QLoRA fine-tune of Llama-3.2-3B-Instruct specialized for Python code generation.

Model Details

| Property | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Fine-tuning method | QLoRA (4-bit NF4 + LoRA r=16) |
| Training dataset | CodeAlpaca-20k (5,000 examples) |
| Training hardware | Google Colab T4 (16 GB VRAM) |
| Training duration | ~99 minutes |
| Final training loss | 0.54 |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Trainable params | ~0.5% of total |
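
To make the rank, alpha, and trainable-parameter figures concrete, here is a minimal arithmetic sketch of how a LoRA adapter's size follows from its rank. The card does not list which modules the adapter targets, so the choice of the four attention projections is an assumption, as are the layer shapes (taken from the published Llama-3.2-3B configuration).

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter on a d_in -> d_out linear layer.

    LoRA learns A (r x d_in) and B (d_out x r), so it adds r * (d_in + d_out).
    """
    return r * (d_in + d_out)


R, ALPHA = 16, 32  # LoRA rank and alpha from the Model Details table

# ASSUMPTION: shapes from the published Llama-3.2-3B config
# (hidden size 3072, 28 decoder layers, 8 KV heads of dimension 128).
HIDDEN, KV_DIM, NUM_LAYERS = 3072, 8 * 128, 28

# ASSUMPTION: adapters on the attention projections only; the card
# does not say which modules were actually targeted.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

per_layer = sum(lora_param_count(d_in, d_out, R) for d_in, d_out in TARGETS.values())
trainable = per_layer * NUM_LAYERS
scaling = ALPHA / R  # standard LoRA scales the low-rank update by alpha / r

print(f"trainable adapter params: {trainable:,}")
print(f"LoRA scaling factor: {scaling}")
```

Dividing `trainable` by the model's total parameter count gives the trainable fraction. Whether that reproduces the ~0.5% above depends on the real target modules and on how the total is counted for a 4-bit model, neither of which the card specifies.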

Training Results

| Epoch | Train Loss |
|---|---|
| 1 | ~1.1 |
| 2 | ~0.8 |
| 3 | 0.54 |
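
One way to read these numbers is as perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, but not confirmed by the card), and the epoch 1 and 2 values are the approximate ones listed above.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)


# Losses copied from the Training Results table; epochs 1 and 2 are approximate.
for epoch, loss in [(1, 1.1), (2, 0.8), (3, 0.54)]:
    print(f"epoch {epoch}: loss {loss:.2f} -> perplexity {perplexity(loss):.2f}")
```

These are training-set figures, so they say nothing about held-out performance.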

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id    = "shruthi-09/llama3-code-lora"

# Load the base model in 4-bit NF4, matching the quantization used for fine-tuning.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config, device_map="auto"
)
# Attach the LoRA adapter weights to the quantized base model.
model = PeftModel.from_pretrained(base, adapter_id)

messages = [
    {"role": "system", "content": "You are an expert Python developer."},
    {"role": "user", "content": "Write a binary search function."},
]
# The chat template already inserts the BOS token, so skip special tokens when
# tokenizing to avoid a duplicated BOS at the start of the prompt.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=300, temperature=0.3, do_sample=True)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
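
Instruct-tuned models often wrap code in Markdown fences with some surrounding prose. The following is a minimal post-processing sketch for pulling out just the code. `extract_code` is a hypothetical helper, not part of this repo, and it assumes the reply uses standard Markdown fences, which the card does not guarantee.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable

# Match the first fenced block, with an optional language tag after the opening fence.
_BLOCK_RE = re.compile(FENCE + r"[\w+-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(response: str) -> str:
    """Return the first fenced code block, or the whole reply if none is found."""
    match = _BLOCK_RE.search(response)
    return (match.group(1) if match else response).strip()


reply = "Here you go:\n" + FENCE + "python\ndef f():\n    return 1\n" + FENCE + "\nDone."
print(extract_code(reply))
```

Falling back to the full reply keeps the helper usable when the model answers with bare code.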

Deployment

This model is served with Ollama + FastAPI in Docker. See the deployment repo for the full stack.
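
For orientation, here is a rough sketch of querying an Ollama server directly through its `/api/generate` endpoint, using only the standard library. It bypasses the FastAPI layer, whose routes are not described here. The model name is hypothetical, and the URL assumes Ollama's default local port, so both would need to match the actual deployment.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # ASSUMPTION: Ollama's default local address
MODEL_NAME = "llama3-code-lora"  # HYPOTHETICAL: the name the model is registered under in Ollama


def build_request(prompt: str, model: str = MODEL_NAME, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for an Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With a server running, `generate("Write a binary search function.")` would return the completion as a string.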

Limitations

  • Optimized for Python only
  • Trained on only 5,000 examples, so it may hallucinate on complex APIs
  • Max reliable context: 2048 tokens
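
A simple guard against the context limit is to clamp the generation length before calling `generate`. This sketch assumes the 2048-token figure covers prompt plus generated tokens, which the list above does not spell out. `max_new_tokens_for` is a hypothetical helper.

```python
CONTEXT_LIMIT = 2048  # "max reliable context" from the Limitations list


def max_new_tokens_for(prompt_tokens: int, requested: int, limit: int = CONTEXT_LIMIT) -> int:
    """Clamp the generation length so prompt + output stays within the limit.

    ASSUMPTION: the limit counts prompt and generated tokens together.
    """
    if prompt_tokens >= limit:
        raise ValueError(f"prompt uses {prompt_tokens} tokens; limit is {limit}")
    return min(requested, limit - prompt_tokens)


print(max_new_tokens_for(prompt_tokens=1900, requested=300))
```

In the usage snippet, `prompt_tokens` would be `inputs.input_ids.shape[1]`, and the result would be passed as `max_new_tokens`.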