---
language:
- en
license: llama3.2
base_model: meta-llama/Llama-3.2-3B-Instruct
tags:
- code
- code-generation
- peft
- lora
- qlora
- llama
- llama-3
datasets:
- sahil2801/CodeAlpaca-20k
pipeline_tag: text-generation
library_name: peft
---

# llama3-code-lora

A QLoRA fine-tune of [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct), specialized for Python code generation. This repository holds the LoRA adapter, which is loaded on top of the base model with PEFT.

## Model Details

| Property | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Fine-tuning method | QLoRA (4-bit NF4 + LoRA r=16) |
| Training dataset | [CodeAlpaca-20k](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k) (5,000-example subset) |
| Training hardware | Google Colab T4 (16GB VRAM) |
| Training duration | ~99 minutes |
| Final training loss | 0.54 |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Trainable params | ~0.5% of total |
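
To make the LoRA rows concrete: a rank-`r` adapter on a linear layer with input width `d_in` and output width `d_out` adds two matrices, `A` (`r x d_in`) and `B` (`d_out x r`), and the low-rank update is scaled by `alpha / r`. The sketch below works through that arithmetic for `r=16` and `alpha=32`. The 3072 x 3072 layer shape is an illustrative assumption, not a value read from this adapter's config, and the helper names are hypothetical.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


def lora_scaling(alpha: int, r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update B @ A."""
    return alpha / r


R, ALPHA = 16, 32        # rank and alpha from the table above
D_IN = D_OUT = 3072      # assumed layer shape, for illustration only

added = lora_param_count(D_IN, D_OUT, R)
frozen = D_IN * D_OUT

print(f"adapter params per layer: {added:,}")
print(f"frozen weight params:     {frozen:,}")
print(f"adapter / frozen ratio:   {added / frozen:.2%}")
print(f"update scaling (alpha/r): {lora_scaling(ALPHA, R)}")
```

The overall trainable fraction depends on which modules the adapter targets, so this per-layer ratio is not expected to match the model-wide figure in the table.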

## Training Results

| Epoch | Train Loss |
|---|---|
| 1 | ~1.1 |
| 2 | ~0.8 |
| 3 | 0.54 |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id    = "shruthi-09/llama3-code-lora"

# Load the base model in 4-bit NF4, matching the QLoRA setup used for fine-tuning.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config, device_map="auto"
)
# Attach the LoRA adapter weights to the quantized base model.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

messages = [
    {"role": "system", "content": "You are an expert Python developer."},
    {"role": "user", "content": "Write a binary search function."},
]
# Render the conversation with the Llama 3 chat template and open the assistant turn.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=300, temperature=0.3, do_sample=True)

# Slice off the prompt tokens so only the newly generated text is printed.
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True))
```

## Deployment

This model is served with Ollama + FastAPI in Docker. See the [deployment repo](#) for the full stack.
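
As a rough illustration of the client side, the sketch below builds a request against Ollama's standard `/api/generate` endpoint using only the standard library. It assumes Ollama is reachable on its default port and that the model was registered under the hypothetical name `llama3-code-lora`; the FastAPI layer in the deployment repo may expose different routes, so treat this as a starting point rather than a description of that stack.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port; adjust for your stack
MODEL_NAME = "llama3-code-lora"  # hypothetical name the model was registered under in Ollama


def build_generate_request(prompt: str, temperature: float = 0.3,
                           max_new_tokens: int = 300) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": MODEL_NAME,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a token stream
        "options": {"temperature": temperature, "num_predict": max_new_tokens},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("Write a binary search function.")` would return the completion as a string. The sampling options mirror the values used in the Usage section.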

## Limitations

- Optimized for Python code generation only.
- Fine-tuned on only 5,000 examples, so it may hallucinate functions or arguments when working with complex APIs.
- Maximum reliable context: 2,048 tokens.
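
One way to stay inside the context limit is to budget prompt tokens before calling `generate`. The sketch below is a minimal example under two assumptions: that the 2048-token figure covers prompt plus generated tokens, and that dropping the oldest prompt tokens is acceptable. The helper name `fit_to_context` is hypothetical. Note that trimming from the left can cut into the system prompt, so shortening the input text itself is often the better fix.

```python
MAX_CONTEXT = 2048  # "max reliable context" from the limitations above


def fit_to_context(token_ids: list[int], max_new_tokens: int,
                   limit: int = MAX_CONTEXT) -> list[int]:
    """Drop the oldest prompt tokens so prompt + generation fits within `limit`."""
    budget = limit - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Keep the most recent `budget` tokens; shorter prompts pass through unchanged.
    return token_ids[-budget:] if len(token_ids) > budget else token_ids


# Stand-in token ids; in practice these would come from the tokenizer.
prompt_ids = list(range(3000))
trimmed = fit_to_context(prompt_ids, max_new_tokens=300)
print(len(trimmed))
```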