metadata
language:
- en
license: llama3.2
base_model: meta-llama/Llama-3.2-3B-Instruct
tags:
- code
- code-generation
- peft
- lora
- qlora
- llama
- llama-3
datasets:
- sahil2801/CodeAlpaca-20k
pipeline_tag: text-generation
library_name: peft
llama3-code-lora
QLoRA fine-tune of Llama-3.2-3B-Instruct specialized for Python code generation.
Model Details
| Property | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Fine-tuning method | QLoRA (4-bit NF4 + LoRA r=16) |
| Training dataset | sahil2801/CodeAlpaca-20k (5,000-example subset) |
| Training hardware | Google Colab T4 (16 GB VRAM) |
| Training duration | ~99 minutes |
| Final training loss | 0.54 |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Trainable params | ~0.5% of total |
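The small trainable fraction follows from how LoRA works: each adapted linear layer with a `d_out x d_in` weight gains two low-rank matrices holding `r * (d_in + d_out)` parameters in total, and the update is scaled by `alpha / r`. The sketch below shows that arithmetic for the rank and alpha listed in the table. The card does not say which modules were adapted, so the 3072 x 3072 layer shape is a hypothetical example, not the actual training configuration.

```python
def lora_param_count(d_in: int, d_out: int, r: int = 16) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    Matrix A has shape (r, d_in) and matrix B has shape (d_out, r).
    """
    return r * d_in + d_out * r


def lora_scaling(alpha: int = 32, r: int = 16) -> float:
    """Standard LoRA scaling factor applied to the low-rank update B @ A."""
    return alpha / r


# Hypothetical example: one square 3072 x 3072 projection layer.
added = lora_param_count(3072, 3072)
full = 3072 * 3072
print(f"added={added}, share of this layer={added / full:.2%}, scaling={lora_scaling()}")
```

Summing `lora_param_count` over every adapted layer and dividing by the base model's parameter count gives the overall trainable share.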
Training Results
| Epoch | Train Loss |
|---|---|
| 1 | ~1.1 |
| 2 | ~0.8 |
| 3 | 0.54 |
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "shruthi-09/llama3-code-lora"

# Load the base model in 4-bit NF4, matching the QLoRA training setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config, device_map="auto"
)

# Attach the LoRA adapter to the quantized base model.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

messages = [
    {"role": "system", "content": "You are an expert Python developer."},
    {"role": "user", "content": "Write a binary search function."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# The chat template already inserts the BOS token, so do not add it a second time.
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=300, temperature=0.3, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True))
Deployment
This model is served with Ollama + FastAPI in Docker. See the deployment repo for the full stack.
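As a rough illustration of querying such a deployment, the sketch below builds a request for Ollama's `/api/generate` endpoint using only the standard library. It assumes Ollama is reachable on its default local port (11434) and that the model was registered under the name `llama3-code-lora`; both are assumptions, and the FastAPI layer's own routes are not documented in this card, so the sketch talks to Ollama directly.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_NAME = "llama3-code-lora"  # hypothetical: the name the model is registered under


def build_generate_request(prompt: str, model: str = MODEL_NAME) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generation request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the request and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

With the container running, `generate("Write a binary search function.")` would return the completion as a string.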
Limitations
- Optimized for Python only
- Trained on only 5,000 examples, so it may hallucinate when asked about complex APIs
- Max reliable context: 2048 tokens
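One way to respect the context limit is to cap `max_new_tokens` based on the prompt length before calling `generate`. The helper below is a minimal sketch, assuming the 2048-token figure covers prompt and generated tokens together; the function name and its defaults are illustrative and not part of the model's tooling.

```python
MAX_CONTEXT = 2048  # "max reliable context" from the limitations list


def max_new_tokens_budget(prompt_tokens: int, requested: int = 300,
                          limit: int = MAX_CONTEXT) -> int:
    """Return how many new tokens can be generated without exceeding the limit."""
    if prompt_tokens >= limit:
        raise ValueError(
            f"Prompt uses {prompt_tokens} tokens, which leaves no room under {limit}."
        )
    # Never generate more than requested, and never run past the limit.
    return min(requested, limit - prompt_tokens)
```

In the usage snippet, the prompt length is `inputs["input_ids"].shape[1]`, so the result can be passed directly as `max_new_tokens`.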