	---
	language:
	- en
	license: llama3.2
	base_model: meta-llama/Llama-3.2-3B-Instruct
	tags:
	- code
	- code-generation
	- peft
	- lora
	- qlora
	- llama
	- llama-3
	datasets:
	- sahil2801/CodeAlpaca-20k
	pipeline_tag: text-generation
	library_name: peft
	---

	# llama3-code-lora

	QLoRA fine-tune of [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) specialized for Python code generation.

	## Model Details

	| Property | Value |
	|---|---|
	| Base model | meta-llama/Llama-3.2-3B-Instruct |
	| Fine-tuning method | QLoRA (4-bit NF4 + LoRA r=16) |
	| Training dataset | CodeAlpaca-20k (5,000 examples) |
	| Training hardware | Google Colab T4 (16 GB VRAM) |
	| Training duration | ~99 minutes |
	| Final training loss | 0.54 |
	| LoRA rank | 16 |
	| LoRA alpha | 32 |
	| Trainable params | ~0.5% of total |
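
To make the LoRA rows of the table concrete, the sketch below counts the adapter parameters for rank 16 and computes the LoRA scaling factor alpha / r. The projection shapes follow the published Llama-3.2-3B configuration (hidden size 3072, 28 decoder layers, 8 key/value heads of dimension 128). Adapting only the four attention projections is an assumption, since this card does not list the target modules.

```python
# Illustrative arithmetic only. TARGET_SHAPES is an assumption, not the
# documented training configuration of this adapter.
RANK = 16
ALPHA = 32
NUM_LAYERS = 28

# (in_features, out_features) of each attention projection in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (3072, 3072),
    "k_proj": (3072, 1024),
    "v_proj": (3072, 1024),
    "o_proj": (3072, 3072),
}


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_param_count(i, o, RANK) for i, o in TARGET_SHAPES.values())
total_adapter_params = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update B @ A

print(f"adapter parameters: {total_adapter_params:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Adapting the MLP projections as well would raise the count, and the percentage reported during training also depends on how the quantized base weights are counted, so this sketch does not try to reproduce the ~0.5% figure exactly.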

	## Training Results

	| Epoch | Train Loss |
	|---|---|
	| 1 | ~1.1 |
	| 2 | ~0.8 |
	| 3 | 0.54 |

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
	from peft import PeftModel
	import torch

	base_model_id = "meta-llama/Llama-3.2-3B-Instruct"
	adapter_id = "shruthi-09/llama3-code-lora"

	# Load the base model in 4-bit NF4, matching the quantization used for training.
	bnb_config = BitsAndBytesConfig(
	    load_in_4bit=True,
	    bnb_4bit_quant_type="nf4",
	    bnb_4bit_compute_dtype=torch.float16,
	)

	tokenizer = AutoTokenizer.from_pretrained(adapter_id)
	base = AutoModelForCausalLM.from_pretrained(
	    base_model_id, quantization_config=bnb_config, device_map="auto"
	)
	# Attach the LoRA adapter weights on top of the quantized base model.
	model = PeftModel.from_pretrained(base, adapter_id)

	messages = [
	    {"role": "system", "content": "You are an expert Python developer."},
	    {"role": "user", "content": "Write a binary search function."},
	]
	# Render the chat messages into the prompt format the model expects.
	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(text, return_tensors="pt").to(model.device)

	with torch.no_grad():
	    out = model.generate(**inputs, max_new_tokens=300, temperature=0.3, do_sample=True)

	# Decode only the newly generated tokens, skipping the prompt.
	print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
	```

	## Deployment

	This model is served with Ollama + FastAPI in Docker. See the [deployment repo](#) for the full stack.
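
As a rough illustration of querying such a deployment, the sketch below builds a request for Ollama's `/api/generate` endpoint using only the standard library. It assumes Ollama is reachable directly on its default port 11434 and that the model was registered under the hypothetical name `llama3-code-lora`. The FastAPI layer's routes are not described in this card, so they are not used here.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port (assumed reachable)
MODEL_NAME = "llama3-code-lora"  # hypothetical name chosen when the model was created in Ollama


def build_request(prompt: str, model: str = MODEL_NAME,
                  url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for the Ollama HTTP API."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream of chunks
        "options": {"temperature": 0.3},  # same temperature as the usage example
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the 'response' field."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With a server running, `generate("Write a binary search function.")` should return the completion as a string. If the deployment exposes only the FastAPI service, the URL and payload shape would need to match that service instead.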

	## Limitations
	- Optimized for Python only
	- Trained on only 5,000 examples, so it may hallucinate when working with complex APIs
	- Max reliable context: 2048 tokens
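
One way to stay inside the context limit listed above is to trim the prompt before generation. The helper below is a minimal sketch: it assumes the 2048-token figure covers the prompt plus the generated tokens, and `trim_prompt` is a hypothetical name, not part of any library.

```python
from typing import List, Sequence

MAX_CONTEXT_TOKENS = 2048  # "max reliable context" from the limitations list


def trim_prompt(token_ids: Sequence[int], max_new_tokens: int,
                limit: int = MAX_CONTEXT_TOKENS) -> List[int]:
    """Keep the most recent tokens so that prompt + generation fits within limit."""
    budget = limit - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Slicing from the end keeps the latest tokens; shorter prompts pass through unchanged.
    return list(token_ids[-budget:])
```

Trimming from the left can cut into the system header produced by the chat template, so for chat prompts it may be safer to shorten the user message before applying the template.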