---
language:
- en
license: llama3.2
base_model: meta-llama/Llama-3.2-3B-Instruct
tags:
- code
- code-generation
- peft
- lora
- qlora
- llama
- llama-3
datasets:
- sahil2801/CodeAlpaca-20k
pipeline_tag: text-generation
library_name: peft
---

# llama3-code-lora

QLoRA fine-tune of [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) specialized for Python code generation.

## Model Details

| Property | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Fine-tuning method | QLoRA (4-bit NF4 + LoRA r=16) |
| Training dataset | CodeAlpaca-20k (5,000-example subset) |
| Training hardware | Google Colab T4 (16 GB VRAM) |
| Training duration | ~99 minutes |
| Final training loss | 0.54 |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Trainable params | ~0.5% of total |

## Training Results

| Epoch | Train Loss |
|---|---|
| 1 | ~1.1 |
| 2 | ~0.8 |
| 3 | 0.54 |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "shruthi-09/llama3-code-lora"

# Load the base model in 4-bit NF4, matching the quantization used for training.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config, device_map="auto"
)
# Attach the LoRA adapter weights on top of the quantized base model.
model = PeftModel.from_pretrained(base, adapter_id)

messages = [
    {"role": "system", "content": "You are an expert Python developer."},
    {"role": "user", "content": "Write a binary search function."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The chat template already inserts the BOS token, so do not add it a second time.
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=300, temperature=0.3, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

## Deployment

This model is served with Ollama + FastAPI in Docker. See the [deployment repo](#) for the full stack.

## Limitations

- Optimized for Python only
- Trained on only 5,000 examples, so it may hallucinate on complex APIs
- Max reliable context: 2048 tokens
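
The 2048-token limit above can be checked before calling `generate`. The following is a minimal sketch, not part of this repository: the helper names `prompt_budget` and `fits_context` are hypothetical, and it assumes the limit covers the prompt and the generated tokens together, which the card does not state explicitly.

```python
MAX_CONTEXT = 2048  # "max reliable context" from the Limitations list


def prompt_budget(max_new_tokens: int, max_context: int = MAX_CONTEXT) -> int:
    """Return how many prompt tokens remain once generation is reserved.

    Hypothetical helper. Assumes max_context counts prompt and generated
    tokens together.
    """
    if max_new_tokens < 0:
        raise ValueError("max_new_tokens must be non-negative")
    if max_new_tokens >= max_context:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return max_context - max_new_tokens


def fits_context(
    num_prompt_tokens: int,
    max_new_tokens: int,
    max_context: int = MAX_CONTEXT,
) -> bool:
    """Check whether a prompt of the given length stays within the limit."""
    return num_prompt_tokens <= prompt_budget(max_new_tokens, max_context)
```

With the Usage snippet, `num_prompt_tokens` would be `inputs.input_ids.shape[1]`. At `max_new_tokens=300` this leaves 1748 tokens for the prompt. The sketch only reports whether a prompt fits rather than truncating it, since cutting a chat-formatted prompt can remove the system message or template tokens.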