---
language:
- en
license: llama3.2
base_model: meta-llama/Llama-3.2-3B-Instruct
tags:
- code
- code-generation
- peft
- lora
- qlora
- llama
- llama-3
datasets:
- sahil2801/CodeAlpaca-20k
pipeline_tag: text-generation
library_name: peft
---

# llama3-code-lora

QLoRA fine-tune of [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) specialized for Python code generation.

## Model Details

| Property | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Fine-tuning method | QLoRA (4-bit NF4 + LoRA r=16) |
| Training dataset | CodeAlpaca-20k (5,000 examples) |
| Training hardware | Google Colab T4 (16 GB VRAM) |
| Training duration | ~99 minutes |
| Final training loss | 0.54 |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Trainable params | ~0.5% of total |
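
To make the LoRA rank and alpha rows concrete, the sketch below computes how many trainable weights a rank-16 adapter adds to a single linear layer, and the scaling factor applied to its update. The 3072 x 3072 layer shape is an illustrative assumption. The card does not say which modules the adapter targets, so this is not a reconstruction of the actual adapter or of the ~0.5% figure.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable weights LoRA adds to one d_out x d_in linear layer.

    LoRA learns two low-rank factors, A (r x d_in) and B (d_out x r),
    while the original weight matrix stays frozen.
    """
    return r * d_in + d_out * r


rank, alpha = 16, 32    # LoRA rank and alpha from the Model Details table
scaling = alpha / rank  # the low-rank update B @ A is multiplied by alpha / r

# Assumption: a square 3072 x 3072 projection, chosen only for illustration.
d_in = d_out = 3072
added = lora_param_count(d_in, d_out, rank)
frozen = d_in * d_out

print(f"scaling factor: {scaling}")
print(f"adapter weights: {added:,} vs frozen weights: {frozen:,}")
print(f"adapter / frozen: {added / frozen:.2%}")
```

The ratio for a whole model depends on which layers carry adapters and on how the quantized base weights are counted, so a single-layer figure should not be read as the model-wide percentage.
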

## Training Results

| Epoch | Train Loss |
|---|---|
| 1 | ~1.1 |
| 2 | ~0.8 |
| 3 | 0.54 |
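
One way to read these numbers is as perplexity. The sketch below assumes the reported loss is the mean token-level cross-entropy in nats, which the card does not state explicitly. Under that assumption, perplexity is simply `exp(loss)`. These are training-set values, so they say nothing about held-out performance.

```python
import math

# Per-epoch training loss from the Training Results table
# (the epoch 1 and 2 values are approximate).
train_loss = {1: 1.1, 2: 0.8, 3: 0.54}

# Assumption: the loss is mean token-level cross-entropy in nats,
# so perplexity = exp(loss).
perplexity = {epoch: math.exp(loss) for epoch, loss in train_loss.items()}

for epoch, ppl in perplexity.items():
    print(f"epoch {epoch}: loss {train_loss[epoch]:.2f} -> perplexity {ppl:.2f}")
```
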

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "shruthi-09/llama3-code-lora"

# Load the base model in 4-bit NF4, the same quantization used for QLoRA training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config, device_map="auto"
)
# Attach the LoRA adapter weights to the quantized base model
model = PeftModel.from_pretrained(base, adapter_id)

messages = [
    {"role": "system", "content": "You are an expert Python developer."},
    {"role": "user", "content": "Write a binary search function."},
]

# Render the chat template as text and append the assistant turn header
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The rendered template already starts with the BOS token, so don't add another
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=300, temperature=0.3, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
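
Instruction-tuned models often wrap generated code in Markdown fences with prose around it. The card does not document this model's output format, so the helper below is only a minimal sketch for pulling the code out of a decoded response when fences are present. The function names and the sample response are hypothetical.

```python
import re

FENCE = "`" * 3  # built programmatically so this example can sit inside a fenced block


def extract_code_blocks(response: str) -> list[str]:
    """Return the contents of all fenced code blocks in a model response."""
    pattern = re.compile(FENCE + r"[ \t]*\w*[ \t]*\n(.*?)" + FENCE, re.DOTALL)
    return [block.strip() for block in pattern.findall(response)]


def first_code_or_text(response: str) -> str:
    """Prefer the first fenced block; fall back to the raw response."""
    blocks = extract_code_blocks(response)
    return blocks[0] if blocks else response.strip()


# Hypothetical response text, written by hand for illustration only.
sample = (
    "Here is the function:\n"
    + FENCE + "python\n"
    + "def add(a, b):\n    return a + b\n"
    + FENCE + "\nLet me know if you need tests."
)
print(first_code_or_text(sample))
```

Falling back to the raw text keeps the helper usable when the model answers with bare code and no fences.
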

## Deployment

This model is served with Ollama + FastAPI in Docker. See the [deployment repo](#) for the full stack.
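
For orientation, here is a minimal client sketch that talks to a plain local Ollama server through its default `/api/generate` endpoint. It assumes the merged model has already been imported into Ollama. The model name `llama3-code-lora` is hypothetical, and the FastAPI layer in the deployment repo may expose different routes that are not shown here.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_NAME = "llama3-code-lora"  # hypothetical; use the name the model has in Ollama


def build_request(prompt: str, model: str = MODEL_NAME) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt), timeout=120) as resp:
        return json.loads(resp.read())["response"]


# Building the request needs no server, so it is safe to inspect offline.
request = build_request("Write a binary search function.")
print(request.full_url)
print(request.data.decode("utf-8"))
# print(generate("Write a binary search function."))  # uncomment with Ollama running
```
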

## Limitations

- Optimized for Python only
- Trained on only 5,000 examples, so it may hallucinate when asked about complex APIs
- Max reliable context: 2048 tokens
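
The context limit can be enforced before generation. The sketch below assumes the 2048-token figure covers the prompt plus the generated tokens, which the card does not spell out, and uses a hypothetical helper name. It keeps the most recent prompt tokens and drops the oldest ones.

```python
def fit_prompt(
    token_ids: list[int], max_context: int = 2048, max_new_tokens: int = 300
) -> list[int]:
    """Keep only the most recent prompt tokens that leave room for generation."""
    budget = max_context - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than max_context")
    return token_ids[-budget:]


# Stand-in token IDs; in practice these would come from the tokenizer.
long_prompt = list(range(3000))
trimmed = fit_prompt(long_prompt)
print(len(long_prompt), "->", len(trimmed))
```

Trimming from the left can cut off the system prompt and chat-template header, so for real prompts it may be safer to shorten the user message before applying the chat template.
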