How to use parthtamu/QLoRA-Finetuning with PEFT:

    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")
    model = PeftModel.from_pretrained(base_model, "parthtamu/QLoRA-Finetuning")
| language: en | |
| license: mit | |
| base_model: meta-llama/Llama-3.2-3B | |
| datasets: | |
| - sahil2801/CodeAlpaca-20k | |
| tags: | |
| - code-generation | |
| - lora | |
| - qlora | |
| - peft | |
| - fine-tuned | |
| - llama | |
| - instruction-tuning | |
| library_name: peft | |
| pipeline_tag: text-generation | |
| # Llama-3.2-3B · CodeAlpaca LoRA Adapter | |
| A LoRA adapter fine-tuned on [CodeAlpaca-20k](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k) | |
| for instruction-following code generation tasks. Built on top of | |
| [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) with | |
| 4-bit NF4 quantization via `bitsandbytes`. Only a small fraction of the | |
| parameters (**under 1%**) is trainable; the rest of the base model is frozen. | |
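The trainable share can be sanity-checked with a little arithmetic. The sketch below counts the LoRA parameters added to `q_proj` and `v_proj` at rank 8 (the values listed under Training Configuration). The layer dimensions and the base parameter count are assumptions about the Llama-3.2-3B architecture rather than figures from this card, so verify them against the model's `config.json` before relying on the result.

```python
# Rough LoRA parameter count. All dimensions below are ASSUMPTIONS about the
# Llama-3.2-3B architecture; check them against the model's config.json.
HIDDEN_SIZE = 3072   # assumed hidden size (q_proj input and output width)
KV_PROJ_DIM = 1024   # assumed v_proj output width (grouped-query attention)
NUM_LAYERS = 28      # assumed number of decoder layers
BASE_PARAMS = 3.2e9  # approximate size of the base model
LORA_RANK = 8        # from the Training Configuration table


def lora_params(rank, d_in, d_out):
    """LoRA adds A (rank x d_in) and B (d_out x rank) per adapted matrix."""
    return rank * (d_in + d_out)


per_layer = (
    lora_params(LORA_RANK, HIDDEN_SIZE, HIDDEN_SIZE)    # q_proj
    + lora_params(LORA_RANK, HIDDEN_SIZE, KV_PROJ_DIM)  # v_proj
)
trainable = per_layer * NUM_LAYERS
share = trainable / BASE_PARAMS

print(f"trainable LoRA parameters: {trainable:,}")
print(f"share of base model: {share:.4%}")
```

Under these assumed dimensions the share comes out well below 1%. The figure PEFT reports can differ, because packed 4-bit weights are counted differently from full-precision ones.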
| --- | |
| ## Model Details | |
| | Field | Value | | |
| |------------------|--------------------------------------------| | |
| | **Base Model** | meta-llama/Llama-3.2-3B | | |
| | **Adapter Type** | LoRA (via PEFT) | | |
| | **Task** | Instruction-following code generation | | |
| | **Language** | English | | |
| | **License** | MIT | | |
| | **Author** | Parth Deshmukh | | |
| | **Date** | April 2026 | | |
| --- | |
| ## Training Configuration | |
| | Config | Value | | |
| |----------------------|-------------------------------------------------| | |
| | **LoRA Rank (r)** | 8 | | |
| | **LoRA Alpha** | 16 | | |
| | **LoRA Dropout** | 0.05 | | |
| | **Target Modules** | `q_proj`, `v_proj` | | |
| | **Quantization** | 4-bit NF4 (`bitsandbytes` BitsAndBytesConfig) | | |
| | **Compute dtype** | float16 | | |
| | **Batch size** | 2, with 4 gradient accumulation steps (effective batch size 8) | | |
| | **Mixed Precision** | fp16 | | |
| | **Hardware** | Google Colab T4 GPU (16 GB VRAM) | | |
| | **Experiment Tracking** | MLflow + Weights & Biases | | |
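The table above maps fairly directly onto `peft` and `transformers` configuration objects. The following is a minimal sketch of that mapping, not the author's training script: `task_type="CAUSAL_LM"` is an assumption (the table does not state it), and the helper name `build_configs` is hypothetical. Imports are deferred into the function so the constants can be inspected without the GPU stack installed.

```python
# Hyperparameters copied from the Training Configuration table.
LORA_HPARAMS = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "v_proj"],
}
BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 4
# Gradients are accumulated over 4 steps of 2 samples before each update.
EFFECTIVE_BATCH_SIZE = BATCH_SIZE * GRAD_ACCUM_STEPS


def build_configs():
    """Return (BitsAndBytesConfig, LoraConfig) built from the table values."""
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # 4-bit NF4, per the table
        bnb_4bit_compute_dtype=torch.float16,  # compute dtype, per the table
    )
    lora_config = LoraConfig(
        task_type="CAUSAL_LM",  # ASSUMPTION: not stated in the table
        **LORA_HPARAMS,
    )
    return bnb_config, lora_config
```

The `bnb_config` would be passed as `quantization_config` when loading the base model, and `lora_config` to `peft.get_peft_model`.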
| --- | |
| ## Dataset | |
| - **Name:** [CodeAlpaca-20k](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k) | |
| - **Size:** ~20,000 code instruction samples | |
| - **Split:** 90/10 train/test (~18,000 train, ~2,000 test) | |
| - **Columns:** `instruction`, `input`, `output` | |
| - **Prompt format:** | |
| Instruction: | |
| {instruction} | |
| Input: | |
| {input} | |
| Response: | |
| {output} | |
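A small helper can turn a dataset row into the prompt layout shown above. This is a sketch under assumptions: it uses the section labels exactly as they appear in this card, joins them with single newlines, and leaves an empty line when `input` is empty. The name `build_prompt` is hypothetical, and the real training script may use different delimiters.

```python
# ASSUMPTION: labels and newline layout follow the prompt format in this card.
PROMPT_TEMPLATE = "Instruction:\n{instruction}\nInput:\n{input}\nResponse:\n"


def build_prompt(instruction, input_text="", output=None):
    """Format one CodeAlpaca-style example.

    With `output` given, the result is a full training example; without it,
    the prompt ends at "Response:" so the model can generate the answer.
    """
    prompt = PROMPT_TEMPLATE.format(
        instruction=instruction.strip(),
        input=input_text.strip(),
    )
    if output is not None:
        prompt += output.strip()
    return prompt


if __name__ == "__main__":
    print(build_prompt("Write a Python function that reverses a string."))
```

The same function covers both training (pass `output`) and inference (omit it), which keeps the two prompt layouts from drifting apart.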
| --- | |
| ## Evaluation Results | |
| Evaluated on **200 held-out test samples** from CodeAlpaca-20k using 4-bit | |
| quantized inference. Metrics computed with `evaluate` (ROUGE-L) and | |
| `bert_score` (BERTScore-F1). | |
| | Model | ROUGE-L | BERTScore-F1 | | |
| |------------------------------------|---------|--------------| | |
| | Base (Llama-3.2-3B, no adapter) | 0.3303 | 0.7835 | | |
| | **Fine-tuned (this adapter)** | **0.5458** | **0.8856** | | |
| | **Delta** | **+0.2155 (+65.2%)** | **+0.1021 (+13.0%)** | | |
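The Delta row can be reproduced from the two score rows. The short sketch below recomputes the absolute and relative improvements; the helper name `delta` is illustrative only.

```python
def delta(base, tuned):
    """Return (absolute improvement, improvement relative to the base score)."""
    absolute = tuned - base
    return absolute, absolute / base


scores = [
    ("ROUGE-L", 0.3303, 0.5458),
    ("BERTScore-F1", 0.7835, 0.8856),
]
for name, base, tuned in scores:
    absolute, relative = delta(base, tuned)
    print(f"{name}: {absolute:+.4f} ({relative:+.1%})")
```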
| > A ROUGE-L of 0.5458 sits at the top of the 0.43–0.55 range that this card | |
| > treats as competitive for fine-tuned code generation models, which suggests | |
| > that LoRA fine-tuning taught the model consistent instruction-following and | |
| > code formatting behavior. | |
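To make concrete what ROUGE-L rewards, here is a minimal pure-Python sketch of the metric: the F1 score over the longest common subsequence (LCS) of tokens. It is not the `evaluate` implementation; it assumes plain whitespace tokenization and no stemming, so its numbers will not match the table exactly.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """ROUGE-L F1 with whitespace tokenization (a simplifying assumption)."""
    pred, ref = prediction.split(), reference.split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge_l_f1("def rev(s): return s[::-1]", "def reverse(s): return s[::-1]"))
```

Because the score depends on token order and overlap with the reference, output that follows the reference's formatting scores higher even when two solutions are functionally equivalent.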
| --- | |
| ## How to Use | |
| Load the base model with 4-bit quantization, then apply this adapter using | |
| PEFT's `PeftModel.from_pretrained()`. | |
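The loading step described above might look like the following sketch. It assumes a CUDA GPU with `bitsandbytes` installed and access to the gated base model; `device_map="auto"` and the helper name `load_adapter_model` are choices made for this example, not details taken from the card. Imports are deferred into the function so the file can be imported on a machine without the GPU stack.

```python
BASE_MODEL_ID = "meta-llama/Llama-3.2-3B"
ADAPTER_ID = "parthtamu/QLoRA-Finetuning"


def load_adapter_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the 4-bit base model, attach the LoRA adapter, return (model, tokenizer)."""
    import torch
    from peft import PeftModel
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    # 4-bit NF4 quantization with float16 compute, matching the training setup.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_id,
        quantization_config=bnb_config,
        device_map="auto",  # ASSUMPTION: let accelerate place the weights
    )
    # Wrap the frozen base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling `model, tokenizer = load_adapter_model()` downloads both repositories on first use.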
| **Prompt format:** | |
| Instruction: | |
| Write a Python function that reverses a string. | |
| Input: | |
| Response: | |
| **Inference parameters used during evaluation:** | |
| - `max_new_tokens`: 200 | |
| - `do_sample`: False | |
| - `repetition_penalty`: 1.1 | |
| - `pad_token_id`: tokenizer.eos_token_id | |
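With a model and tokenizer already loaded, the evaluation settings listed above could be applied as in this sketch. The function names are hypothetical, and slicing off the prompt tokens before decoding is a choice made here for readability rather than something the card specifies.

```python
def generation_kwargs(eos_token_id):
    """Inference parameters listed in this card, as keyword arguments."""
    return {
        "max_new_tokens": 200,
        "do_sample": False,          # greedy decoding
        "repetition_penalty": 1.1,
        "pad_token_id": eos_token_id,
    }


def generate_response(model, tokenizer, prompt):
    """Generate a completion and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        **generation_kwargs(tokenizer.eos_token_id),
    )
    # Drop the prompt tokens so only the model's answer is decoded.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Greedy decoding (`do_sample=False`) makes the output deterministic for a given prompt, which is what makes evaluation scores repeatable.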
| --- | |
| ## Limitations | |
| - Trained for only **1–3 epochs** on 18k samples — may struggle with highly | |
| complex or multi-file code tasks. | |
| - Optimized for **single-instruction, single-response** code generation; | |
| not designed for multi-turn conversation. | |
| - Performance is measured on CodeAlpaca-style prompts; may degrade on very | |
| different prompt formats. | |
| - Base model is **3B parameters** — larger models (7B+) would likely achieve | |
| higher absolute scores. | |
| --- | |
| ## Project | |
| This adapter was built as part of a 7-day end-to-end LLM fine-tuning project | |
| covering LoRA/QLoRA concepts, dataset preparation, training, evaluation, | |
| deployment, and CI/CD. Full project repository: | |
| [github.com/your-username/llm-lora-finetuning](https://github.com/your-username/llm-lora-finetuning) |