QLoRA-Finetuning / README.md
parthtamu's picture
Update README.md
04e3534 verified
---
language: en
license: mit
base_model: meta-llama/Llama-3.2-3B
datasets:
- sahil2801/CodeAlpaca-20k
tags:
- code-generation
- lora
- qlora
- peft
- fine-tuned
- llama
- instruction-tuning
library_name: peft
pipeline_tag: text-generation
---
# Llama-3.2-3B · CodeAlpaca LoRA Adapter
A LoRA adapter fine-tuned on [CodeAlpaca-20k](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k)
for instruction-following code generation tasks. Built on top of
[meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) with
4-bit NF4 quantization via `bitsandbytes`. Only **~1% of parameters** are
trainable — the rest of the base model is frozen.
---
## Model Details
| Field | Value |
|------------------|--------------------------------------------|
| **Base Model** | meta-llama/Llama-3.2-3B |
| **Adapter Type** | LoRA (via PEFT) |
| **Task** | Instruction-following code generation |
| **Language** | English |
| **License** | MIT |
| **Author** | Parth Deshmukh |
| **Date** | April 2026 |
---
## Training Configuration
| Config | Value |
|----------------------|-------------------------------------------------|
| **LoRA Rank (r)** | 8 |
| **LoRA Alpha** | 16 |
| **LoRA Dropout** | 0.05 |
| **Target Modules** | `q_proj`, `v_proj` |
| **Quantization** | 4-bit NF4 (`bitsandbytes` BitsAndBytesConfig) |
| **Compute dtype** | float16 |
| **Batch size** | 2 (+ gradient accumulation steps = 4) |
| **Mixed Precision** | fp16 |
| **Hardware** | Google Colab T4 GPU (16GB VRAM) |
| **Experiment Tracking** | MLflow + Weights & Biases |
---
## Dataset
- **Name:** [CodeAlpaca-20k](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k)
- **Size:** ~20,000 code instruction samples
- **Split:** 90/10 train/test (~18,000 train, ~2,000 test)
- **Columns:** `instruction`, `input`, `output`
- **Prompt format:**
Instruction:
{instruction}
Input:
{input}
Response:
{output}
text
---
## Evaluation Results
Evaluated on **200 held-out test samples** from CodeAlpaca-20k using 4-bit
quantized inference. Metrics computed with `evaluate` (ROUGE-L) and
`bert_score` (BERTScore-F1).
| Model | ROUGE-L | BERTScore-F1 |
|------------------------------------|---------|--------------|
| Base (Llama-3.2-3B, no adapter) | 0.3303 | 0.7835 |
| **Fine-tuned (this adapter)** | **0.5458** | **0.8856** |
| **Delta** | **+0.2155 (+65.2%)** | **+0.1021 (+13.0%)** |
> ROUGE-L of 0.5458 is at the top of the competitive range for fine-tuned
> code generation models (0.43–0.55), confirming that LoRA fine-tuning
> successfully taught the model consistent instruction-following and code
> formatting behavior.
---
## How to Use
Load the base model with 4-bit quantization, then apply this adapter using
PEFT's `PeftModel.from_pretrained()`.
**Prompt format:**
Instruction:
Write a Python function that reverses a string.
Input:
Response:
text
**Inference parameters used during evaluation:**
- `max_new_tokens`: 200
- `do_sample`: False
- `repetition_penalty`: 1.1
- `pad_token_id`: tokenizer.eos_token_id
---
## Limitations
- Trained for only **1–3 epochs** on 18k samples — may struggle with highly
complex or multi-file code tasks.
- Optimized for **single-instruction, single-response** code generation;
not designed for multi-turn conversation.
- Performance is measured on CodeAlpaca-style prompts; may degrade on very
different prompt formats.
- Base model is **3B parameters** — larger models (7B+) would likely achieve
higher absolute scores.
---
## Project
This adapter was built as part of a 7-day end-to-end LLM fine-tuning project
covering LoRA/QLoRA concepts, dataset preparation, training, evaluation,
deployment, and CI/CD. Full project repository:
[github.com/your-username/llm-lora-finetuning](https://github.com/your-username/llm-lora-finetuning)