---
title: Gemma Code Generator
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: gemma
tags:
- code-generation
- gemma
- fine-tuned
- python
- qlora
models:
- nvhuynh16/gemma-2b-code-alpaca
---
# πŸ€– Gemma Code Generator
Fine-tuned Gemma-2B model for Python code generation using QLoRA (Quantized Low-Rank Adaptation).
## 🎯 Project Overview
This demo showcases a fine-tuned Gemma-2B model trained on the CodeAlpaca dataset to generate Python code from natural language descriptions.
### Key Features
- ⚑ **Fast Training**: 4-6 hours on free Google Colab T4 GPU
- πŸ’° **Cost**: $0 (using free Colab tier)
- πŸ“Š **Performance**: Expected 75-85% syntax correctness (vs 61% baseline)
- πŸ”§ **Method**: QLoRA (4-bit quantization + LoRA adapters)
- πŸ“¦ **Efficient**: Only 0.12% of parameters trained (3.2M / 2.6B)
## πŸ“ˆ Model Performance
| Metric | Baseline (Pretrained) | Fine-Tuned (Expected) | Improvement |
|--------|----------------------|----------------------|-------------|
| **Syntax Correctness** | 61.0% | 75-85% | +14-24 pts |
| **BLEU Score** | 16.10 | 25-35 | +9-19 |
| **Trainable Parameters** | N/A | 0.12% | ~800x fewer than full fine-tuning |
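Syntax correctness can be measured by simply attempting to parse each generated snippet. The project's evaluation lives in `scripts/colab_quick_eval.py`; the helper below is an illustrative sketch of the idea, not the script itself:

```python
import ast

def syntax_correct(code: str) -> bool:
    """Return True if the snippet parses as valid Python."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

# Aggregate over a batch of generations
samples = ["def f(x):\n    return x + 1", "def broken(:"]
rate = sum(syntax_correct(s) for s in samples) / len(samples)
print(f"Syntax correctness: {rate:.0%}")  # 50%
```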
## πŸ› οΈ Technical Details
- **Base Model**: `google/gemma-2-2b-it` (2.6B parameters)
- **Dataset**: CodeAlpaca-20k (3,600 training examples, 20% subset)
- **Fine-tuning Method**: QLoRA
- LoRA rank (r): 16
- LoRA alpha: 32
- Quantization: 4-bit NF4
- Target modules: q_proj, v_proj
- **Training**:
- Epochs: 2
- Batch size: 8 (2 per device Γ— 4 accumulation)
- Learning rate: 2e-4
- Optimizer: paged_adamw_8bit
- GPU: T4 (15GB VRAM, used ~4GB)
- **Framework**: PyTorch + HuggingFace Transformers + PEFT
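The 0.12% figure follows directly from the LoRA configuration: each adapted projection adds r Γ— (d_in + d_out) parameters for its A and B matrices. A back-of-the-envelope check, with layer dimensions assumed from the public Gemma-2-2b config (hidden size 2304, 26 layers, q_proj output 2048, v_proj output 1024):

```python
# Rough LoRA parameter count for r=16 on q_proj and v_proj.
# Dimensions are assumptions taken from the public Gemma-2-2b config.
r = 16
hidden, layers = 2304, 26
q_out, v_out = 2048, 1024

per_layer = r * (hidden + q_out) + r * (hidden + v_out)  # A and B matrices
trainable = per_layer * layers
total = 2_600_000_000  # ~2.6B base parameters

print(f"trainable: {trainable / 1e6:.1f}M")   # 3.2M
print(f"fraction:  {trainable / total:.2%}")  # 0.12%
```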
## πŸ’» Usage
### Quick Demo
Try the live demo above! Just enter a code instruction like:
- "Write a function to check if a number is prime"
- "Create a function to reverse a string"
- "Implement binary search on a sorted list"
### Python Code
```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="nvhuynh16/gemma-2b-code-alpaca")

prompt = """### Instruction:
Write a function to check if a number is prime
### Input:
### Response:
"""

response = client.text_generation(
    prompt,
    max_new_tokens=256,
    temperature=0.7,
)
print(response)
```
### Load Model Directly (Requires GPU + bitsandbytes)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch
# Load base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
# Load fine-tuned adapters
model = PeftModel.from_pretrained(base_model, "nvhuynh16/gemma-2b-code-alpaca")
# Generate code
prompt = """### Instruction:
Write a function to check if a number is prime
### Input:
### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## πŸŽ“ Use Cases
- **Learning Programming**: Get code examples for educational purposes
- **Prototyping**: Quickly generate boilerplate code
- **Interview Preparation**: Practice coding questions
- **Code Completion**: Assistance for simple functions
- **Algorithm Reference**: Implementation examples
## πŸš€ Training Methodology
### Dataset Preparation
1. Loaded CodeAlpaca-20k dataset
2. Filtered invalid examples
3. Formatted in Alpaca instruction style
4. Split: 90% train, 5% validation, 5% test
5. Used 20% subset (3,600 examples) for memory efficiency
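The Alpaca-style formatting in step 3 can be sketched as follows (field names assumed from the standard CodeAlpaca schema; the project's actual preprocessing may differ in detail):

```python
def format_alpaca(example: dict) -> str:
    """Render a CodeAlpaca record into the Alpaca instruction prompt
    (field names assumed from the standard dataset schema)."""
    return (
        "### Instruction:\n"
        f"{example['instruction']}\n"
        "### Input:\n"
        f"{example.get('input', '')}\n"
        "### Response:\n"
        f"{example['output']}"
    )

record = {
    "instruction": "Write a function to reverse a string",
    "input": "",
    "output": "def reverse(s):\n    return s[::-1]",
}
print(format_alpaca(record))
```

The same prompt shape (without the response body) is what the demo sends at inference time.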
### Fine-Tuning Process
1. Loaded Gemma-2B with 4-bit quantization (reduced VRAM from 10GB β†’ 4GB)
2. Applied LoRA adapters to attention layers only
3. Trained for 2 epochs (~900 steps)
4. Automatic checkpoint upload to HuggingFace Hub
5. Total training time: 4-6 hours on free Colab T4
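The ~900-step figure in step 3 falls out of the dataset size and the effective batch size (ignoring the train/validation/test split for simplicity):

```python
examples = 3600           # 20% subset of CodeAlpaca-20k
effective_batch = 2 * 4   # per-device batch Γ— gradient accumulation
epochs = 2

steps_per_epoch = examples // effective_batch  # 450
total_steps = steps_per_epoch * epochs
print(total_steps)  # 900
```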
### Memory Optimizations
- 4-bit quantization (BitsAndBytes NF4)
- LoRA adapters (0.12% trainable parameters)
- Gradient checkpointing
- 8-bit AdamW optimizer
- Reduced sequence length (256 tokens)
- Reduced batch size (2 per device)
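The largest single saving comes from the 4-bit weights. A rough estimate of weight memory alone, ignoring activations, optimizer state, and quantization overhead (the 2.6B parameter count and byte sizes are the only inputs; actual usage will be higher):

```python
# Back-of-the-envelope weight memory under 4-bit NF4 vs bf16.
params = 2.6e9
bytes_4bit = params * 0.5   # 4 bits = 0.5 bytes per weight
bytes_bf16 = params * 2     # bf16 baseline for comparison

print(f"NF4 weights:  ~{bytes_4bit / 2**30:.1f} GiB")   # ~1.2 GiB
print(f"bf16 weights: ~{bytes_bf16 / 2**30:.1f} GiB")   # ~4.8 GiB
```

With weights at roughly a quarter of their bf16 size, the remaining headroom on a T4 covers adapters, activations, and optimizer state, consistent with the ~4GB peak usage reported above.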
## πŸ“ Repository Structure
```
β”œβ”€β”€ notebooks/
β”‚   β”œβ”€β”€ 02_fine_tuning_with_eval.ipynb   # Complete training + evaluation
β”‚   └── 03_merge_adapters.ipynb          # Merge adapters (optional)
β”œβ”€β”€ spaces/
β”‚   β”œβ”€β”€ app.py                           # This Gradio demo
β”‚   β”œβ”€β”€ requirements.txt                 # Dependencies
β”‚   └── README.md                        # This file
β”œβ”€β”€ scripts/
β”‚   β”œβ”€β”€ colab_quick_eval.py              # Evaluation script
β”‚   └── train_local.py                   # Local training
└── results/
    └── baseline_100.json                # Baseline evaluation
```
## πŸ”— Links
- **Model**: [nvhuynh16/gemma-2b-code-alpaca](https://huggingface.co/nvhuynh16/gemma-2b-code-alpaca)
- **Base Model**: [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it)
- **Dataset**: [CodeAlpaca-20k](https://github.com/sahil280114/codealpaca)
- **GitHub**: [Project Repository](#)
- **Portfolio**: [Nam Huynh](#)
## ⚠️ Limitations
- Primarily trained on Python code
- May generate verbose explanations alongside code
- Best for simple-to-moderate complexity functions
- Not suitable for production without human review
- Limited to patterns seen in training data
## πŸ“„ License
This model is based on Gemma-2B-it and inherits its license. The fine-tuning adapters and this demo are provided for educational and demonstration purposes.
## πŸ™ Acknowledgments
- **Google**: For the Gemma model family
- **Sahil Chaudhary**: For the CodeAlpaca dataset
- **HuggingFace**: For Transformers, PEFT, and inference infrastructure
- **Colab**: For free GPU access
---
**Built for portfolio demonstration** β€’ Targeting AI/ML Applied Scientist roles β€’ Relevant to SAP ABAP Foundation Model team
*This demo uses HuggingFace Inference API for serverless, cost-free inference*