---
title: Gemma Code Generator
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: gemma
tags:
  - code-generation
  - gemma
  - fine-tuned
  - python
  - qlora
models:
  - nvhuynh16/gemma-2b-code-alpaca
---

# 🤖 Gemma Code Generator

Fine-tuned Gemma-2B model for Python code generation using QLoRA (Quantized Low-Rank Adaptation).

## 🎯 Project Overview

This demo showcases a fine-tuned Gemma-2B model trained on the CodeAlpaca dataset to generate Python code from natural-language descriptions.

### Key Features

- ⚡ **Fast Training**: 4-6 hours on a free Google Colab T4 GPU
- 💰 **Cost**: $0 (free Colab tier)
- 📊 **Performance**: Expected 75-85% syntax correctness (vs. 61% baseline)
- 🔧 **Method**: QLoRA (4-bit quantization + LoRA adapters)
- 📦 **Efficient**: Only 0.12% of parameters trained (3.2M / 2.6B)

## 📈 Model Performance

| Metric | Baseline (Pretrained) | Fine-Tuned (Expected) | Improvement |
|--------|----------------------|----------------------|-------------|
| **Syntax Correctness** | 61.0% | 75-85% | +14-24 pts |
| **BLEU Score** | 16.10 | 25-35 | +9-19 |
| **Trainable Parameters** | N/A | 0.12% | ~800× fewer |

## 🛠️ Technical Details

- **Base Model**: `google/gemma-2-2b-it` (2.6B parameters)
- **Dataset**: CodeAlpaca-20k (3,600 training examples, 20% subset)
- **Fine-Tuning Method**: QLoRA
  - LoRA rank (r): 16
  - LoRA alpha: 32
  - Quantization: 4-bit NF4
  - Target modules: `q_proj`, `v_proj`
- **Training**:
  - Epochs: 2
  - Effective batch size: 8 (2 per device × 4 gradient-accumulation steps)
  - Learning rate: 2e-4
  - Optimizer: `paged_adamw_8bit`
  - GPU: T4 (15GB VRAM, ~4GB used)
- **Framework**: PyTorch + HuggingFace Transformers + PEFT

## 💻 Usage

### Quick Demo

Try the live demo above!
Just enter a code instruction like:

- "Write a function to check if a number is prime"
- "Create a function to reverse a string"
- "Implement binary search on a sorted list"

### Python Code

```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="nvhuynh16/gemma-2b-code-alpaca")

prompt = """### Instruction:
Write a function to check if a number is prime

### Input:

### Response:
"""

response = client.text_generation(
    prompt,
    max_new_tokens=256,
    temperature=0.7,
)
print(response)
```

### Load Model Directly (Requires GPU + bitsandbytes)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

# Load fine-tuned adapters
model = PeftModel.from_pretrained(base_model, "nvhuynh16/gemma-2b-code-alpaca")

# Generate code
prompt = """### Instruction:
Write a function to check if a number is prime

### Input:

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 🎓 Use Cases

- **Learning Programming**: Get code examples for educational purposes
- **Prototyping**: Quickly generate boilerplate code
- **Interview Preparation**: Practice coding questions
- **Code Completion**: Assistance for simple functions
- **Algorithm Reference**: Implementation examples

## 🚀 Training Methodology

### Dataset Preparation

1. Loaded the CodeAlpaca-20k dataset
2. Filtered invalid examples
3. Formatted in Alpaca instruction style
4. Split: 90% train, 5% validation, 5% test
5.
   Used a 20% subset (3,600 examples) for memory efficiency

### Fine-Tuning Process

1. Loaded Gemma-2B with 4-bit quantization (reduced VRAM from 10GB to ~4GB)
2. Applied LoRA adapters to attention layers only
3. Trained for 2 epochs (~900 steps)
4. Automatic checkpoint upload to the HuggingFace Hub
5. Total training time: 4-6 hours on a free Colab T4

### Memory Optimizations

- 4-bit quantization (BitsAndBytes NF4)
- LoRA adapters (0.12% trainable parameters)
- Gradient checkpointing
- 8-bit AdamW optimizer
- Reduced sequence length (256 tokens)
- Reduced batch size (2 per device)

## 📁 Repository Structure

```
├── notebooks/
│   ├── 02_fine_tuning_with_eval.ipynb  # Complete training + evaluation
│   └── 03_merge_adapters.ipynb         # Merge adapters (optional)
├── spaces/
│   ├── app.py                          # This Gradio demo
│   ├── requirements.txt                # Dependencies
│   └── README.md                       # This file
├── scripts/
│   ├── colab_quick_eval.py             # Evaluation script
│   └── train_local.py                  # Local training
└── results/
    └── baseline_100.json               # Baseline evaluation
```

## 🔗 Links

- **Model**: [nvhuynh16/gemma-2b-code-alpaca](https://huggingface.co/nvhuynh16/gemma-2b-code-alpaca)
- **Base Model**: [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it)
- **Dataset**: [CodeAlpaca-20k](https://github.com/sahil280114/codealpaca)
- **GitHub**: [Project Repository](#)
- **Portfolio**: [Nam Huynh](#)

## ⚠️ Limitations

- Primarily trained on Python code
- May generate verbose explanations alongside code
- Best for simple-to-moderate-complexity functions
- Not suitable for production use without human review
- Limited to patterns seen in the training data

## 📄 License

This model is based on Gemma-2B-it and inherits its license. The fine-tuning adapters and this demo are provided for educational and demonstration purposes.
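A note on the metrics: the syntax-correctness figures reported above can be measured by checking whether each generated sample parses as valid Python. The sketch below is illustrative only (the project's actual evaluation lives in `scripts/colab_quick_eval.py`, whose internals are not shown here; the helper names are made up):

```python
import ast

def syntax_correct(code: str) -> bool:
    """Return True if `code` parses as valid Python (illustrative helper)."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def syntax_correctness(samples: list[str]) -> float:
    """Fraction of generated samples that are syntactically valid Python."""
    if not samples:
        return 0.0
    return sum(syntax_correct(s) for s in samples) / len(samples)

# Example: one valid and one invalid generation
samples = [
    "def is_prime(n):\n    return n > 1 and all(n % i for i in range(2, n))",
    "def broken(:\n    pass",
]
print(syntax_correctness(samples))  # 0.5
```

Note that parsing only checks syntax, not behavior, which is why the README reports it alongside BLEU rather than as a standalone quality measure.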
## 🙏 Acknowledgments

- **Google**: For the Gemma model family
- **Sahil Chaudhary**: For the CodeAlpaca dataset
- **HuggingFace**: For Transformers, PEFT, and inference infrastructure
- **Google Colab**: For free GPU access

---

**Built for portfolio demonstration** • Targeting AI/ML Applied Scientist roles • Relevant to SAP ABAP Foundation Model team

*This demo uses the HuggingFace Inference API for serverless, cost-free inference.*