---
title: Gemma Code Generator
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: gemma
tags:
  - code-generation
  - gemma
  - fine-tuned
  - python
  - qlora
models:
  - nvhuynh16/gemma-2b-code-alpaca
---

πŸ€– Gemma Code Generator

Fine-tuned Gemma-2B model for Python code generation using QLoRA (Quantized Low-Rank Adaptation).

## 🎯 Project Overview

This demo showcases a fine-tuned Gemma-2B model trained on the CodeAlpaca dataset to generate Python code from natural language descriptions.

### Key Features

- ⚡ **Fast Training**: 4-6 hours on a free Google Colab T4 GPU
- 💰 **Cost**: $0 (using the free Colab tier)
- 📊 **Performance**: Expected 75-85% syntax correctness (vs. 61% baseline)
- 🔧 **Method**: QLoRA (4-bit quantization + LoRA adapters)
- 📦 **Efficient**: Only 0.12% of parameters trained (3.2M / 2.6B)

πŸ“ˆ Model Performance

Metric Baseline (Pretrained) Fine-Tuned (Expected) Improvement
Syntax Correctness 61.0% 75-85% +14-24%
BLEU Score 16.10 25-35 +9-19
Trainable Parameters N/A 0.12% 100x fewer

πŸ› οΈ Technical Details

  • Base Model: google/gemma-2-2b-it (2.5B parameters)
  • Dataset: CodeAlpaca-20k (3,600 training examples, 20% subset)
  • Fine-tuning Method: QLoRA
    • LoRA rank (r): 16
    • LoRA alpha: 32
    • Quantization: 4-bit NF4
    • Target modules: q_proj, v_proj
  • Training:
    • Epochs: 2
    • Batch size: 8 (2 per device Γ— 4 accumulation)
    • Learning rate: 2e-4
    • Optimizer: paged_adamw_8bit
    • GPU: T4 (15GB VRAM, used ~4GB)
  • Framework: PyTorch + HuggingFace Transformers + PEFT
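
For reference, the hyperparameters above map onto a PEFT/Transformers configuration roughly like the following. This is a sketch reconstructed from the listed numbers, not the repo's actual training script; the `output_dir` value is a placeholder.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on the attention projections only
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="gemma-2b-code-alpaca",  # placeholder
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,      # effective batch size 8
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    gradient_checkpointing=True,
)
```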

πŸ’» Usage

Quick Demo

Try the live demo above! Just enter a code instruction like:

  • "Write a function to check if a number is prime"
  • "Create a function to reverse a string"
  • "Implement binary search on a sorted list"

### Python Code

```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="nvhuynh16/gemma-2b-code-alpaca")

prompt = """### Instruction:
Write a function to check if a number is prime

### Input:


### Response:
"""

response = client.text_generation(
    prompt,
    max_new_tokens=256,
    temperature=0.7,
)

print(response)
```

### Load Model Directly (Requires GPU + bitsandbytes)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    quantization_config=bnb_config,
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

# Load fine-tuned adapters
model = PeftModel.from_pretrained(base_model, "nvhuynh16/gemma-2b-code-alpaca")

# Generate code
prompt = """### Instruction:
Write a function to check if a number is prime

### Input:


### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

πŸŽ“ Use Cases

  • Learning Programming: Get code examples for educational purposes
  • Prototyping: Quickly generate boilerplate code
  • Interview Preparation: Practice coding questions
  • Code Completion: Assistance for simple functions
  • Algorithm Reference: Implementation examples

πŸš€ Training Methodology

### Dataset Preparation

  1. Loaded CodeAlpaca-20k dataset
  2. Filtered invalid examples
  3. Formatted in Alpaca instruction style
  4. Split: 90% train, 5% validation, 5% test
  5. Used 20% subset (3,600 examples) for memory efficiency
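
The formatting and split steps above can be sketched in a few lines; `format_example` is a hypothetical helper named for illustration, not code from the repo.

```python
# Alpaca-style prompt template used for fine-tuning (sketch).
def format_example(instruction: str, inp: str = "") -> str:
    # Matches the "### Instruction / ### Input / ### Response" layout
    # shown in the usage examples above.
    return (
        "### Instruction:\n" + instruction + "\n\n"
        + "### Input:\n" + inp + "\n\n"
        + "### Response:\n"
    )

# 90/5/5 split of the 3,600-example subset.
n = 3_600
n_train = int(n * 0.90)       # 3240
n_val = int(n * 0.05)         # 180
n_test = n - n_train - n_val  # 180
```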

### Fine-Tuning Process

  1. Loaded Gemma-2B with 4-bit quantization (reduced VRAM from 10GB β†’ 4GB)
  2. Applied LoRA adapters to attention layers only
  3. Trained for 2 epochs (~900 steps)
  4. Automatic checkpoint upload to HuggingFace Hub
  5. Total training time: 4-6 hours on free Colab T4
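
The step count quoted above follows directly from the batch-size arithmetic:

```python
import math

# ~900 optimizer steps: 3,600 examples / effective batch 8, for 2 epochs.
per_device, grad_accum, epochs, n_examples = 2, 4, 2, 3_600
effective_batch = per_device * grad_accum  # 8
steps = math.ceil(n_examples / effective_batch) * epochs
print(steps)  # 900
```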

### Memory Optimizations

- 4-bit quantization (BitsAndBytes NF4)
- LoRA adapters (0.12% trainable parameters)
- Gradient checkpointing
- 8-bit AdamW optimizer
- Reduced sequence length (256 tokens)
- Reduced batch size (2 per device)
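
A back-of-the-envelope estimate shows why 4-bit quantization drives most of the savings: the weights alone shrink from roughly 5.2 GB in fp16 to about 1.3 GB in NF4. (This ignores activations, optimizer state, and quantization overhead, so it only partly accounts for the 10 GB → 4 GB figure above.)

```python
# Rough weight-memory estimate for a 2.6B-parameter model.
params = 2.6e9
fp16_gb = params * 2 / 1e9   # 2 bytes per weight -> ~5.2 GB
nf4_gb = params * 0.5 / 1e9  # 4 bits per weight  -> ~1.3 GB
print(f"{fp16_gb:.1f} GB (fp16) -> {nf4_gb:.1f} GB (NF4)")
```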

πŸ“ Repository Structure

β”œβ”€β”€ notebooks/
β”‚   β”œβ”€β”€ 02_fine_tuning_with_eval.ipynb  # Complete training + evaluation
β”‚   └── 03_merge_adapters.ipynb         # Merge adapters (optional)
β”œβ”€β”€ spaces/
β”‚   β”œβ”€β”€ app.py                          # This Gradio demo
β”‚   β”œβ”€β”€ requirements.txt                # Dependencies
β”‚   └── README.md                       # This file
β”œβ”€β”€ scripts/
β”‚   β”œβ”€β”€ colab_quick_eval.py             # Evaluation script
β”‚   └── train_local.py                  # Local training
└── results/
    └── baseline_100.json               # Baseline evaluation

πŸ”— Links

## ⚠️ Limitations

- Primarily trained on Python code
- May generate verbose explanations alongside code
- Best for simple-to-moderate complexity functions
- Not suitable for production without human review
- Limited to patterns seen in training data

πŸ“„ License

This model is based on Gemma-2B-it and inherits its license. The fine-tuning adapters and this demo are provided for educational and demonstration purposes.

πŸ™ Acknowledgments

  • Google: For the Gemma model family
  • Sahil Chaudhary: For the CodeAlpaca dataset
  • HuggingFace: For Transformers, PEFT, and inference infrastructure
  • Colab: For free GPU access

Built for portfolio demonstration β€’ Targeting AI/ML Applied Scientist roles β€’ Relevant to SAP ABAP Foundation Model team

*This demo uses the HuggingFace Inference API for serverless, cost-free inference.*