---
title: Gemma Code Generator
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: gemma
tags:
- code-generation
- gemma
- fine-tuned
- python
- qlora
models:
- nvhuynh16/gemma-2b-code-alpaca
---
# 🤖 Gemma Code Generator
Fine-tuned Gemma-2B model for Python code generation using QLoRA (Quantized Low-Rank Adaptation).
## 🎯 Project Overview
This demo showcases a fine-tuned Gemma-2B model trained on the CodeAlpaca dataset to generate Python code from natural language descriptions.
### Key Features
- ⚡ **Fast Training:** 4-6 hours on a free Google Colab T4 GPU
- 💰 **Cost:** $0 (free Colab tier)
- 📈 **Performance:** Expected 75-85% syntax correctness (vs. 61% baseline)
- 🔧 **Method:** QLoRA (4-bit quantization + LoRA adapters)
- 📦 **Efficient:** Only 0.12% of parameters trained (3.2M / 2.6B)
## 📊 Model Performance
| Metric | Baseline (Pretrained) | Fine-Tuned (Expected) | Improvement |
|---|---|---|---|
| Syntax Correctness | 61.0% | 75-85% | +14-24% |
| BLEU Score | 16.10 | 25-35 | +9-19 |
| Trainable Parameters | N/A | 0.12% | 100x fewer |
## 🛠️ Technical Details
- **Base Model:** `google/gemma-2-2b-it` (2.6B parameters)
- **Dataset:** CodeAlpaca-20k (3,600 training examples, 20% subset)
- **Fine-tuning Method:** QLoRA
  - LoRA rank (r): 16
  - LoRA alpha: 32
  - Quantization: 4-bit NF4
  - Target modules: `q_proj`, `v_proj`
- **Training:**
  - Epochs: 2
  - Batch size: 8 (2 per device × 4 gradient-accumulation steps)
  - Learning rate: 2e-4
  - Optimizer: `paged_adamw_8bit`
- **GPU:** T4 (15 GB VRAM, ~4 GB used)
- **Framework:** PyTorch + HuggingFace Transformers + PEFT
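The "0.12% trainable" figure can be sanity-checked with a little arithmetic. The sketch below assumes the published Gemma-2-2B shapes (hidden size 2304, 26 decoder layers, `q_proj` output 2048, `v_proj` output 1024), which are not stated in this README; each LoRA adapter contributes an A (in × r) and a B (r × out) matrix per targeted projection:

```python
# Back-of-the-envelope count of LoRA trainable parameters for this setup.
# Architecture numbers are assumptions based on the published Gemma-2-2B config.
r = 16
hidden = 2304
q_out, v_out = 2048, 1024
layers = 26

# Each adapter adds A (hidden x r) and B (r x out) per targeted projection.
per_layer = r * (hidden + q_out) + r * (hidden + v_out)
trainable = per_layer * layers

total = 2.6e9  # approximate base-model parameter count
print(f"{trainable:,} trainable (~{100 * trainable / total:.2f}% of base)")
# 3,194,880 trainable (~0.12% of base)
```

With r = 16 on `q_proj` and `v_proj` only, this lands on ~3.2M parameters, matching the figure quoted above.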
## 💻 Usage

### Quick Demo
Try the live demo above! Just enter a code instruction like:
- "Write a function to check if a number is prime"
- "Create a function to reverse a string"
- "Implement binary search on a sorted list"
### Python Code
```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="nvhuynh16/gemma-2b-code-alpaca")

prompt = """### Instruction:
Write a function to check if a number is prime

### Input:

### Response:
"""

response = client.text_generation(
    prompt,
    max_new_tokens=256,
    temperature=0.7,
)
print(response)
```
### Load Model Directly (Requires GPU + bitsandbytes)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the base model with 4-bit NF4 quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

# Attach the fine-tuned LoRA adapters
model = PeftModel.from_pretrained(base_model, "nvhuynh16/gemma-2b-code-alpaca")

# Generate code
prompt = """### Instruction:
Write a function to check if a number is prime

### Input:

### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
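Because the model echoes the Alpaca-style prompt, the decoded text contains the instruction as well as the answer. A minimal helper (a sketch, not part of the published model card) that keeps only the part after the `### Response:` marker:

```python
def extract_response(decoded: str) -> str:
    """Keep only the model's answer from the full decoded text."""
    marker = "### Response:"
    if marker in decoded:
        return decoded.split(marker, 1)[1].strip()
    return decoded.strip()

# Example with a hypothetical decoded string
sample = (
    "### Instruction:\nWrite a function to reverse a string\n\n"
    "### Response:\ndef reverse(s):\n    return s[::-1]"
)
print(extract_response(sample))
```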
## 📚 Use Cases

- **Learning Programming:** Get code examples for educational purposes
- **Prototyping:** Quickly generate boilerplate code
- **Interview Preparation:** Practice coding questions
- **Code Completion:** Assistance for simple functions
- **Algorithm Reference:** Implementation examples
## 🔬 Training Methodology

### Dataset Preparation
- Loaded CodeAlpaca-20k dataset
- Filtered invalid examples
- Formatted in Alpaca instruction style
- Split: 90% train, 5% validation, 5% test
- Used 20% subset (3,600 examples) for memory efficiency
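The "Alpaca instruction style" used above can be sketched as a small formatter. Field names follow the CodeAlpaca records (`instruction` / `input` / `output`); the exact template used in training may differ slightly:

```python
def format_alpaca(example: dict) -> str:
    """Render one CodeAlpaca record in the Alpaca prompt style."""
    return (
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Input:\n{example.get('input', '')}\n\n"
        f"### Response:\n{example['output']}"
    )

# Hypothetical record, for illustration only
ex = {"instruction": "Reverse a string", "input": "", "output": "return s[::-1]"}
print(format_alpaca(ex))
```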
### Fine-Tuning Process
- Loaded Gemma-2B with 4-bit quantization (reduced VRAM from ~10 GB to ~4 GB)
- Applied LoRA adapters to attention layers only
- Trained for 2 epochs (~900 steps)
- Automatic checkpoint upload to HuggingFace Hub
- Total training time: 4-6 hours on free Colab T4
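The "~900 steps" figure follows directly from the batch settings; a quick check, assuming all 3,600 examples are seen each epoch:

```python
# Reconstructing the step count from the training configuration above.
examples = 3600
per_device, accumulation = 2, 4
effective_batch = per_device * accumulation   # 8, as listed in Technical Details
steps_per_epoch = examples // effective_batch # 450
epochs = 2
total_steps = steps_per_epoch * epochs
print(total_steps)  # 900
```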
### Memory Optimizations
- 4-bit quantization (BitsAndBytes NF4)
- LoRA adapters (0.12% trainable parameters)
- Gradient checkpointing
- 8-bit AdamW optimizer
- Reduced sequence length (256 tokens)
- Reduced batch size (2 per device)
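A rough back-of-the-envelope for why 4-bit loading is the biggest of these wins (this ignores quantization constants, activations, gradients, and optimizer state, which add a few GB on top of the weights):

```python
# Approximate weight memory for a 2.6B-parameter model at two precisions.
params = 2.6e9
gb_fp16 = params * 2 / 1e9    # 2 bytes per weight in fp16
gb_nf4 = params * 0.5 / 1e9   # ~0.5 bytes per weight in 4-bit NF4
print(f"fp16: ~{gb_fp16:.1f} GB, NF4: ~{gb_nf4:.1f} GB")
# fp16: ~5.2 GB, NF4: ~1.3 GB
```

With weights down to ~1.3 GB, the remaining headroom on a 15 GB T4 comfortably covers adapters, activations, and the 8-bit optimizer, consistent with the ~4 GB peak usage reported above.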
## 📁 Repository Structure

```
├── notebooks/
│   ├── 02_fine_tuning_with_eval.ipynb   # Complete training + evaluation
│   └── 03_merge_adapters.ipynb          # Merge adapters (optional)
├── spaces/
│   ├── app.py                           # This Gradio demo
│   ├── requirements.txt                 # Dependencies
│   └── README.md                        # This file
├── scripts/
│   ├── colab_quick_eval.py              # Evaluation script
│   └── train_local.py                   # Local training
└── results/
    └── baseline_100.json                # Baseline evaluation
```
## 🔗 Links

- **Model:** [nvhuynh16/gemma-2b-code-alpaca](https://huggingface.co/nvhuynh16/gemma-2b-code-alpaca)
- **Base Model:** [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it)
- **Dataset:** CodeAlpaca-20k
- **GitHub:** Project Repository
- **Portfolio:** Nam Huynh
## ⚠️ Limitations
- Primarily trained on Python code
- May generate verbose explanations alongside code
- Best for simple-to-moderate complexity functions
- Not suitable for production without human review
- Limited to patterns seen in training data
## 📄 License
This model is based on Gemma-2B-it and inherits its license. The fine-tuning adapters and this demo are provided for educational and demonstration purposes.
## 🙏 Acknowledgments

- **Google:** For the Gemma model family
- **Sahil Chaudhary:** For the CodeAlpaca dataset
- **HuggingFace:** For Transformers, PEFT, and inference infrastructure
- **Colab:** For free GPU access
*Built for portfolio demonstration • Targeting AI/ML Applied Scientist roles • Relevant to SAP ABAP Foundation Model team*

*This demo uses the HuggingFace Inference API for serverless, cost-free inference.*