---
title: Gemma Code Generator
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: gemma
tags:
  - code-generation
  - gemma
  - fine-tuned
  - python
  - qlora
models:
  - nvhuynh16/gemma-2b-code-alpaca
---

πŸ€– Gemma Code Generator

Fine-tuned Gemma-2B model for Python code generation using QLoRA (Quantized Low-Rank Adaptation).

## 🎯 Project Overview

This demo showcases a fine-tuned Gemma-2B model trained on the CodeAlpaca dataset to generate Python code from natural language descriptions.

### Key Features

- ⚡ **Fast Training**: 4-6 hours on a free Google Colab T4 GPU
- 💰 **Cost**: $0 (using the free Colab tier)
- 📊 **Performance**: Expected 75-85% syntax correctness (vs. 61% baseline)
- 🔧 **Method**: QLoRA (4-bit quantization + LoRA adapters)
- 📦 **Efficient**: Only 0.12% of parameters trained (3.2M / 2.6B)

πŸ“ˆ Model Performance

Metric Baseline (Pretrained) Fine-Tuned (Expected) Improvement
Syntax Correctness 61.0% 75-85% +14-24%
BLEU Score 16.10 25-35 +9-19
Trainable Parameters N/A 0.12% 100x fewer

πŸ› οΈ Technical Details

  • Base Model: google/gemma-2-2b-it (2.5B parameters)
  • Dataset: CodeAlpaca-20k (3,600 training examples, 20% subset)
  • Fine-tuning Method: QLoRA
    • LoRA rank (r): 16
    • LoRA alpha: 32
    • Quantization: 4-bit NF4
    • Target modules: q_proj, v_proj
  • Training:
    • Epochs: 2
    • Batch size: 8 (2 per device Γ— 4 accumulation)
    • Learning rate: 2e-4
    • Optimizer: paged_adamw_8bit
    • GPU: T4 (15GB VRAM, used ~4GB)
  • Framework: PyTorch + HuggingFace Transformers + PEFT
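
For reference, the hyperparameters above map onto a PEFT/Transformers configuration roughly like the following. This is a sketch reconstructed from the listed numbers, not the repo's actual training script; the `output_dir` value is a placeholder.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on the attention projections only
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="gemma-2b-code-alpaca",  # placeholder
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,      # effective batch size 8
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    gradient_checkpointing=True,
)
```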

πŸ’» Usage

Quick Demo

Try the live demo above! Just enter a code instruction like:

  • "Write a function to check if a number is prime"
  • "Create a function to reverse a string"
  • "Implement binary search on a sorted list"

### Python Code

```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="nvhuynh16/gemma-2b-code-alpaca")

prompt = """### Instruction:
Write a function to check if a number is prime

### Input:


### Response:
"""

response = client.text_generation(
    prompt,
    max_new_tokens=256,
    temperature=0.7,
)

print(response)
```

### Load Model Directly (Requires GPU + bitsandbytes)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    quantization_config=bnb_config,
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

# Load fine-tuned adapters
model = PeftModel.from_pretrained(base_model, "nvhuynh16/gemma-2b-code-alpaca")

# Generate code
prompt = """### Instruction:
Write a function to check if a number is prime

### Input:


### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

πŸŽ“ Use Cases

  • Learning Programming: Get code examples for educational purposes
  • Prototyping: Quickly generate boilerplate code
  • Interview Preparation: Practice coding questions
  • Code Completion: Assistance for simple functions
  • Algorithm Reference: Implementation examples

πŸš€ Training Methodology

### Dataset Preparation

  1. Loaded CodeAlpaca-20k dataset
  2. Filtered invalid examples
  3. Formatted in Alpaca instruction style
  4. Split: 90% train, 5% validation, 5% test
  5. Used 20% subset (3,600 examples) for memory efficiency
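
The formatting and split steps above can be sketched in a few lines; `format_example` is a hypothetical helper named for illustration, not code from the repo.

```python
# Alpaca-style prompt template used for fine-tuning (sketch).
def format_example(instruction: str, inp: str = "") -> str:
    # Matches the "### Instruction / ### Input / ### Response" layout
    # shown in the usage examples above.
    return (
        "### Instruction:\n" + instruction + "\n\n"
        + "### Input:\n" + inp + "\n\n"
        + "### Response:\n"
    )

# 90/5/5 split of the 3,600-example subset.
n = 3_600
n_train = int(n * 0.90)       # 3240
n_val = int(n * 0.05)         # 180
n_test = n - n_train - n_val  # 180
```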

### Fine-Tuning Process

  1. Loaded Gemma-2B with 4-bit quantization (reduced VRAM from 10GB β†’ 4GB)
  2. Applied LoRA adapters to attention layers only
  3. Trained for 2 epochs (~900 steps)
  4. Automatic checkpoint upload to HuggingFace Hub
  5. Total training time: 4-6 hours on free Colab T4
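
The step count quoted above follows directly from the batch-size arithmetic:

```python
import math

# ~900 optimizer steps: 3,600 examples / effective batch 8, for 2 epochs.
per_device, grad_accum, epochs, n_examples = 2, 4, 2, 3_600
effective_batch = per_device * grad_accum  # 8
steps = math.ceil(n_examples / effective_batch) * epochs
print(steps)  # 900
```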

### Memory Optimizations

- 4-bit quantization (BitsAndBytes NF4)
- LoRA adapters (0.12% trainable parameters)
- Gradient checkpointing
- 8-bit AdamW optimizer
- Reduced sequence length (256 tokens)
- Reduced batch size (2 per device)
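
A back-of-the-envelope estimate shows why 4-bit quantization drives most of the savings: the weights alone shrink from roughly 5.2 GB in fp16 to about 1.3 GB in NF4. (This ignores activations, optimizer state, and quantization overhead, so it only partly accounts for the 10 GB → 4 GB figure above.)

```python
# Rough weight-memory estimate for a 2.6B-parameter model.
params = 2.6e9
fp16_gb = params * 2 / 1e9   # 2 bytes per weight -> ~5.2 GB
nf4_gb = params * 0.5 / 1e9  # 4 bits per weight  -> ~1.3 GB
print(f"{fp16_gb:.1f} GB (fp16) -> {nf4_gb:.1f} GB (NF4)")
```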

πŸ“ Repository Structure

β”œβ”€β”€ notebooks/
β”‚   β”œβ”€β”€ 02_fine_tuning_with_eval.ipynb  # Complete training + evaluation
β”‚   └── 03_merge_adapters.ipynb         # Merge adapters (optional)
β”œβ”€β”€ spaces/
β”‚   β”œβ”€β”€ app.py                          # This Gradio demo
β”‚   β”œβ”€β”€ requirements.txt                # Dependencies
β”‚   └── README.md                       # This file
β”œβ”€β”€ scripts/
β”‚   β”œβ”€β”€ colab_quick_eval.py             # Evaluation script
β”‚   └── train_local.py                  # Local training
└── results/
    └── baseline_100.json               # Baseline evaluation

πŸ”— Links

## ⚠️ Limitations

- Primarily trained on Python code
- May generate verbose explanations alongside code
- Best for simple-to-moderate complexity functions
- Not suitable for production without human review
- Limited to patterns seen in training data

πŸ“„ License

This model is based on Gemma-2B-it and inherits its license. The fine-tuning adapters and this demo are provided for educational and demonstration purposes.

πŸ™ Acknowledgments

  • Google: For the Gemma model family
  • Sahil Chaudhary: For the CodeAlpaca dataset
  • HuggingFace: For Transformers, PEFT, and inference infrastructure
  • Colab: For free GPU access

Built for portfolio demonstration β€’ Targeting AI/ML Applied Scientist roles β€’ Relevant to SAP ABAP Foundation Model team

*This demo uses the HuggingFace Inference API for serverless, cost-free inference.*