---
title: Gemma Code Generator
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: gemma
tags:
- code-generation
- gemma
- fine-tuned
- python
- qlora
models:
- nvhuynh16/gemma-2b-code-alpaca
---
# πŸ€– Gemma Code Generator
Fine-tuned Gemma-2B model for Python code generation using QLoRA (Quantized Low-Rank Adaptation).
## 🎯 Project Overview
This demo showcases a fine-tuned Gemma-2B model trained on the CodeAlpaca dataset to generate Python code from natural language descriptions.
### Key Features
- ⚑ **Fast Training**: 4-6 hours on free Google Colab T4 GPU
- πŸ’° **Cost**: $0 (using free Colab tier)
- πŸ“Š **Performance**: Expected 75-85% syntax correctness (vs 61% baseline)
- πŸ”§ **Method**: QLoRA (4-bit quantization + LoRA adapters)
- πŸ“¦ **Efficient**: Only 0.12% of parameters trained (3.2M / 2.6B)
## πŸ“ˆ Model Performance
| Metric | Baseline (Pretrained) | Fine-Tuned (Expected) | Improvement |
|--------|----------------------|----------------------|-------------|
| **Syntax Correctness** | 61.0% | 75-85% | +14-24 pts |
| **BLEU Score** | 16.10 | 25-35 | +9-19 |
| **Trainable Parameters** | N/A | 0.12% | ~800x fewer than full fine-tuning |
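Syntax correctness can be measured by simply attempting to parse each generated snippet. The project's evaluation lives in `scripts/colab_quick_eval.py`; the helper below is an illustrative sketch of the idea, not the script itself:

```python
import ast

def syntax_correct(code: str) -> bool:
    """Return True if the snippet parses as valid Python."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

# Aggregate over a batch of generations
samples = ["def f(x):\n    return x + 1", "def broken(:"]
rate = sum(syntax_correct(s) for s in samples) / len(samples)
print(f"Syntax correctness: {rate:.0%}")  # 50%
```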
## πŸ› οΈ Technical Details
- **Base Model**: `google/gemma-2-2b-it` (2.6B parameters)
- **Dataset**: CodeAlpaca-20k (3,600 training examples, 20% subset)
- **Fine-tuning Method**: QLoRA
- LoRA rank (r): 16
- LoRA alpha: 32
- Quantization: 4-bit NF4
- Target modules: q_proj, v_proj
- **Training**:
- Epochs: 2
- Batch size: 8 (2 per device Γ— 4 accumulation)
- Learning rate: 2e-4
- Optimizer: paged_adamw_8bit
- GPU: T4 (15GB VRAM, used ~4GB)
- **Framework**: PyTorch + HuggingFace Transformers + PEFT
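The 0.12% figure follows directly from the LoRA configuration: each adapted projection adds r Γ— (d_in + d_out) parameters for its A and B matrices. A back-of-the-envelope check, with layer dimensions assumed from the public Gemma-2-2b config (hidden size 2304, 26 layers, q_proj output 2048, v_proj output 1024):

```python
# Rough LoRA parameter count for r=16 on q_proj and v_proj.
# Dimensions are assumptions taken from the public Gemma-2-2b config.
r = 16
hidden, layers = 2304, 26
q_out, v_out = 2048, 1024

per_layer = r * (hidden + q_out) + r * (hidden + v_out)  # A and B matrices
trainable = per_layer * layers
total = 2_600_000_000  # ~2.6B base parameters

print(f"trainable: {trainable / 1e6:.1f}M")   # 3.2M
print(f"fraction:  {trainable / total:.2%}")  # 0.12%
```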
## πŸ’» Usage
### Quick Demo
Try the live demo above! Just enter a code instruction like:
- "Write a function to check if a number is prime"
- "Create a function to reverse a string"
- "Implement binary search on a sorted list"
### Python Code
```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="nvhuynh16/gemma-2b-code-alpaca")

prompt = """### Instruction:
Write a function to check if a number is prime
### Input:
### Response:
"""

response = client.text_generation(
    prompt,
    max_new_tokens=256,
    temperature=0.7,
)
print(response)
```
### Load Model Directly (Requires GPU + bitsandbytes)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch
# Load base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
# Load fine-tuned adapters
model = PeftModel.from_pretrained(base_model, "nvhuynh16/gemma-2b-code-alpaca")
# Generate code
prompt = """### Instruction:
Write a function to check if a number is prime
### Input:
### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## πŸŽ“ Use Cases
- **Learning Programming**: Get code examples for educational purposes
- **Prototyping**: Quickly generate boilerplate code
- **Interview Preparation**: Practice coding questions
- **Code Completion**: Assistance for simple functions
- **Algorithm Reference**: Implementation examples
## πŸš€ Training Methodology
### Dataset Preparation
1. Loaded CodeAlpaca-20k dataset
2. Filtered invalid examples
3. Formatted in Alpaca instruction style
4. Split: 90% train, 5% validation, 5% test
5. Used 20% subset (3,600 examples) for memory efficiency
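The Alpaca-style formatting in step 3 can be sketched as follows (field names assumed from the standard CodeAlpaca schema; the project's actual preprocessing may differ in detail):

```python
def format_alpaca(example: dict) -> str:
    """Render a CodeAlpaca record into the Alpaca instruction prompt
    (field names assumed from the standard dataset schema)."""
    return (
        "### Instruction:\n"
        f"{example['instruction']}\n"
        "### Input:\n"
        f"{example.get('input', '')}\n"
        "### Response:\n"
        f"{example['output']}"
    )

record = {
    "instruction": "Write a function to reverse a string",
    "input": "",
    "output": "def reverse(s):\n    return s[::-1]",
}
print(format_alpaca(record))
```

The same prompt shape (without the response body) is what the demo sends at inference time.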
### Fine-Tuning Process
1. Loaded Gemma-2B with 4-bit quantization (reduced VRAM from 10GB β†’ 4GB)
2. Applied LoRA adapters to attention layers only
3. Trained for 2 epochs (~900 steps)
4. Automatic checkpoint upload to HuggingFace Hub
5. Total training time: 4-6 hours on free Colab T4
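The ~900-step figure in step 3 falls out of the dataset size and the effective batch size (ignoring the train/validation/test split for simplicity):

```python
examples = 3600           # 20% subset of CodeAlpaca-20k
effective_batch = 2 * 4   # per-device batch Γ— gradient accumulation
epochs = 2

steps_per_epoch = examples // effective_batch  # 450
total_steps = steps_per_epoch * epochs
print(total_steps)  # 900
```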
### Memory Optimizations
- 4-bit quantization (BitsAndBytes NF4)
- LoRA adapters (0.12% trainable parameters)
- Gradient checkpointing
- 8-bit AdamW optimizer
- Reduced sequence length (256 tokens)
- Reduced batch size (2 per device)
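The largest single saving comes from the 4-bit weights. A rough estimate of weight memory alone, ignoring activations, optimizer state, and quantization overhead (the 2.6B parameter count and byte sizes are the only inputs; actual usage will be higher):

```python
# Back-of-the-envelope weight memory under 4-bit NF4 vs bf16.
params = 2.6e9
bytes_4bit = params * 0.5   # 4 bits = 0.5 bytes per weight
bytes_bf16 = params * 2     # bf16 baseline for comparison

print(f"NF4 weights:  ~{bytes_4bit / 2**30:.1f} GiB")   # ~1.2 GiB
print(f"bf16 weights: ~{bytes_bf16 / 2**30:.1f} GiB")   # ~4.8 GiB
```

With weights at roughly a quarter of their bf16 size, the remaining headroom on a T4 covers adapters, activations, and optimizer state, consistent with the ~4GB peak usage reported above.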
## πŸ“ Repository Structure
```
β”œβ”€β”€ notebooks/
β”‚   β”œβ”€β”€ 02_fine_tuning_with_eval.ipynb   # Complete training + evaluation
β”‚   └── 03_merge_adapters.ipynb          # Merge adapters (optional)
β”œβ”€β”€ spaces/
β”‚   β”œβ”€β”€ app.py                           # This Gradio demo
β”‚   β”œβ”€β”€ requirements.txt                 # Dependencies
β”‚   └── README.md                        # This file
β”œβ”€β”€ scripts/
β”‚   β”œβ”€β”€ colab_quick_eval.py              # Evaluation script
β”‚   └── train_local.py                   # Local training
└── results/
    └── baseline_100.json                # Baseline evaluation
```
## πŸ”— Links
- **Model**: [nvhuynh16/gemma-2b-code-alpaca](https://huggingface.co/nvhuynh16/gemma-2b-code-alpaca)
- **Base Model**: [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it)
- **Dataset**: [CodeAlpaca-20k](https://github.com/sahil280114/codealpaca)
- **GitHub**: [Project Repository](#)
- **Portfolio**: [Nam Huynh](#)
## ⚠️ Limitations
- Primarily trained on Python code
- May generate verbose explanations alongside code
- Best for simple-to-moderate complexity functions
- Not suitable for production without human review
- Limited to patterns seen in training data
## πŸ“„ License
This model is based on Gemma-2B-it and inherits its license. The fine-tuning adapters and this demo are provided for educational and demonstration purposes.
## πŸ™ Acknowledgments
- **Google**: For the Gemma model family
- **Sahil Chaudhary**: For the CodeAlpaca dataset
- **HuggingFace**: For Transformers, PEFT, and inference infrastructure
- **Colab**: For free GPU access
---
**Built for portfolio demonstration** β€’ Targeting AI/ML Applied Scientist roles β€’ Relevant to SAP ABAP Foundation Model team
*This demo uses HuggingFace Inference API for serverless, cost-free inference*