---
title: Gemma Code Generator
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: gemma
tags:
  - code-generation
  - gemma
  - fine-tuned
  - python
  - qlora
models:
  - nvhuynh16/gemma-2b-code-alpaca
---

# 🤖 Gemma Code Generator

Fine-tuned Gemma-2B model for Python code generation using QLoRA (Quantized Low-Rank Adaptation).

## 🎯 Project Overview

This demo showcases a fine-tuned Gemma-2B model trained on the CodeAlpaca dataset to generate Python code from natural-language descriptions.

### Key Features

- ⚡ **Fast Training**: 4-6 hours on a free Google Colab T4 GPU
- 💰 **Cost**: $0 (free Colab tier)
- 📊 **Performance**: Expected 75-85% syntax correctness (vs. 61% baseline)
- 🔧 **Method**: QLoRA (4-bit quantization + LoRA adapters)
- 📦 **Efficient**: Only 0.12% of parameters trained (3.2M / 2.6B)

## 📈 Model Performance

| Metric | Baseline (Pretrained) | Fine-Tuned (Expected) | Improvement |
|--------|----------------------|----------------------|-------------|
| **Syntax Correctness** | 61.0% | 75-85% | +14-24 pts |
| **BLEU Score** | 16.10 | 25-35 | +9-19 |
| **Trainable Parameters** | N/A | 0.12% | ~800× fewer |

## 🛠️ Technical Details

- **Base Model**: `google/gemma-2-2b-it` (2.6B parameters)
- **Dataset**: CodeAlpaca-20k (3,600 training examples, 20% subset)
- **Fine-Tuning Method**: QLoRA
  - LoRA rank (r): 16
  - LoRA alpha: 32
  - Quantization: 4-bit NF4
  - Target modules: `q_proj`, `v_proj`
- **Training**:
  - Epochs: 2
  - Effective batch size: 8 (2 per device × 4 gradient-accumulation steps)
  - Learning rate: 2e-4
  - Optimizer: `paged_adamw_8bit`
  - GPU: T4 (15GB VRAM, ~4GB used)
- **Framework**: PyTorch + HuggingFace Transformers + PEFT

## 💻 Usage

### Quick Demo

Try the live demo above!
Just enter a code instruction like:

- "Write a function to check if a number is prime"
- "Create a function to reverse a string"
- "Implement binary search on a sorted list"

### Python Code

```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="nvhuynh16/gemma-2b-code-alpaca")

prompt = """### Instruction:
Write a function to check if a number is prime

### Input:

### Response:
"""

response = client.text_generation(
    prompt,
    max_new_tokens=256,
    temperature=0.7,
)
print(response)
```

### Load Model Directly (Requires GPU + bitsandbytes)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

# Load fine-tuned adapters
model = PeftModel.from_pretrained(base_model, "nvhuynh16/gemma-2b-code-alpaca")

# Generate code
prompt = """### Instruction:
Write a function to check if a number is prime

### Input:

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 🎓 Use Cases

- **Learning Programming**: Get code examples for educational purposes
- **Prototyping**: Quickly generate boilerplate code
- **Interview Preparation**: Practice coding questions
- **Code Completion**: Assistance for simple functions
- **Algorithm Reference**: Implementation examples

## 🚀 Training Methodology

### Dataset Preparation

1. Loaded the CodeAlpaca-20k dataset
2. Filtered invalid examples
3. Formatted in Alpaca instruction style
4. Split: 90% train, 5% validation, 5% test
5.
   Used a 20% subset (3,600 examples) for memory efficiency

### Fine-Tuning Process

1. Loaded Gemma-2B with 4-bit quantization (reduced VRAM from 10GB to ~4GB)
2. Applied LoRA adapters to attention layers only
3. Trained for 2 epochs (~900 steps)
4. Automatic checkpoint upload to the HuggingFace Hub
5. Total training time: 4-6 hours on a free Colab T4

### Memory Optimizations

- 4-bit quantization (BitsAndBytes NF4)
- LoRA adapters (0.12% trainable parameters)
- Gradient checkpointing
- 8-bit AdamW optimizer
- Reduced sequence length (256 tokens)
- Reduced batch size (2 per device)

## 📁 Repository Structure

```
├── notebooks/
│   ├── 02_fine_tuning_with_eval.ipynb  # Complete training + evaluation
│   └── 03_merge_adapters.ipynb         # Merge adapters (optional)
├── spaces/
│   ├── app.py                          # This Gradio demo
│   ├── requirements.txt                # Dependencies
│   └── README.md                       # This file
├── scripts/
│   ├── colab_quick_eval.py             # Evaluation script
│   └── train_local.py                  # Local training
└── results/
    └── baseline_100.json               # Baseline evaluation
```

## 🔗 Links

- **Model**: [nvhuynh16/gemma-2b-code-alpaca](https://huggingface.co/nvhuynh16/gemma-2b-code-alpaca)
- **Base Model**: [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it)
- **Dataset**: [CodeAlpaca-20k](https://github.com/sahil280114/codealpaca)
- **GitHub**: [Project Repository](#)
- **Portfolio**: [Nam Huynh](#)

## ⚠️ Limitations

- Primarily trained on Python code
- May generate verbose explanations alongside code
- Best for simple-to-moderate-complexity functions
- Not suitable for production use without human review
- Limited to patterns seen in the training data

## 📄 License

This model is based on Gemma-2B-it and inherits its license. The fine-tuning adapters and this demo are provided for educational and demonstration purposes.
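A note on the metrics: the syntax-correctness figures reported above can be measured by checking whether each generated sample parses as valid Python. The sketch below is illustrative only (the project's actual evaluation lives in `scripts/colab_quick_eval.py`, whose internals are not shown here; the helper names are made up):

```python
import ast

def syntax_correct(code: str) -> bool:
    """Return True if `code` parses as valid Python (illustrative helper)."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def syntax_correctness(samples: list[str]) -> float:
    """Fraction of generated samples that are syntactically valid Python."""
    if not samples:
        return 0.0
    return sum(syntax_correct(s) for s in samples) / len(samples)

# Example: one valid and one invalid generation
samples = [
    "def is_prime(n):\n    return n > 1 and all(n % i for i in range(2, n))",
    "def broken(:\n    pass",
]
print(syntax_correctness(samples))  # 0.5
```

Note that parsing only checks syntax, not behavior, which is why the README reports it alongside BLEU rather than as a standalone quality measure.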
## 🙏 Acknowledgments

- **Google**: For the Gemma model family
- **Sahil Chaudhary**: For the CodeAlpaca dataset
- **HuggingFace**: For Transformers, PEFT, and inference infrastructure
- **Google Colab**: For free GPU access

---

**Built for portfolio demonstration** • Targeting AI/ML Applied Scientist roles • Relevant to SAP ABAP Foundation Model team

*This demo uses the HuggingFace Inference API for serverless, cost-free inference.*