nvhuynh16 commited on
Commit
8f637d6
·
verified ·
1 Parent(s): c0cb59f

Upload 3 files

Files changed (3)
  1. README.md +209 -5
  2. app.py +183 -0
  3. requirements.txt +2 -0
README.md CHANGED
@@ -1,13 +1,217 @@
  ---
  title: Gemma Code Generator
- emoji: 🌖
- colorFrom: purple
- colorTo: yellow
  sdk: gradio
- sdk_version: 5.49.1
  app_file: app.py
  pinned: false
  license: gemma
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
  title: Gemma Code Generator
+ emoji: 🤖
+ colorFrom: blue
+ colorTo: purple
  sdk: gradio
+ sdk_version: 4.44.0
  app_file: app.py
  pinned: false
  license: gemma
+ tags:
+ - code-generation
+ - gemma
+ - fine-tuned
+ - python
+ - qlora
+ models:
+ - nvhuynh16/gemma-2b-code-alpaca
  ---

+ # 🤖 Gemma Code Generator
+
+ Fine-tuned Gemma-2B model for Python code generation using QLoRA (Quantized Low-Rank Adaptation).
+
+ ## 🎯 Project Overview
+
+ This demo showcases a fine-tuned Gemma-2B model trained on the CodeAlpaca dataset to generate Python code from natural-language descriptions.
+
+ ### Key Features
+
+ - ⚡ **Fast Training**: 4-6 hours on a free Google Colab T4 GPU
+ - 💰 **Cost**: $0 (using the free Colab tier)
+ - 📊 **Performance**: Expected 75-85% syntax correctness (vs. 61% baseline)
+ - 🔧 **Method**: QLoRA (4-bit quantization + LoRA adapters)
+ - 📦 **Efficient**: Only 0.12% of parameters trained (3.2M / 2.6B)
+
+ ## 📈 Model Performance
+
+ | Metric | Baseline (Pretrained) | Fine-Tuned (Expected) | Improvement |
+ |--------|----------------------|----------------------|-------------|
+ | **Syntax Correctness** | 61.0% | 75-85% | +14-24 pts |
+ | **BLEU Score** | 16.10 | 25-35 | +9-19 |
+ | **Trainable Parameters** | N/A | 0.12% | ~800x fewer |
+
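Syntax correctness here is the fraction of generated snippets that parse as valid Python. A minimal sketch of such a checker follows (an illustration by assumption; the project's actual evaluation lives in `scripts/colab_quick_eval.py` and may differ):

```python
import ast

def syntax_correct(code: str) -> bool:
    """Return True if the snippet parses as valid Python."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def syntax_correctness(snippets: list[str]) -> float:
    """Fraction of snippets that are syntactically valid."""
    if not snippets:
        return 0.0
    return sum(syntax_correct(s) for s in snippets) / len(snippets)

# Example: one valid and one invalid snippet -> 0.5
print(syntax_correctness(["def f():\n    return 1", "def broken(:"]))
```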
+ ## 🛠️ Technical Details
+
+ - **Base Model**: `google/gemma-2-2b-it` (2.6B parameters)
+ - **Dataset**: CodeAlpaca-20k (3,600 training examples, 20% subset)
+ - **Fine-tuning Method**: QLoRA
+   - LoRA rank (r): 16
+   - LoRA alpha: 32
+   - Quantization: 4-bit NF4
+   - Target modules: q_proj, v_proj
+ - **Training**:
+   - Epochs: 2
+   - Batch size: 8 effective (2 per device × 4 gradient-accumulation steps)
+   - Learning rate: 2e-4
+   - Optimizer: paged_adamw_8bit
+   - GPU: T4 (15 GB VRAM, ~4 GB used)
+ - **Framework**: PyTorch + HuggingFace Transformers + PEFT
+
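The 0.12% figure follows directly from the adapter and base-model sizes quoted above; a quick arithmetic check (numbers taken from this README):

```python
# LoRA adapter parameters vs. full base model (figures quoted above)
trainable = 3.2e6   # ~3.2M adapter parameters (r=16 on q_proj/v_proj)
total = 2.6e9       # ~2.6B parameters in gemma-2-2b-it

fraction = trainable / total
print(f"{fraction:.2%} of parameters trained")  # -> 0.12% of parameters trained
print(f"~{total / trainable:.0f}x fewer trainable parameters")
```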
+ ## 💻 Usage
+
+ ### Quick Demo
+
+ Try the live demo above! Just enter a code instruction like:
+ - "Write a function to check if a number is prime"
+ - "Create a function to reverse a string"
+ - "Implement binary search on a sorted list"
+
+ ### Python Code
+
+ ```python
+ from huggingface_hub import InferenceClient
+
+ client = InferenceClient(model="nvhuynh16/gemma-2b-code-alpaca")
+
+ prompt = """### Instruction:
+ Write a function to check if a number is prime
+
+ ### Input:
+
+
+ ### Response:
+ """
+
+ # text_generation takes the prompt as its first argument;
+ # the model was set on the client above.
+ response = client.text_generation(
+     prompt,
+     max_new_tokens=256,
+     temperature=0.7,
+ )
+
+ print(response)
+ ```
+
+ ### Load Model Directly (Requires GPU + bitsandbytes)
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+ from peft import PeftModel
+
+ # Load base model with 4-bit quantization
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+
+ base_model = AutoModelForCausalLM.from_pretrained(
+     "google/gemma-2-2b-it",
+     quantization_config=bnb_config,
+     device_map="auto",
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
+
+ # Load fine-tuned adapters
+ model = PeftModel.from_pretrained(base_model, "nvhuynh16/gemma-2b-code-alpaca")
+
+ # Generate code
+ prompt = """### Instruction:
+ Write a function to check if a number is prime
+
+ ### Input:
+
+
+ ### Response:
+ """
+
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=256)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
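When loading the model directly as above, `model.generate` returns the prompt plus the completion, so it can help to slice the output at the `### Response:` marker. A hypothetical post-processing helper (not part of the repository):

```python
def extract_response(generated: str, marker: str = "### Response:") -> str:
    """Return only the text after the final Response marker, if present."""
    idx = generated.rfind(marker)
    if idx == -1:
        return generated.strip()
    return generated[idx + len(marker):].strip()

full = "### Instruction:\nReverse a string\n\n### Response:\ndef rev(s):\n    return s[::-1]"
print(extract_response(full))
```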
+
+ ## 🎓 Use Cases
+
+ - **Learning Programming**: Get code examples for educational purposes
+ - **Prototyping**: Quickly generate boilerplate code
+ - **Interview Preparation**: Practice coding questions
+ - **Code Completion**: Assistance for simple functions
+ - **Algorithm Reference**: Implementation examples
+
+ ## 🚀 Training Methodology
+
+ ### Dataset Preparation
+ 1. Loaded the CodeAlpaca-20k dataset
+ 2. Filtered invalid examples
+ 3. Formatted in Alpaca instruction style
+ 4. Split: 90% train, 5% validation, 5% test
+ 5. Used a 20% subset (3,600 examples) for memory efficiency
+
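The formatting and splitting steps above can be sketched in plain Python. The field names match CodeAlpaca records, but these helpers are hypothetical, not the project's actual preprocessing code:

```python
def format_alpaca(example: dict) -> str:
    """Render a CodeAlpaca record in the Alpaca prompt style used here."""
    return (
        "### Instruction:\n"
        f"{example['instruction']}\n\n"
        "### Input:\n"
        f"{example.get('input', '')}\n\n"
        "### Response:\n"
        f"{example['output']}"
    )

def split_90_5_5(records: list) -> tuple[list, list, list]:
    """90% train / 5% validation / 5% test split (shuffling not shown)."""
    n = len(records)
    n_train = int(n * 0.90)
    n_val = int(n * 0.05)
    return (
        records[:n_train],
        records[n_train:n_train + n_val],
        records[n_train + n_val:],
    )

sample = {"instruction": "Reverse a string", "input": "", "output": "s[::-1]"}
print(format_alpaca(sample))
```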
+ ### Fine-Tuning Process
+ 1. Loaded Gemma-2B with 4-bit quantization (reduced VRAM from ~10 GB to ~4 GB)
+ 2. Applied LoRA adapters to attention layers only
+ 3. Trained for 2 epochs (~900 steps)
+ 4. Uploaded checkpoints automatically to the HuggingFace Hub
+ 5. Total training time: 4-6 hours on a free Colab T4
+
+ ### Memory Optimizations
+ - 4-bit quantization (BitsAndBytes NF4)
+ - LoRA adapters (0.12% trainable parameters)
+ - Gradient checkpointing
+ - 8-bit AdamW optimizer
+ - Reduced sequence length (256 tokens)
+ - Reduced batch size (2 per device)
+
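Under the hyperparameters listed above, the QLoRA setup would look roughly like the following. This is a hedged sketch using standard `peft`/`transformers` argument names, not the project's actual training script:

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization (see "Memory Optimizations" above)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on the attention projections only
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Training hyperparameters from the list above
training_args = TrainingArguments(
    output_dir="gemma-2b-code-alpaca",
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size 8
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    gradient_checkpointing=True,
)
```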
+ ## 📁 Repository Structure
+
+ ```
+ ├── notebooks/
+ │   ├── 02_fine_tuning_with_eval.ipynb   # Complete training + evaluation
+ │   └── 03_merge_adapters.ipynb          # Merge adapters (optional)
+ ├── spaces/
+ │   ├── app.py                           # This Gradio demo
+ │   ├── requirements.txt                 # Dependencies
+ │   └── README.md                        # This file
+ ├── scripts/
+ │   ├── colab_quick_eval.py              # Evaluation script
+ │   └── train_local.py                   # Local training
+ └── results/
+     └── baseline_100.json                # Baseline evaluation
+ ```
+
+ ## 🔗 Links
+
+ - **Model**: [nvhuynh16/gemma-2b-code-alpaca](https://huggingface.co/nvhuynh16/gemma-2b-code-alpaca)
+ - **Base Model**: [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it)
+ - **Dataset**: [CodeAlpaca-20k](https://github.com/sahil280114/codealpaca)
+ - **GitHub**: [Project Repository](#)
+ - **Portfolio**: [Nam Huynh](#)
+
+ ## ⚠️ Limitations
+
+ - Primarily trained on Python code
+ - May generate verbose explanations alongside code
+ - Best for simple-to-moderate complexity functions
+ - Not suitable for production without human review
+ - Limited to patterns seen in the training data
+
+ ## 📄 License
+
+ This model is based on Gemma-2B-it and inherits its license. The fine-tuning adapters and this demo are provided for educational and demonstration purposes.
+
+ ## 🙏 Acknowledgments
+
+ - **Google**: For the Gemma model family
+ - **Sahil Chaudhary**: For the CodeAlpaca dataset
+ - **HuggingFace**: For Transformers, PEFT, and inference infrastructure
+ - **Colab**: For free GPU access
+
+ ---
+
+ **Built for portfolio demonstration** • Targeting AI/ML Applied Scientist roles • Relevant to SAP ABAP Foundation Model team
+
+ *This demo uses the HuggingFace Inference API for serverless, cost-free inference*
app.py ADDED
@@ -0,0 +1,183 @@
+ """
+ Gradio demo for Gemma Code Generator using the HuggingFace Inference API.
+ Runs serverless on HF infrastructure - no GPU costs!
+ """
+
+ import gradio as gr
+ from huggingface_hub import InferenceClient
+
+ # Model configuration
+ MODEL_NAME = "nvhuynh16/gemma-2b-code-alpaca"
+
+ # Initialize the Inference API client for the fine-tuned model
+ client = InferenceClient(
+     model=MODEL_NAME,
+     token=None,  # Uses the public Inference API
+ )
+
+
+ def generate_code(instruction: str, max_tokens: int = 256, temperature: float = 0.7):
+     """Generate code from an instruction using the HF Inference API."""
+
+     if not instruction.strip():
+         return "Please enter an instruction."
+
+     # Format the prompt in Alpaca style
+     prompt = f"""### Instruction:
+ {instruction}
+
+ ### Input:
+
+
+ ### Response:
+ """
+
+     try:
+         # Generate using the HF Inference API
+         response = client.text_generation(
+             prompt,
+             max_new_tokens=max_tokens,
+             temperature=temperature,
+             top_p=0.9,
+             do_sample=True,
+             return_full_text=False,
+         )
+
+         return response.strip()
+
+     except Exception as e:
+         error_msg = str(e)
+         if "Model too large" in error_msg or "not currently loaded" in error_msg or "loading" in error_msg.lower():
+             return "⏳ Model is loading (the first request takes 1-2 minutes). Please try again in a moment."
+         elif "rate limit" in error_msg.lower():
+             return "⚠️ Rate limit reached. Please wait a few minutes and try again."
+         else:
+             return f"Error: {error_msg}\n\nPlease try again. If the issue persists, the model may be loading for the first time."
+
+
+ # Custom CSS for better appearance
+ custom_css = """
+ .container {
+     max-width: 900px;
+     margin: auto;
+ }
+ .output-code {
+     font-family: 'Courier New', monospace;
+     font-size: 14px;
+ }
+ """
+
+ # Create the Gradio interface
+ with gr.Blocks(theme=gr.themes.Soft(), css=custom_css) as demo:
+
+     gr.Markdown(
+         """
+ # 🤖 Gemma Code Generator
+
+ Fine-tuned Gemma-2B model for Python code generation using QLoRA.
+
+ **Performance**: Expected 75-85% syntax correctness (vs. 61% baseline) | BLEU Score: 25-35 (vs. 16.10 baseline)
+
+ **Note**: The first request may take 1-2 minutes as the model loads on HuggingFace servers. Subsequent requests are much faster.
+ """
+     )
+
+     with gr.Row():
+         with gr.Column(scale=1):
+             instruction_input = gr.Textbox(
+                 label="Code Instruction",
+                 placeholder="Describe the function you want to create...",
+                 lines=3,
+             )
+
+             with gr.Accordion("Advanced Settings", open=False):
+                 max_tokens_slider = gr.Slider(
+                     minimum=64,
+                     maximum=512,
+                     value=256,
+                     step=64,
+                     label="Max Tokens",
+                     info="Maximum length of generated code",
+                 )
+
+                 temperature_slider = gr.Slider(
+                     minimum=0.1,
+                     maximum=1.5,
+                     value=0.7,
+                     step=0.1,
+                     label="Temperature",
+                     info="Higher = more creative, lower = more deterministic",
+                 )
+
+             generate_btn = gr.Button("Generate Code", variant="primary", size="lg")
+
+         with gr.Column(scale=1):
+             output_code = gr.Code(
+                 label="Generated Code",
+                 language="python",
+                 elem_classes="output-code",
+             )
+
+     # Examples
+     gr.Examples(
+         examples=[
+             ["Write a function to check if a number is prime"],
+             ["Create a function to reverse a string"],
+             ["Write a function to find the factorial of a number"],
+             ["Implement binary search on a sorted list"],
+             ["Create a function to merge two sorted lists"],
+             ["Write a function to calculate Fibonacci numbers"],
+             ["Implement a function to find the longest common subsequence"],
+             ["Create a function to validate an email address using regex"],
+             ["Write a function to convert a decimal number to binary"],
+             ["Implement a simple LRU cache using OrderedDict"],
+         ],
+         inputs=[instruction_input],
+         label="Example Prompts (Click to use)",
+     )
+
+     # Event handler
+     generate_btn.click(
+         fn=generate_code,
+         inputs=[instruction_input, max_tokens_slider, temperature_slider],
+         outputs=[output_code],
+     )
+
+     # Model information footer
+     gr.Markdown(
+         """
+ ---
+
+ ### 📊 Model Performance
+
+ | Metric | Baseline (Pretrained) | Fine-Tuned (Expected) | Improvement |
+ |--------|----------------------|----------------------|-------------|
+ | **Syntax Correctness** | 61.0% | 75-85% | +14-24 pts |
+ | **BLEU Score** | 16.10 | 25-35 | +9-19 |
+ | **Trainable Parameters** | 2.6B (full) | 3.2M (0.12%) | ~800x fewer |
+
+ ### 🛠️ Technical Details
+
+ - **Base Model**: google/gemma-2-2b-it (2.6B parameters)
+ - **Fine-tuning**: QLoRA (4-bit quantization + LoRA rank 16)
+ - **Dataset**: CodeAlpaca-20k (3,600 training examples)
+ - **Training**: 4-6 hours on a free Google Colab T4 GPU
+ - **Cost**: $0 (free Colab + free HF Spaces hosting)
+
+ ### 🔗 Links
+
+ [Model on HuggingFace](https://huggingface.co/nvhuynh16/gemma-2b-code-alpaca) •
+ [GitHub Repository](https://github.com/YOUR-USERNAME/YOUR-REPO) •
+ [Portfolio](https://YOUR-PORTFOLIO-SITE.com) •
+ [Base Model](https://huggingface.co/google/gemma-2-2b-it)
+
+ ---
+
+ **Built for portfolio demonstration** • Targeting AI/ML Applied Scientist roles
+
+ *This demo uses the HuggingFace Inference API for serverless, cost-free inference*
+ """
+     )
+
+ if __name__ == "__main__":
+     demo.launch()
requirements.txt ADDED
@@ -0,0 +1,2 @@
+ gradio==4.44.0
+ huggingface-hub>=0.26.0