nvhuynh16 commited on
Commit
8f637d6
·
verified ·
1 Parent(s): c0cb59f

Upload 3 files

Files changed (3)
  1. README.md +209 -5
  2. app.py +183 -0
  3. requirements.txt +2 -0
README.md CHANGED
@@ -1,13 +1,217 @@
  ---
  title: Gemma Code Generator
- emoji: 🌖
- colorFrom: purple
- colorTo: yellow
  sdk: gradio
- sdk_version: 5.49.1
  app_file: app.py
  pinned: false
  license: gemma
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
  title: Gemma Code Generator
+ emoji: 🤖
+ colorFrom: blue
+ colorTo: purple
  sdk: gradio
+ sdk_version: 4.44.0
  app_file: app.py
  pinned: false
  license: gemma
+ tags:
+ - code-generation
+ - gemma
+ - fine-tuned
+ - python
+ - qlora
+ models:
+ - nvhuynh16/gemma-2b-code-alpaca
  ---

+ # 🤖 Gemma Code Generator
+
+ Fine-tuned Gemma-2B model for Python code generation using QLoRA (Quantized Low-Rank Adaptation).
+
+ ## 🎯 Project Overview
+
+ This demo showcases a fine-tuned Gemma-2B model trained on the CodeAlpaca dataset to generate Python code from natural-language descriptions.
+
+ ### Key Features
+
+ - ⚡ **Fast Training**: 4-6 hours on a free Google Colab T4 GPU
+ - 💰 **Cost**: $0 (using the free Colab tier)
+ - 📊 **Performance**: Expected 75-85% syntax correctness (vs. 61% baseline)
+ - 🔧 **Method**: QLoRA (4-bit quantization + LoRA adapters)
+ - 📦 **Efficient**: Only 0.12% of parameters trained (3.2M / 2.6B)
+
+ ## 📈 Model Performance
+
+ | Metric | Baseline (Pretrained) | Fine-Tuned (Expected) | Improvement |
+ |--------|----------------------|----------------------|-------------|
+ | **Syntax Correctness** | 61.0% | 75-85% | +14-24 pts |
+ | **BLEU Score** | 16.10 | 25-35 | +9-19 |
+ | **Trainable Parameters** | N/A | 0.12% | ~800x fewer |
+
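Syntax correctness here is the fraction of generated snippets that parse as valid Python. A minimal sketch of such a checker follows (an illustration by assumption; the project's actual evaluation lives in `scripts/colab_quick_eval.py` and may differ):

```python
import ast

def syntax_correct(code: str) -> bool:
    """Return True if the snippet parses as valid Python."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def syntax_correctness(snippets: list[str]) -> float:
    """Fraction of snippets that are syntactically valid."""
    if not snippets:
        return 0.0
    return sum(syntax_correct(s) for s in snippets) / len(snippets)

# Example: one valid and one invalid snippet -> 0.5
print(syntax_correctness(["def f():\n    return 1", "def broken(:"]))
```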
+ ## 🛠️ Technical Details
+
+ - **Base Model**: `google/gemma-2-2b-it` (2.6B parameters)
+ - **Dataset**: CodeAlpaca-20k (3,600 training examples, 20% subset)
+ - **Fine-tuning Method**: QLoRA
+   - LoRA rank (r): 16
+   - LoRA alpha: 32
+   - Quantization: 4-bit NF4
+   - Target modules: q_proj, v_proj
+ - **Training**:
+   - Epochs: 2
+   - Batch size: 8 effective (2 per device × 4 gradient-accumulation steps)
+   - Learning rate: 2e-4
+   - Optimizer: paged_adamw_8bit
+   - GPU: T4 (15 GB VRAM, ~4 GB used)
+ - **Framework**: PyTorch + HuggingFace Transformers + PEFT
+
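The 0.12% figure follows directly from the adapter and base-model sizes quoted above; a quick arithmetic check (numbers taken from this README):

```python
# LoRA adapter parameters vs. full base model (figures quoted above)
trainable = 3.2e6   # ~3.2M adapter parameters (r=16 on q_proj/v_proj)
total = 2.6e9       # ~2.6B parameters in gemma-2-2b-it

fraction = trainable / total
print(f"{fraction:.2%} of parameters trained")  # -> 0.12% of parameters trained
print(f"~{total / trainable:.0f}x fewer trainable parameters")
```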
+ ## 💻 Usage
+
+ ### Quick Demo
+
+ Try the live demo above! Just enter a code instruction like:
+ - "Write a function to check if a number is prime"
+ - "Create a function to reverse a string"
+ - "Implement binary search on a sorted list"
+
+ ### Python Code
+
+ ```python
+ from huggingface_hub import InferenceClient
+
+ client = InferenceClient(model="nvhuynh16/gemma-2b-code-alpaca")
+
+ prompt = """### Instruction:
+ Write a function to check if a number is prime
+
+ ### Input:
+
+
+ ### Response:
+ """
+
+ # text_generation takes the prompt as its first argument;
+ # the model was set on the client above.
+ response = client.text_generation(
+     prompt,
+     max_new_tokens=256,
+     temperature=0.7,
+ )
+
+ print(response)
+ ```
+
+ ### Load Model Directly (Requires GPU + bitsandbytes)
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+ from peft import PeftModel
+
+ # Load base model with 4-bit quantization
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+
+ base_model = AutoModelForCausalLM.from_pretrained(
+     "google/gemma-2-2b-it",
+     quantization_config=bnb_config,
+     device_map="auto",
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
+
+ # Load fine-tuned adapters
+ model = PeftModel.from_pretrained(base_model, "nvhuynh16/gemma-2b-code-alpaca")
+
+ # Generate code
+ prompt = """### Instruction:
+ Write a function to check if a number is prime
+
+ ### Input:
+
+
+ ### Response:
+ """
+
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=256)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
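When loading the model directly as above, `model.generate` returns the prompt plus the completion, so it can help to slice the output at the `### Response:` marker. A hypothetical post-processing helper (not part of the repository):

```python
def extract_response(generated: str, marker: str = "### Response:") -> str:
    """Return only the text after the final Response marker, if present."""
    idx = generated.rfind(marker)
    if idx == -1:
        return generated.strip()
    return generated[idx + len(marker):].strip()

full = "### Instruction:\nReverse a string\n\n### Response:\ndef rev(s):\n    return s[::-1]"
print(extract_response(full))
```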
+
+ ## 🎓 Use Cases
+
+ - **Learning Programming**: Get code examples for educational purposes
+ - **Prototyping**: Quickly generate boilerplate code
+ - **Interview Preparation**: Practice coding questions
+ - **Code Completion**: Assistance for simple functions
+ - **Algorithm Reference**: Implementation examples
+
+ ## 🚀 Training Methodology
+
+ ### Dataset Preparation
+ 1. Loaded the CodeAlpaca-20k dataset
+ 2. Filtered invalid examples
+ 3. Formatted in Alpaca instruction style
+ 4. Split: 90% train, 5% validation, 5% test
+ 5. Used a 20% subset (3,600 examples) for memory efficiency
+
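The formatting and splitting steps above can be sketched in plain Python. The field names match CodeAlpaca records, but these helpers are hypothetical, not the project's actual preprocessing code:

```python
def format_alpaca(example: dict) -> str:
    """Render a CodeAlpaca record in the Alpaca prompt style used here."""
    return (
        "### Instruction:\n"
        f"{example['instruction']}\n\n"
        "### Input:\n"
        f"{example.get('input', '')}\n\n"
        "### Response:\n"
        f"{example['output']}"
    )

def split_90_5_5(records: list) -> tuple[list, list, list]:
    """90% train / 5% validation / 5% test split (shuffling not shown)."""
    n = len(records)
    n_train = int(n * 0.90)
    n_val = int(n * 0.05)
    return (
        records[:n_train],
        records[n_train:n_train + n_val],
        records[n_train + n_val:],
    )

sample = {"instruction": "Reverse a string", "input": "", "output": "s[::-1]"}
print(format_alpaca(sample))
```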
+ ### Fine-Tuning Process
+ 1. Loaded Gemma-2B with 4-bit quantization (reduced VRAM from ~10 GB to ~4 GB)
+ 2. Applied LoRA adapters to attention layers only
+ 3. Trained for 2 epochs (~900 steps)
+ 4. Uploaded checkpoints automatically to the HuggingFace Hub
+ 5. Total training time: 4-6 hours on a free Colab T4
+
+ ### Memory Optimizations
+ - 4-bit quantization (BitsAndBytes NF4)
+ - LoRA adapters (0.12% trainable parameters)
+ - Gradient checkpointing
+ - 8-bit AdamW optimizer
+ - Reduced sequence length (256 tokens)
+ - Reduced batch size (2 per device)
+
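Under the hyperparameters listed above, the QLoRA setup would look roughly like the following. This is a hedged sketch using standard `peft`/`transformers` argument names, not the project's actual training script:

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization (see "Memory Optimizations" above)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on the attention projections only
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Training hyperparameters from the list above
training_args = TrainingArguments(
    output_dir="gemma-2b-code-alpaca",
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size 8
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    gradient_checkpointing=True,
)
```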
+ ## 📁 Repository Structure
+
+ ```
+ ├── notebooks/
+ │   ├── 02_fine_tuning_with_eval.ipynb   # Complete training + evaluation
+ │   └── 03_merge_adapters.ipynb          # Merge adapters (optional)
+ ├── spaces/
+ │   ├── app.py                           # This Gradio demo
+ │   ├── requirements.txt                 # Dependencies
+ │   └── README.md                        # This file
+ ├── scripts/
+ │   ├── colab_quick_eval.py              # Evaluation script
+ │   └── train_local.py                   # Local training
+ └── results/
+     └── baseline_100.json                # Baseline evaluation
+ ```
+
+ ## 🔗 Links
+
+ - **Model**: [nvhuynh16/gemma-2b-code-alpaca](https://huggingface.co/nvhuynh16/gemma-2b-code-alpaca)
+ - **Base Model**: [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it)
+ - **Dataset**: [CodeAlpaca-20k](https://github.com/sahil280114/codealpaca)
+ - **GitHub**: [Project Repository](#)
+ - **Portfolio**: [Nam Huynh](#)
+
+ ## ⚠️ Limitations
+
+ - Primarily trained on Python code
+ - May generate verbose explanations alongside code
+ - Best for simple-to-moderate complexity functions
+ - Not suitable for production without human review
+ - Limited to patterns seen in the training data
+
+ ## 📄 License
+
+ This model is based on Gemma-2B-it and inherits its license. The fine-tuning adapters and this demo are provided for educational and demonstration purposes.
+
+ ## 🙏 Acknowledgments
+
+ - **Google**: For the Gemma model family
+ - **Sahil Chaudhary**: For the CodeAlpaca dataset
+ - **HuggingFace**: For Transformers, PEFT, and inference infrastructure
+ - **Colab**: For free GPU access
+
+ ---
+
+ **Built for portfolio demonstration** • Targeting AI/ML Applied Scientist roles • Relevant to SAP ABAP Foundation Model team
+
+ *This demo uses the HuggingFace Inference API for serverless, cost-free inference*
app.py ADDED
@@ -0,0 +1,183 @@
+ """
+ Gradio demo for Gemma Code Generator using the HuggingFace Inference API.
+ Runs serverless on HF infrastructure - no GPU costs!
+ """
+
+ import gradio as gr
+ from huggingface_hub import InferenceClient
+
+ # Model configuration
+ MODEL_NAME = "nvhuynh16/gemma-2b-code-alpaca"
+
+ # Initialize the Inference API client for the fine-tuned model
+ client = InferenceClient(
+     model=MODEL_NAME,
+     token=None,  # Uses the public Inference API
+ )
+
+
+ def generate_code(instruction: str, max_tokens: int = 256, temperature: float = 0.7):
+     """Generate code from an instruction using the HF Inference API."""
+
+     if not instruction.strip():
+         return "Please enter an instruction."
+
+     # Format the prompt in Alpaca style
+     prompt = f"""### Instruction:
+ {instruction}
+
+ ### Input:
+
+
+ ### Response:
+ """
+
+     try:
+         # Generate using the HF Inference API
+         response = client.text_generation(
+             prompt,
+             max_new_tokens=max_tokens,
+             temperature=temperature,
+             top_p=0.9,
+             do_sample=True,
+             return_full_text=False,
+         )
+
+         return response.strip()
+
+     except Exception as e:
+         error_msg = str(e)
+         if "Model too large" in error_msg or "not currently loaded" in error_msg or "loading" in error_msg.lower():
+             return "⏳ Model is loading (the first request takes 1-2 minutes). Please try again in a moment."
+         elif "rate limit" in error_msg.lower():
+             return "⚠️ Rate limit reached. Please wait a few minutes and try again."
+         else:
+             return f"Error: {error_msg}\n\nPlease try again. If the issue persists, the model may be loading for the first time."
+
+
+ # Custom CSS for better appearance
+ custom_css = """
+ .container {
+     max-width: 900px;
+     margin: auto;
+ }
+ .output-code {
+     font-family: 'Courier New', monospace;
+     font-size: 14px;
+ }
+ """
+
+ # Create the Gradio interface
+ with gr.Blocks(theme=gr.themes.Soft(), css=custom_css) as demo:
+
+     gr.Markdown(
+         """
+ # 🤖 Gemma Code Generator
+
+ Fine-tuned Gemma-2B model for Python code generation using QLoRA.
+
+ **Performance**: Expected 75-85% syntax correctness (vs. 61% baseline) | BLEU Score: 25-35 (vs. 16.10 baseline)
+
+ **Note**: The first request may take 1-2 minutes as the model loads on HuggingFace servers. Subsequent requests are much faster.
+ """
+     )
+
+     with gr.Row():
+         with gr.Column(scale=1):
+             instruction_input = gr.Textbox(
+                 label="Code Instruction",
+                 placeholder="Describe the function you want to create...",
+                 lines=3,
+             )
+
+             with gr.Accordion("Advanced Settings", open=False):
+                 max_tokens_slider = gr.Slider(
+                     minimum=64,
+                     maximum=512,
+                     value=256,
+                     step=64,
+                     label="Max Tokens",
+                     info="Maximum length of generated code",
+                 )
+
+                 temperature_slider = gr.Slider(
+                     minimum=0.1,
+                     maximum=1.5,
+                     value=0.7,
+                     step=0.1,
+                     label="Temperature",
+                     info="Higher = more creative, lower = more deterministic",
+                 )
+
+             generate_btn = gr.Button("Generate Code", variant="primary", size="lg")
+
+         with gr.Column(scale=1):
+             output_code = gr.Code(
+                 label="Generated Code",
+                 language="python",
+                 elem_classes="output-code",
+             )
+
+     # Examples
+     gr.Examples(
+         examples=[
+             ["Write a function to check if a number is prime"],
+             ["Create a function to reverse a string"],
+             ["Write a function to find the factorial of a number"],
+             ["Implement binary search on a sorted list"],
+             ["Create a function to merge two sorted lists"],
+             ["Write a function to calculate Fibonacci numbers"],
+             ["Implement a function to find the longest common subsequence"],
+             ["Create a function to validate an email address using regex"],
+             ["Write a function to convert a decimal number to binary"],
+             ["Implement a simple LRU cache using OrderedDict"],
+         ],
+         inputs=[instruction_input],
+         label="Example Prompts (Click to use)",
+     )
+
+     # Event handler
+     generate_btn.click(
+         fn=generate_code,
+         inputs=[instruction_input, max_tokens_slider, temperature_slider],
+         outputs=[output_code],
+     )
+
+     # Model information footer
+     gr.Markdown(
+         """
+ ---
+
+ ### 📊 Model Performance
+
+ | Metric | Baseline (Pretrained) | Fine-Tuned (Expected) | Improvement |
+ |--------|----------------------|----------------------|-------------|
+ | **Syntax Correctness** | 61.0% | 75-85% | +14-24 pts |
+ | **BLEU Score** | 16.10 | 25-35 | +9-19 |
+ | **Trainable Parameters** | 2.6B (full) | 3.2M (0.12%) | ~800x fewer |
+
+ ### 🛠️ Technical Details
+
+ - **Base Model**: google/gemma-2-2b-it (2.6B parameters)
+ - **Fine-tuning**: QLoRA (4-bit quantization + LoRA rank 16)
+ - **Dataset**: CodeAlpaca-20k (3,600 training examples)
+ - **Training**: 4-6 hours on a free Google Colab T4 GPU
+ - **Cost**: $0 (free Colab + free HF Spaces hosting)
+
+ ### 🔗 Links
+
+ [Model on HuggingFace](https://huggingface.co/nvhuynh16/gemma-2b-code-alpaca) •
+ [GitHub Repository](https://github.com/YOUR-USERNAME/YOUR-REPO) •
+ [Portfolio](https://YOUR-PORTFOLIO-SITE.com) •
+ [Base Model](https://huggingface.co/google/gemma-2-2b-it)
+
+ ---
+
+ **Built for portfolio demonstration** • Targeting AI/ML Applied Scientist roles
+
+ *This demo uses the HuggingFace Inference API for serverless, cost-free inference*
+ """
+     )
+
+ if __name__ == "__main__":
+     demo.launch()
requirements.txt ADDED
@@ -0,0 +1,2 @@
+ gradio==4.44.0
+ huggingface-hub>=0.26.0