Prithvik-1 committed · Commit a200e30 · verified · 1 Parent(s): e465de3

Upload HYPERPARAMETER_TUNING_GUIDE.md with huggingface_hub

Files changed (1): HYPERPARAMETER_TUNING_GUIDE.md (+134, -0)
HYPERPARAMETER_TUNING_GUIDE.md ADDED
# 🎯 Hyperparameter Tuning Guide for Better Code Generation

**Issue:** The model generates repetitive text notes instead of Verilog code.
**Solution:** Adjust the inference hyperparameters and fix the prompt format.

---

## 🔧 Key Issues Identified

1. **Prompt Format Mismatch**: The inference format didn't match the training format exactly
2. **Repetition Penalty Too Low**: The model was repeating "Note:" statements
3. **Temperature May Be Too High**: Causing non-deterministic outputs
4. **Response Extraction**: Only the newly generated tokens should be decoded

---

## ✅ Fixes Applied

### 1. **Prompt Format Fixed**
- **Training Format:** `instruction + EOS + response + EOS`
- **Inference Format (Now):** `instruction + EOS` (the model continues from here)
- **Change:** Added the EOS token at the end of the prompt to match training
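As a sanity check, the two layouts above can be sketched as plain string builders. This is a minimal sketch: `</s>` is CodeLlama's usual EOS string, but substitute your tokenizer's actual `tokenizer.eos_token` in real code.

```python
# Minimal sketch of the two layouts. "</s>" stands in for the
# tokenizer's eos_token; use tokenizer.eos_token in real code.

def build_training_text(instruction: str, response: str, eos: str = "</s>") -> str:
    """Training-time layout: instruction + EOS + response + EOS."""
    return f"{instruction}{eos}{response}{eos}"

def build_inference_prompt(instruction: str, eos: str = "</s>") -> str:
    """Inference-time layout: instruction + EOS; the model continues."""
    return f"{instruction}{eos}"

prompt = build_inference_prompt("Write a parameterized FIFO in Verilog.")
print(prompt.endswith("</s>"))  # True: the prompt now ends with EOS, as in training
```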

### 2. **Repetition Penalty Increased**
- **Before:** `repetition_penalty=1.1`
- **After:** `repetition_penalty=1.2`
- **Reason:** Prevents repetitive "Note:" statements

### 3. **Response Decoding Fixed**
- **Before:** Decoding the entire output, including the prompt
- **After:** Decoding only the newly generated tokens (after the prompt)
- **Benefit:** Cleaner output with no prompt contamination
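The token-slicing part of this fix can be illustrated without a model. In this sketch plain ints stand in for token ids; with transformers, the prompt length would come from `inputs["input_ids"].shape[1]` and the slice would then go through `tokenizer.decode`.

```python
# Sketch: generate() returns prompt tokens followed by new tokens, so
# slice off the first input_len ids before decoding. Ints stand in for
# real token ids here.

def extract_new_tokens(output_ids, input_len):
    """Keep only the tokens generated after the prompt."""
    return output_ids[input_len:]

prompt_ids = [101, 2023, 2003]          # pretend prompt token ids
output_ids = prompt_ids + [2589, 1037]  # generate() echoes the prompt first
print(extract_new_tokens(output_ids, len(prompt_ids)))  # [2589, 1037]
```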

---

## 🎛️ Recommended Hyperparameter Changes

### For Better Code Generation:

| Parameter | Current | Recommended | Reason |
|-----------|---------|-------------|--------|
| **Temperature** | 0.3 | **0.1-0.2** | Lower = more deterministic, better for code |
| **Repetition Penalty** | 1.1 | **1.2-1.3** | Prevents repetitive text generation |
| **Max New Tokens** | 800 | **1000-1200** | Ensures complete code generation |
| **Top-p** | 0.9 | **0.95** | Slightly more diverse (only applies when sampling) |

### Optimal Settings for Code Generation:

```python
temperature = 0.1         # Very deterministic (best for exact code match)
repetition_penalty = 1.2  # Prevent repetition
max_new_tokens = 1000     # Ensure complete code
top_p = 0.95              # Only used when sampling
```
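One way to keep these settings together is a single kwargs dict passed as `model.generate(**gen_kwargs, ...)`. This is a sketch: the model and tokenizer objects are assumed to exist elsewhere, and note that in transformers `do_sample=True` is required for temperature and top-p to take effect at all.

```python
# Recommended settings bundled for model.generate(**gen_kwargs, ...).
gen_kwargs = {
    "max_new_tokens": 1000,     # room for a complete Verilog module
    "temperature": 0.1,         # near-greedy decoding for code
    "top_p": 0.95,              # only consulted when sampling
    "repetition_penalty": 1.2,  # discourage repeated "Note:" lines
    "do_sample": True,          # without this, temperature/top_p are ignored
}
print(sorted(gen_kwargs))
```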

---

## 🚀 Test Command (Updated)

```bash
cd /workspace/ftt/codellama-migration
source /venv/main/bin/activate
python3 test_single_training_sample.py
```

This script tests with multiple temperatures (0.1, 0.2, 0.3) so you can see which works best.

---

## 📝 Quick Test Command (Single Sample)

```bash
cd /workspace/ftt/codellama-migration
source /venv/main/bin/activate

# Extract the first training sample and test
INSTRUCTION=$(sed -n '1p' datasets/processed/split/train.jsonl | python3 -c "import sys, json; print(json.load(sys.stdin)['instruction'])")

python3 scripts/inference/inference_codellama.py \
    --mode local \
    --model-path training-outputs/codellama-fifo-v1 \
    --prompt "$INSTRUCTION" \
    --max-new-tokens 1000 \
    --temperature 0.1
```

---

## 🔍 Why Temperature 0.1 Instead of 0.3?

- **0.1**: Very deterministic, picks the most likely token → better code accuracy
- **0.3**: More variation, more creative → may generate prose instead of code
- **0.5+**: High variation → not suitable for code generation

**For exact code matching with the training data: use 0.1**

---
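The effect is easy to see by running the softmax at different temperatures. This is a self-contained sketch with made-up logits, not the actual model distribution:

```python
import math

# Temperature rescales the logits before the softmax: lower T sharpens
# the distribution toward the top token, which is why 0.1 is near-greedy.

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up next-token logits
for t in (0.1, 0.3, 1.0):
    top = softmax_with_temperature(logits, t)[0]
    print(f"T={t}: top-token probability {top:.3f}")
```

At T=0.1 the top token gets essentially all the probability mass, while at T=1.0 the distribution stays spread out.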

## 🔄 Why Repetition Penalty 1.2?

- **1.0**: No penalty → the model repeats patterns
- **1.1**: Low penalty → output still gets repetitive
- **1.2-1.3**: Good balance → prevents repetition without hurting quality
- **1.5+**: Too high → may suppress valid repetitions in code

---
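For intuition, the penalty rule transformers applies (following the CTRL paper) rescales the logits of tokens already present in the context; a minimal sketch:

```python
# Sketch of the repetition-penalty rule: for every token id already
# generated, divide a positive logit by the penalty and multiply a
# negative one by it, so that token becomes less likely to repeat.

def apply_repetition_penalty(logits, seen_ids, penalty):
    out = list(logits)
    for tid in set(seen_ids):
        if out[tid] > 0:
            out[tid] /= penalty
        else:
            out[tid] *= penalty
    return out

logits = [3.0, 1.0, -0.5]  # toy vocabulary of three tokens
penalized = apply_repetition_penalty(logits, seen_ids=[0, 2], penalty=1.2)
print(penalized)  # tokens 0 and 2 are pushed down; token 1 is untouched
```

This also shows why very high penalties hurt: every repeated identifier in a Verilog module pays the same tax.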

## ✅ Summary of Changes

1. ✅ **Fixed prompt format** - matches the training format (instruction + EOS)
2. ✅ **Increased repetition_penalty** - 1.1 → 1.2
3. ✅ **Fixed response extraction** - only decode newly generated tokens
4. ✅ **Lower temperature recommended** - 0.3 → 0.1 for exact matches

---

## 🧪 Testing

Run the test script to see the improvements:

```bash
python3 test_single_training_sample.py
```

This will test with temperatures 0.1, 0.2, and 0.3 so you can compare the outputs.

---

**Next Steps:**
1. Test with the updated inference script
2. Compare outputs across the different temperatures
3. Choose the optimal temperature for your use case
4. If issues persist, you may need to retrain with a better dataset format