# 🎯 Hyperparameter Tuning Guide for Better Code Generation

**Issue:** Model generating repetitive text notes instead of Verilog code
**Solution:** Adjust inference hyperparameters and fix prompt format

---

## 🔧 Key Issues Identified

1. **Prompt Format Mismatch**: Inference format didn't match training format exactly
2. **Repetition Penalty Too Low**: Model was repeating "Note:" statements
3. **Temperature May Be Too High**: Causing non-deterministic outputs
4. **Response Extraction**: Need to properly extract only newly generated tokens

---

## ✅ Fixes Applied

### 1. **Prompt Format Fixed**
- **Training Format:** `instruction + EOS + response + EOS`
- **Inference Format (Now):** `instruction + EOS` (model continues from here)
- **Change:** Added EOS token at end of prompt to match training
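
As a minimal sketch of the fix (the EOS string and instruction text here are illustrative assumptions; real code should take the EOS string from `tokenizer.eos_token`):

```python
# Sketch of the fixed prompt format. "</s>" is the usual Llama/CodeLlama
# EOS string; in practice, read it from tokenizer.eos_token instead.
EOS = "</s>"

def build_prompt(instruction: str) -> str:
    # Training saw `instruction + EOS + response + EOS`, so at inference
    # we stop after `instruction + EOS` and let the model continue.
    return instruction + EOS

prompt = build_prompt("Write a Verilog module for a synchronous FIFO.")
```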
### 2. **Repetition Penalty Increased**
- **Before:** `repetition_penalty=1.1`
- **After:** `repetition_penalty=1.2`
- **Reason:** Prevents repetitive "Note:" statements

### 3. **Response Decoding Fixed**
- **Before:** Decoding entire output including prompt
- **After:** Decoding only newly generated tokens (after prompt)
- **Benefit:** Cleaner output, no prompt contamination
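
A minimal sketch of the extraction fix, with plain Python lists standing in for token-ID tensors (the IDs are made up):

```python
# generate() returns the prompt tokens followed by the newly generated
# tokens, so slice the prompt off before decoding.
prompt_ids = [1, 3087, 29871, 2]            # tokenized prompt (illustrative IDs)
output_ids = prompt_ids + [734, 99, 12, 2]  # what generate() hands back

new_token_ids = output_ids[len(prompt_ids):]
# response = tokenizer.decode(new_token_ids, skip_special_tokens=True)
```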
---

## 🎛️ Recommended Hyperparameter Changes

### For Better Code Generation:

| Parameter | Current | Recommended | Reason |
|-----------|---------|-------------|--------|
| **Temperature** | 0.3 | **0.1-0.2** | Lower = more deterministic, better for code |
| **Repetition Penalty** | 1.1 | **1.2-1.3** | Prevents repetitive text generation |
| **Max New Tokens** | 800 | **1000-1200** | Ensures complete code generation |
| **Top-p** | 0.9 | **0.95** | Slightly more diverse (if temperature > 0) |

### Optimal Settings for Code Generation:

```python
temperature = 0.1        # Very deterministic (best for exact code match)
repetition_penalty = 1.2 # Prevent repetition
max_new_tokens = 1000    # Ensure complete code
top_p = 0.95             # If using sampling
```
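
Bundled as keyword arguments for a Hugging Face-style `generate()` call (a sketch, not this repo's exact script; note that `temperature` and `top_p` only take effect when `do_sample=True`):

```python
# Recommended settings collected as generate() kwargs; with a loaded model,
# pass them as model.generate(**inputs, **GEN_KWARGS).
GEN_KWARGS = dict(
    max_new_tokens=1000,
    temperature=0.1,
    top_p=0.95,
    repetition_penalty=1.2,
    do_sample=True,  # sampling must be on for temperature/top_p to apply
)
```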
---

## 🚀 Test Command (Updated)

```bash
cd /workspace/ftt/codellama-migration
source /venv/main/bin/activate
python3 test_single_training_sample.py
```

This script tests with multiple temperatures (0.1, 0.2, 0.3) so you can see which works best.

---

## 🚀 Quick Test Command (Single Sample)

```bash
cd /workspace/ftt/codellama-migration
source /venv/main/bin/activate

# Extract first training sample and test
INSTRUCTION=$(sed -n '1p' datasets/processed/split/train.jsonl | python3 -c "import sys, json; print(json.load(sys.stdin)['instruction'])")

python3 scripts/inference/inference_codellama.py \
    --mode local \
    --model-path training-outputs/codellama-fifo-v1 \
    --prompt "$INSTRUCTION" \
    --max-new-tokens 1000 \
    --temperature 0.1
```
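
The `sed`/`python3 -c` extraction step can also be written as a small pure-Python helper (a sketch; the default path is the same one used above):

```python
import json

def first_instruction(path="datasets/processed/split/train.jsonl"):
    # JSONL stores one JSON object per line, so the first line is the
    # first training sample.
    with open(path, encoding="utf-8") as f:
        return json.loads(f.readline())["instruction"]
```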
---

## 📊 Why Temperature 0.1 Instead of 0.3?

- **0.1**: Very deterministic, picks most likely token → Better code accuracy
- **0.3**: More variation, creative → May generate text instead of code
- **0.5+**: High variation → Not suitable for code generation

**For exact code matching with training data: Use 0.1**
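
The effect is easy to see numerically: logits are divided by the temperature before the softmax, so low temperatures concentrate nearly all probability on the top token (a toy example with made-up logits):

```python
import math

def softmax_t(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]        # toy next-token scores
p_cold = softmax_t(logits, 0.1)  # top token gets ~99.99% of the mass
p_warm = softmax_t(logits, 0.3)  # top token gets noticeably less (~96%)
```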
---

## 📊 Why Repetition Penalty 1.2?

- **1.0**: No penalty → Model repeats patterns
- **1.1**: Low penalty → Still gets repetitive
- **1.2-1.3**: Good balance → Prevents repetition without hurting quality
- **1.5+**: Too high → May suppress valid repetitions in code
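
For intuition, the common implementation (e.g. Hugging Face's `RepetitionPenaltyLogitsProcessor`) rescales the logits of tokens that already appeared; this toy sketch mimics that behaviour with made-up scores:

```python
def apply_repetition_penalty(logits, seen_ids, penalty=1.2):
    # Already-seen tokens get their scores rescaled: positive logits are
    # divided by the penalty, negative ones multiplied, so both move down.
    out = list(logits)
    for tok in seen_ids:
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = [3.0, 1.0, -0.5]  # toy vocabulary of 3 tokens
penalized = apply_repetition_penalty(logits, seen_ids={0, 2})
# token 0: 3.0 -> ~2.5, token 2: -0.5 -> ~-0.6, token 1 untouched
```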
---

## ✅ Summary of Changes

1. ✅ **Fixed prompt format** - Matches training format (instruction + EOS)
2. ✅ **Increased repetition_penalty** - 1.1 → 1.2
3. ✅ **Fixed response extraction** - Only decode newly generated tokens
4. ✅ **Lower temperature recommended** - 0.3 → 0.1 for exact matches

---

## 🧪 Testing

Run the test script to see the improvements:

```bash
python3 test_single_training_sample.py
```

This will test with temperatures 0.1, 0.2, and 0.3 so you can compare outputs.

---

**Next Steps:**
1. Test with the updated inference script
2. Compare outputs at different temperatures
3. Choose the optimal temperature for your use case
4. If issues persist, you may need to retrain with a better dataset format