# 🚀 CodeLlama Inference Guide

**Last Updated:** November 25, 2025

---

## 📋 Overview

This guide explains how to use the updated CodeLlama inference script with your fine-tuned model.

---

## 🎯 Quick Start

### Basic Inference (Single Prompt)

```bash
cd /workspace/ftt/codellama-migration

python3 scripts/inference/inference_codellama.py \
    --mode local \
    --model-path training-outputs/codellama-fifo-v1 \
    --prompt "Your prompt here"
```

Or use the test script:

```bash
bash test_inference.sh
```

### Interactive Mode

```bash
python3 scripts/inference/inference_codellama.py \
    --mode local \
    --model-path training-outputs/codellama-fifo-v1
```

Enter prompts interactively; type `quit` or `exit` to stop.

---

## ⚙️ Command-Line Arguments

### Required Arguments (for local mode)

| Argument | Description | Default |
|----------|-------------|---------|
| `--mode` | Inference mode: `local` or `ollama` | `local` |
| `--model-path` | Path to the fine-tuned model | `training-outputs/codellama-fifo-v1` |

### Optional Arguments

| Argument | Description | Default |
|----------|-------------|---------|
| `--base-model-path` | Path to the base CodeLlama model | Auto-detected from training config |
| `--prompt` | Single prompt to process | (interactive mode if omitted) |
| `--max-new-tokens` | Maximum tokens to generate | `800` |
| `--temperature` | Generation temperature (lower = more deterministic) | `0.3` |
| `--merge-weights` | Merge LoRA weights (slower load, faster inference) | `False` |
| `--no-quantization` | Disable 4-bit quantization | Auto (quantized on GPU) |
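
The tables above can be mirrored by a minimal `argparse` sketch. This is a hypothetical reconstruction for illustration only; the actual parser lives in `inference_codellama.py` and may differ:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the CLI described in the tables above."""
    p = argparse.ArgumentParser(description="CodeLlama inference (sketch)")
    p.add_argument("--mode", choices=["local", "ollama"], default="local")
    p.add_argument("--model-path", default="training-outputs/codellama-fifo-v1")
    p.add_argument("--base-model-path", default=None)  # None -> auto-detect
    p.add_argument("--prompt", default=None)           # None -> interactive mode
    p.add_argument("--max-new-tokens", type=int, default=800)
    p.add_argument("--temperature", type=float, default=0.3)
    p.add_argument("--merge-weights", action="store_true")
    p.add_argument("--no-quantization", action="store_true")
    return p

# Parse a sample command line to show the defaults kicking in
args = build_parser().parse_args(["--temperature", "0.5", "--merge-weights"])
print(args.mode, args.max_new_tokens, args.temperature, args.merge_weights)
```

Any argument left off the command line falls back to the default from the table, which is why the basic examples below can omit `--mode` and `--model-path`.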

---

## 📝 Examples

### Example 1: Basic Inference

```bash
python3 scripts/inference/inference_codellama.py \
    --prompt "Generate a synchronous FIFO with 8-bit data width, depth 4"
```

### Example 2: Custom Parameters

```bash
python3 scripts/inference/inference_codellama.py \
    --model-path training-outputs/codellama-fifo-v1 \
    --prompt "Your prompt" \
    --max-new-tokens 1200 \
    --temperature 0.5
```

### Example 3: Merged Weights (Faster Inference)

```bash
python3 scripts/inference/inference_codellama.py \
    --model-path training-outputs/codellama-fifo-v1 \
    --merge-weights \
    --prompt "Your prompt"
```

**Note:** `--merge-weights` merges the LoRA adapters into the base model. Loading takes longer, but inference runs faster.

### Example 4: Custom Base Model Path

```bash
python3 scripts/inference/inference_codellama.py \
    --model-path training-outputs/codellama-fifo-v1 \
    --base-model-path /path/to/custom/base/model \
    --prompt "Your prompt"
```

---

## 🎛️ Generation Parameters

### Temperature

- **0.1–0.3**: Very deterministic, focused outputs (recommended for code generation)
- **0.5–0.7**: Balanced creativity and determinism
- **0.8–1.0**: More creative, varied outputs

**Default:** `0.3` (optimized for code generation)
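
The effect of temperature can be seen by applying it to a softmax over raw token scores: lower temperatures concentrate probability mass on the top token, which is why low values behave deterministically. This is a generic illustration with made-up numbers, not code from the inference script:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature before softmax; lower T sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.3)  # deterministic regime
hot = softmax_with_temperature(logits, 1.0)   # more varied sampling
print(cold[0], hot[0])  # top token gets far more mass at T=0.3
```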

### Max New Tokens

- **512**: Short responses
- **800**: Default (balanced)
- **1200+**: Longer code blocks

**Default:** `800` tokens

---

## 🔧 Model Loading

### Automatic Base Model Detection

The script automatically detects the base model in this order:

1. `--base-model-path` argument (if provided)
2. Local default path: `models/base-models/CodeLlama-7B-Instruct`
3. Training config: reads `training_config.json` from the model directory
4. Hugging Face Hub: falls back to `codellama/CodeLlama-7b-Instruct-hf`
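
The fallback chain above can be sketched as follows. This is a hypothetical reconstruction: the function name and the `base_model` key in `training_config.json` are assumptions, not taken from the script:

```python
import json
import os

LOCAL_DEFAULT = "models/base-models/CodeLlama-7B-Instruct"
HUB_FALLBACK = "codellama/CodeLlama-7b-Instruct-hf"

def resolve_base_model(model_path, base_model_path=None):
    """Resolve the base model using the priority order described in the guide."""
    if base_model_path:                   # 1. explicit CLI argument wins
        return base_model_path
    if os.path.isdir(LOCAL_DEFAULT):      # 2. local default path, if present
        return LOCAL_DEFAULT
    cfg = os.path.join(model_path, "training_config.json")
    if os.path.exists(cfg):               # 3. training config (key name assumed)
        with open(cfg) as f:
            name = json.load(f).get("base_model")
        if name:
            return name
    return HUB_FALLBACK                   # 4. Hugging Face Hub fallback

print(resolve_base_model("training-outputs/codellama-fifo-v1", "/custom/base"))
```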

### LoRA Adapter vs. Merged Model

- **LoRA adapter (default)**: Faster loading; keeps the adapter weights separate
- **Merged model (`--merge-weights`)**: Slower loading, but faster inference

---

## 📊 Output Format

The inference script automatically:
- Extracts Verilog code from markdown code blocks (`` ```verilog `` fences)
- Removes conversation wrappers
- Returns clean RTL code
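
Extracting a fenced Verilog block can be done with a small regular expression. This is a generic sketch of the idea, not the script's actual implementation:

```python
import re

FENCE = "`" * 3  # triple backtick, built programmatically

def extract_verilog(text: str) -> str:
    """Pull the first fenced verilog block out of a model response; fall back to the raw text."""
    match = re.search(FENCE + r"verilog\s*\n(.*?)" + FENCE, text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()

# A typical chatty model response wrapping the RTL in a fenced block
response = (
    "Sure, here is the module:\n"
    + FENCE + "verilog\n"
    + "module sync_fifo (input clk);\nendmodule\n"
    + FENCE + "\n"
    + "Let me know if you need changes."
)
print(extract_verilog(response))
```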

### Example Output

**Input:**
```
Generate a synchronous FIFO with 8-bit data width, depth 4
```

**Output:**
```verilog
module sync_fifo_8b_4d (
    input clk,
    input rst,
    input write_en,
    input read_en,
    input [7:0] write_data,
    output [7:0] read_data
);
    // ... code ...
endmodule
```

---

## 🚀 Performance Tips

### 1. Use Merged Weights for Repeated Inference

If you run many inferences, merge the weights once:

```bash
# First run (slower loading)
python3 scripts/inference/inference_codellama.py \
    --merge-weights \
    --model-path training-outputs/codellama-fifo-v1

# Subsequent runs use the cached merged model (if saved)
```

### 2. Adjust Max Tokens Based on the Task

```bash
# Short responses
--max-new-tokens 400

# Long code blocks
--max-new-tokens 1200
```

### 3. Lower the Temperature for Code Generation

```bash
# Very deterministic (recommended)
--temperature 0.2

# Slightly more varied
--temperature 0.5
```

---

## 📁 File Structure

```
codellama-migration/
β”œβ”€β”€ scripts/
β”‚   └── inference/
β”‚       └── inference_codellama.py     # Updated inference script
β”œβ”€β”€ training-outputs/
β”‚   └── codellama-fifo-v1/             # Fine-tuned model
β”‚       β”œβ”€β”€ adapter_model.safetensors  # LoRA weights
β”‚       β”œβ”€β”€ adapter_config.json
β”‚       └── training_config.json
β”œβ”€β”€ models/
β”‚   └── base-models/
β”‚       └── CodeLlama-7B-Instruct/     # Base model
└── test_inference.sh                  # Test script
```

---

## 🔍 Troubleshooting

### Model Not Found

```
Error: Model path training-outputs/codellama-fifo-v1 does not exist
```

**Solution:** Check that the model path is correct:
```bash
ls -lh training-outputs/codellama-fifo-v1/
```

### Base Model Not Found

If base model detection fails, specify the path explicitly:

```bash
--base-model-path /workspace/ftt/codellama-migration/models/base-models/CodeLlama-7B-Instruct
```

### Out of Memory

1. Ensure quantization is enabled (the default on GPU)
2. Reduce `--max-new-tokens`
3. Use `--no-quantization` only if you have enough memory

### Slow Inference

1. Use `--merge-weights` for faster inference
2. Reduce `--max-new-tokens`
3. Lower `--temperature` (less sampling overhead)

---

## 📚 Related Documents

- `TRAINING_GUIDE.md` - Fine-tuning guide
- `HYPERPARAMETER_ANALYSIS.md` - Hyperparameter details
- `MIGRATION_PROGRESS.md` - Migration status

---

**Happy Inferencing! 🎉**