Beebey committed
Commit efac3a4 (verified) · Parent(s): 1810d55

Update README.md

Files changed (1): README.md (+6 -17)
README.md CHANGED
@@ -155,10 +155,10 @@ outputs = model.generate(
  ### LoRA Configuration
  ```python
  {
-     "r": 8,                # LoRA rank
-     "lora_alpha": 16,      # LoRA scaling factor
-     "lora_dropout": 0.05,  # Dropout probability
-     "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
+     "r": 8,
+     "lora_alpha": 16,
+     "lora_dropout": 0.05,
+     "target_modules": ["q_proj", "v_proj"],
      "task_type": "CAUSAL_LM"
  }
  ```
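For reference, a minimal sketch of how the updated adapter settings could be instantiated with the `peft` library; the imports, helper calls, and the placeholder base checkpoint are illustrative assumptions, not part of this commit:

```python
# Sketch only: wiring the updated LoRA settings into a PEFT model.
# The base checkpoint name below is a placeholder, not taken from this commit.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

lora_config = LoraConfig(
    r=8,                                   # LoRA rank
    lora_alpha=16,                         # scaling factor
    lora_dropout=0.05,                     # dropout on adapter layers
    target_modules=["q_proj", "v_proj"],   # projections targeted after this change
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("<base-checkpoint>")  # placeholder
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights should be trainable
```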
@@ -180,20 +180,18 @@ outputs = model.generate(
  learning_rate = 2e-4
  warmup_steps = 50
  max_steps = 500
- per_device_train_batch_size = 8
- gradient_accumulation_steps = 128
+ per_device_train_batch_size = 16
+ gradient_accumulation_steps = 4
  effective_batch_size = 1024

  # Optimization
  optimizer = "adamw_torch_xla"
  lr_scheduler = "cosine"
  weight_decay = 0.01
- max_grad_norm = 1.0

  # Model Settings
  sequence_length = 256
  precision = "bfloat16"
- gradient_checkpointing = True
  ```

  ### Training Infrastructure
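The batch-size change keeps the effective batch size at 1024 while redistributing it: the old settings reached 1024 on a single worker (8 × 128), whereas a per-device batch of 16 with 4 accumulation steps implies 16 data-parallel workers (16 × 4 × 16 = 1024). The worker count is an inference from the numbers, not something stated in the commit; a small sanity check:

```python
# Sanity check of the effective-batch-size arithmetic implied by the new values.
# num_data_parallel_workers is an assumption chosen so the product matches 1024.
per_device_train_batch_size = 16
gradient_accumulation_steps = 4
num_data_parallel_workers = 16

effective_batch_size = (
    per_device_train_batch_size
    * gradient_accumulation_steps
    * num_data_parallel_workers
)
assert effective_batch_size == 1024  # matches the value kept in the README
```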
@@ -239,15 +237,6 @@ The model was evaluated on the complete HumanEval benchmark (164 programming pro

  This demonstrates that the educational fine-tuning maintains strong algorithmic correctness while improving code clarity and documentation.

- ### Sample Performance by Category
-
- | Category            | Base Model | Fine-tuned | Delta |
- |---------------------|------------|------------|-------|
- | String Manipulation | 68%        | 65%        | -3%   |
- | Data Structures     | 67%        | 64%        | -3%   |
- | Algorithms          | 66%        | 63%        | -3%   |
- | Math/Logic          | 64%        | 65%        | +1%   |
-
  ---

  ## 🎓 Use Cases
 