Update README.md
README.md
CHANGED
@@ -155,10 +155,10 @@ outputs = model.generate(
 ### LoRA Configuration
 ```python
 {
-    "r": 8,
-    "lora_alpha": 16,
-    "lora_dropout": 0.05,
-    "target_modules": ["q_proj", "
+    "r": 8,
+    "lora_alpha": 16,
+    "lora_dropout": 0.05,
+    "target_modules": ["q_proj", "v_proj"],
     "task_type": "CAUSAL_LM"
 }
 ```
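For context, a minimal sketch of how the LoRA settings above could be applied with Hugging Face `peft`. The base-model name is a placeholder and the surrounding code is an illustration under those assumptions, not part of this commit:

```python
# Illustrative sketch (assumptions: Hugging Face peft/transformers installed;
# the base model is a placeholder and must expose q_proj/v_proj projections).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("your-base-model")  # placeholder name

lora_config = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,                        # scaling factor (alpha / r = 2)
    lora_dropout=0.05,                    # dropout on adapter activations
    target_modules=["q_proj", "v_proj"],  # attention query/value projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # trainable vs. total parameter counts
```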
@@ -180,20 +180,18 @@ outputs = model.generate(
 learning_rate = 2e-4
 warmup_steps = 50
 max_steps = 500
-per_device_train_batch_size =
-gradient_accumulation_steps =
+per_device_train_batch_size = 16
+gradient_accumulation_steps = 4
 effective_batch_size = 1024
 
 # Optimization
 optimizer = "adamw_torch_xla"
 lr_scheduler = "cosine"
 weight_decay = 0.01
-max_grad_norm = 1.0
 
 # Model Settings
 sequence_length = 256
 precision = "bfloat16"
-gradient_checkpointing = True
 ```
 
 ### Training Infrastructure
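A quick sanity check on the batch-size arithmetic the new values imply: effective batch size is per-device batch size × gradient accumulation steps × device count. The device count of 16 is inferred from the numbers (and `adamw_torch_xla` points at an XLA/TPU setup); it is not stated anywhere in this commit:

```python
# Sanity check on the new values. num_devices = 16 is an assumption inferred
# from the arithmetic, not stated in this commit.
per_device_train_batch_size = 16
gradient_accumulation_steps = 4
num_devices = 16  # assumed; adamw_torch_xla suggests an XLA/TPU setup

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)  # 1024, matching the config value
```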
@@ -239,15 +237,6 @@ The model was evaluated on the complete HumanEval benchmark (164 programming pro
 
 This demonstrates that the educational fine-tuning maintains strong algorithmic correctness while improving code clarity and documentation.
 
-### Sample Performance by Category
-
-| Category | Base Model | Fine-tuned | Delta |
-|----------|-----------|------------|-------|
-| String Manipulation | 68% | 65% | -3% |
-| Data Structures | 67% | 64% | -3% |
-| Algorithms | 66% | 63% | -3% |
-| Math/Logic | 64% | 65% | +1% |
-
 ---
 
 ## 🎓 Use Cases