Correcting the GPU used for i3-80m
README.md
@@ -114,7 +114,7 @@ Layers 11-16: Full Attention Blocks
 - **Batch Size**: 4 (with gradient accumulation support)
 - **Learning Rate**: 3e-4 (with warmup and cosine decay)
 - **Optimizer**: AdamW with gradient clipping (max norm: 1.0)
-- **Hardware**: NVIDIA
+- **Hardware**: NVIDIA P100 (16GB VRAM)
 - **Training Time**: ~2-4 hours
 - **Framework**: PyTorch
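The training settings in this hunk (peak learning rate 3e-4 with warmup and cosine decay, micro-batch of 4 with gradient accumulation) can be illustrated with a small, dependency-free sketch. The README does not show the project's training script, so this is not necessarily how i3-80m implements its schedule. The warmup length, total step count, minimum learning rate and number of accumulation steps used below are hypothetical values chosen only for illustration.

```python
import math

PEAK_LR = 3e-4  # from the README: learning rate 3e-4


def lr_at_step(step: int, warmup_steps: int, total_steps: int,
               peak_lr: float = PEAK_LR, min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine decay to min_lr.

    warmup_steps, total_steps and min_lr are illustrative assumptions;
    the README does not state them.
    """
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


def effective_batch_size(micro_batch: int, accum_steps: int) -> int:
    """Samples contributing to each optimizer step under gradient accumulation."""
    return micro_batch * accum_steps


if __name__ == "__main__":
    # Hypothetical run length: 100 warmup steps out of 1,000 total.
    for s in (0, 99, 100, 550, 999):
        print(s, f"{lr_at_step(s, 100, 1000):.2e}")
    # Micro-batch of 4 (from the README) with a hypothetical 8 accumulation steps.
    print(effective_batch_size(4, 8))
```

In PyTorch, a function like this could be wrapped in `torch.optim.lr_scheduler.LambdaLR`, whose `lr_lambda` returns a multiplier on the optimizer's base learning rate (so the value would be divided by `PEAK_LR`), and the max-norm 1.0 clipping maps to `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)` called before each `optimizer.step()`.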