i3-lab/i3-80m

@@ -114,7 +114,7 @@ Layers 11-16: Full Attention Blocks
 - **Batch Size**: 4 (with gradient accumulation support)
 - **Learning Rate**: 3e-4 (with warmup and cosine decay)
 - **Optimizer**: AdamW with gradient clipping (max norm: 1.0)
-- **Hardware**: NVIDIA GeForce RTX 3060 (12GB VRAM)
+- **Hardware**: NVIDIA P100 (16GB VRAM)
 - **Training Time**: ~2-4 hours
 - **Framework**: PyTorch
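
For readers of this hunk, the "warmup and cosine decay" note on the learning rate can be illustrated with a small sketch. Only the peak value of 3e-4 comes from the README; the linear warmup shape, the function name `lr_at`, and the step counts in the usage lines are assumptions made for illustration and may not match the actual training script.

```python
import math

PEAK_LR = 3e-4  # peak learning rate stated in the README


def lr_at(step, total_steps, warmup_steps, peak_lr=PEAK_LR, min_lr=0.0):
    """Learning rate at a given optimizer step (hypothetical helper).

    Assumes linear warmup to peak_lr, then cosine decay to min_lr.
    """
    if step < warmup_steps:
        # Linear warmup: reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine decay from peak_lr (progress = 0) to min_lr (progress = 1).
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical schedule: 1000 total steps, 100 of them warmup.
schedule = [lr_at(s, total_steps=1000, warmup_steps=100) for s in range(1001)]
```

In a PyTorch training loop, a function like this would typically be applied per step by writing the value into each optimizer parameter group, alongside the gradient clipping at max norm 1.0 that the README mentions.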