Tags: Text Generation · Transformers · PyTorch · Safetensors · English · i3 · i3-architecture · hybrid-model · rwkv-mamba · custom_code
Commit 997ce06 by FlameF0X (verified) · 1 parent: 1000de1

Correcting the GPU used for i3-80m

Files changed (1): README.md +1 -1
README.md CHANGED
@@ -114,7 +114,7 @@ Layers 11-16: Full Attention Blocks
  - **Batch Size**: 4 (with gradient accumulation support)
  - **Learning Rate**: 3e-4 (with warmup and cosine decay)
  - **Optimizer**: AdamW with gradient clipping (max norm: 1.0)
- - **Hardware**: NVIDIA GeForce RTX 3060 (12GB VRAM)
+ - **Hardware**: NVIDIA P100 (16GB VRAM)
  - **Training Time**: ~2-4 hours
  - **Framework**: PyTorch
 
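The learning-rate and clipping settings in the context lines of this hunk can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the i3 training code: the warmup length (100 steps), total step count (1,000), and a cosine floor of 0 are assumed values, and `lr_at_step` / `clip_by_global_norm` are hypothetical helper names. The clipping helper follows the same idea as PyTorch's `torch.nn.utils.clip_grad_norm_` (rescale all gradients when their global L2 norm exceeds `max_norm`), but omits its small epsilon term.

```python
import math


def lr_at_step(step, base_lr=3e-4, warmup_steps=100, total_steps=1000, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay to min_lr.

    warmup_steps, total_steps and min_lr are assumed values for illustration.
    """
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Half-cosine from base_lr (progress=0) down to min_lr (progress=1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def clip_by_global_norm(grads, max_norm=1.0):
    """Scale a flat list of gradient values so their global L2 norm is <= max_norm.

    Returns (clipped_grads, original_norm). Simplified: no epsilon term.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm or total_norm == 0.0:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


if __name__ == "__main__":
    for s in (0, 99, 100, 550, 1000):
        print(s, lr_at_step(s))
    print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))
```

With a per-step batch of 4, accumulating gradients over k micro-batches gives an effective batch of 4k per optimizer update; in that setup the schedule and clipping would be applied once per optimizer step, not once per micro-batch.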