OpenMOSE committed (verified)
Commit fb074ec · 1 parent: 4e7ffe6

Update README.md

Files changed (1): README.md (+2 -0)
README.md CHANGED
@@ -63,6 +63,8 @@ This is an **experimental research model** designed to explore hybrid architectu
 ## Training Details
 
 - **Training Context Window:** 4096 tokens
+- **Training GPU:** 1x AMD MI300X (training took ~68 hours)
+- **Training Strategy:** 8-bit MLP quantization; frozen embeddings, MLP, and head; DeepSpeed ZeRO Stage 1
 - **Base Model Initialization:** Weights initialized from Reka-flash3 21B
 - **Architecture Conversion:** Transformer attention blocks systematically replaced with RWKV blocks, except for 6 strategically placed GQA layers
 
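The training strategy in the diff above can be sketched in code. This is a minimal, hypothetical illustration, not the author's actual training script: the layer count, the even-spacing placement of the 6 GQA layers, the parameter-name prefixes, and the config values are all assumptions; only "frozen emb/mlp/head", "6 GQA layers", and "DeepSpeed Stage 1" come from the README itself.

```python
# Hypothetical sketch of the setup described in the diff. Layer counts,
# parameter-name prefixes, and GQA placement are illustrative assumptions.

N_LAYERS = 48       # assumed total block count for the 21B base model
N_GQA_LAYERS = 6    # from the README: 6 GQA layers are kept

def gqa_layer_indices(n_layers: int, n_gqa: int) -> list[int]:
    """Evenly space the retained GQA layers across the stack.
    (The README only says 'strategically placed'; even spacing is a guess.)"""
    step = n_layers / n_gqa
    return [round(i * step + step / 2) for i in range(n_gqa)]

FROZEN_PREFIXES = ("emb", "mlp", "head")  # from the README: frozen emb, mlp, head

def is_trainable(param_name: str) -> bool:
    """Only the converted RWKV (and retained GQA) blocks receive gradients."""
    return not any(part.startswith(FROZEN_PREFIXES)
                   for part in param_name.split("."))

# Minimal DeepSpeed ZeRO Stage 1 config (optimizer-state partitioning only);
# batch size and precision here are placeholder values.
deepspeed_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 1},
    "bf16": {"enabled": True},
}
```

With 48 layers, `gqa_layer_indices(48, 6)` spreads the GQA blocks roughly evenly through the stack, and `is_trainable` filters out any parameter whose name starts with a frozen prefix (e.g. `emb.weight`, `blocks.0.mlp.weight`) before building the optimizer's parameter list.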