AbstractPhil committed
Commit 266e487 · verified · 1 Parent(s): 28ae001

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -10,7 +10,7 @@ The newest vit_zana_nano train has shown a very clean curve. runs/vit_zana_nano/
 
  This clean curve means the process is stable enough to introduce a larger set of depth and blocks without destroying the internals; simultaneously enforcing the 5 loss formulas specifically curated for the pentachora math.
 
- I've begun training a much deeper zana dubbed vit_zana_shaper. This model has 32 layers deep with MLP ratio of 1 and 2 attention heads, resting at about 3.5 million params or so.
+ I've begun training a much deeper zana dubbed vit_zana_shaper. This model has 32 blocks deep with MLP ratio of 1 and 2 attention heads, resting at about 3.5 million params or so.
 
  Lets see how she fares.
 
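Note on the changed line: the "about 3.5 million params" figure can be sanity-checked with a rough count of the weights in a standard pre-norm transformer block. This is only a sketch. The README does not state the embedding width, so `dim=128` below is an assumption, and `vit_block_params` is a hypothetical helper rather than anything from the vit_zana code. The head count does not appear in the formula because splitting attention into heads does not change the number of weights in a standard multi-head attention layer.

```python
def vit_block_params(dim: int, mlp_ratio: float) -> int:
    """Approximate trainable parameters in one standard transformer block."""
    # Attention: fused QKV projection (3*dim*dim + 3*dim) plus output
    # projection (dim*dim + dim), biases included.
    attn = 4 * dim * dim + 4 * dim
    # MLP: two linear layers with hidden width dim * mlp_ratio.
    hidden = int(dim * mlp_ratio)
    mlp = (dim * hidden + hidden) + (hidden * dim + dim)
    # Two LayerNorms, each with a weight and a bias vector.
    norms = 2 * 2 * dim
    return attn + mlp + norms


def vit_stack_params(depth: int, dim: int, mlp_ratio: float) -> int:
    """Parameters in the block stack only (no patch embed, pos embed, or head)."""
    return depth * vit_block_params(dim, mlp_ratio)


if __name__ == "__main__":
    # depth=32 and mlp_ratio=1 come from the README; dim=128 is an assumption.
    total = vit_stack_params(depth=32, dim=128, mlp_ratio=1)
    print(f"block stack params: {total:,}")
```

Under the assumed width of 128, the 32 blocks alone come to roughly 3.2 million weights, which is in the neighbourhood of the stated figure. Patch embedding, positional embeddings, and the classifier head are not counted here.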