New architecture: TMT — dynamic graph attention + adaptive depth routing, 29.4 PPL at 48% compute (120M params)
#8 opened 19 days ago
by
vigneshwar234
Add MOT badge
#7 opened 8 months ago
by
lehors
Adding `safetensors` variant of this model
#6 opened 9 months ago
by
wendycheng
what was the training set
#5 opened over 1 year ago
by
bedio
Prompt Template
1
#4 opened almost 2 years ago
by
ha1772007