Implementation of the Relaxed Recursive Transformer, with distillation-based uptraining on OpenWebText2.
Paper: https://arxiv.org/abs/2410.20672
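
As a rough illustration of what a distillation objective for uptraining can look like, the sketch below computes a temperature-scaled KL divergence between a teacher's and a student's logits for a single token position. It uses only the Python standard library. The function names, the default temperature, and the `temperature ** 2` scaling are assumptions chosen for illustration; the loss actually used in this repository may differ.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax using the log-sum-exp trick for stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over one vocabulary distribution.

    Hypothetical helper: the temperature ** 2 factor is a common convention,
    assumed here rather than taken from this repository.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher log-probabilities
    log_q = log_softmax(student_logits, temperature)  # student log-probabilities
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    student = [1.0, 1.0, 0.0]
    print(distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge. In a training loop it would typically be averaged over token positions and possibly combined with the usual language-modeling loss.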