Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
An implementation of Relaxed Recursive Transformers, uptrained with knowledge distillation on OpenWebText2. Paper: arxiv.org/abs/2410.20672
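
Below is a minimal PyTorch sketch of the core idea from the paper: a recursive transformer reuses one shared set of weights across loop iterations, and the "relaxation" adds a small per-depth LoRA delta on top of the shared weights so each pass through the loop can differ slightly. The names (`LoRALinear`, `num_depths`, `rank`) and the zero-initialized LoRA deltas are illustrative assumptions, not this repo's actual API; the paper itself initializes the LoRA modules differently (e.g. from the original layer weights).

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A shared linear layer relaxed by per-depth low-rank (LoRA) deltas.

    The base weight is tied across all loop iterations of the recursive
    block; each depth owns its own rank-r A/B pair, so the effective
    weight at depth d is W + B_d @ A_d. (Illustrative sketch only.)
    """

    def __init__(self, base: nn.Linear, num_depths: int, rank: int = 8):
        super().__init__()
        self.base = base  # shared across depths
        # A: random init, B: zero init, so each delta starts at zero
        # (standard LoRA convention; the paper uses a different init).
        self.A = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
             for _ in range(num_depths)]
        )
        self.B = nn.ParameterList(
            [nn.Parameter(torch.zeros(base.out_features, rank))
             for _ in range(num_depths)]
        )

    def forward(self, x: torch.Tensor, depth: int) -> torch.Tensor:
        # y = W x + B_d (A_d x): shared weight plus per-depth correction.
        return self.base(x) + x @ self.A[depth].T @ self.B[depth].T


# Usage: one shared projection applied 4 times, each pass with its own delta.
shared_proj = LoRALinear(nn.Linear(512, 512), num_depths=4, rank=8)
x = torch.randn(2, 16, 512)
for depth in range(4):
    x = torch.relu(shared_proj(x, depth))
```

In a full model, each linear projection inside the shared transformer block would be wrapped this way, keeping the parameter count close to a single block while recovering much of the flexibility of untied layers.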