VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse • Paper • 2512.14531 • Published 9 days ago • 11
ROOT: Robust Orthogonalized Optimizer for Neural Network Training • Paper • 2511.20626 • Published about 1 month ago • 42
Retentive Network: A Successor to Transformer for Large Language Models • Paper • 2307.08621 • Published Jul 17, 2023 • 172