Translation Errors Significantly Impact Low-Resource Languages in Cross-Lingual Learning Paper • 2402.02080 • Published Feb 3, 2024
TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments Paper • 2510.01179 • Published Oct 1, 2025
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts Paper • 2406.12034 • Published Jun 17, 2024
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published May 21, 2024
Reprogramming under constraints: Revisiting efficient and reliable transferability of lottery tickets Paper • 2308.14969 • Published Aug 29, 2023
From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers Paper • 2402.01911 • Published Feb 2, 2024
Data Engineering for Scaling Language Models to 128K Context Paper • 2402.10171 • Published Feb 15, 2024