ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning Paper • 2603.10160 • Published 14 days ago • 25
Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs Paper • 2503.06342 • Published Mar 8, 2025 • 1
Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity Paper • 2501.16295 • Published Jan 27, 2025 • 8
EN-T: Optimizing Tensor Computing Engines Performance via Encoder-Based Methodology Paper • 2404.11887 • Published Apr 18, 2024
LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration Paper • 2408.06003 • Published Aug 12, 2024
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs Paper • 2410.13276 • Published Oct 17, 2024 • 29
Learning to (Learn at Test Time): RNNs with Expressive Hidden States Paper • 2407.04620 • Published Jul 5, 2024 • 34
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression Paper • 2406.14909 • Published Jun 21, 2024 • 16