Papers
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
• arXiv:2402.17764 • 627 upvotes
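The "1.58 bits" in the title is log2(3): every weight takes one of three values, {-1, 0, +1}. A minimal NumPy sketch of absmean ternary quantization in the spirit of that line of work (function and variable names are mine, not the authors' reference code):

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to {-1, 0, +1} (log2(3) ~= 1.58 bits/weight).

    Scales by the mean absolute value, then rounds each entry to the
    nearest of -1/0/+1. Sketch only, not the paper's implementation.
    """
    gamma = np.abs(w).mean() + eps           # absmean scale
    w_ternary = np.clip(np.round(w / gamma), -1, 1)
    return w_ternary, gamma                  # dequantize as w_ternary * gamma

# Toy usage: quantize a random matrix and check the reconstruction error.
w = np.random.randn(4, 4).astype(np.float32)
w_q, gamma = absmean_ternary_quantize(w)
print(w_q)                                   # entries in {-1., 0., 1.}
print(np.abs(w - w_q * gamma).mean())
```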
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
• arXiv:2403.03507 • 189 upvotes
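GaLore's memory saving comes from keeping optimizer state in a low-rank projection of the gradient rather than at full parameter size. A rough NumPy sketch of one projected step, assuming a projection refreshed from the gradient's top singular vectors; the plain-SGD update and all names are mine for illustration:

```python
import numpy as np

def galore_step(w, grad, P, lr=1e-3):
    """One low-rank-projected gradient step in the spirit of GaLore.

    P (m x r) spans a low-rank subspace of the gradient. A stateful
    optimizer (e.g. Adam) would keep its moments on the r x n projected
    gradient instead of the full m x n one -- the memory saving.
    """
    g_lowrank = P.T @ grad      # project the m x n gradient down to r x n
    update = P @ g_lowrank      # project the update back up to m x n
    return w - lr * update

# Refresh P from the top-r left singular vectors of the current gradient.
m, n, r = 64, 32, 4
w, grad = np.random.randn(m, n), np.random.randn(m, n)
U, _, _ = np.linalg.svd(grad, full_matrices=False)
P = U[:, :r]
w = galore_step(w, grad, P)
```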
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
• arXiv:2402.19427 • 56 upvotes
ResLoRA: Identity Residual Mapping in Low-Rank Adaption
• arXiv:2402.18039 • 11 upvotes
Beyond Language Models: Byte Models are Digital World Simulators
• arXiv:2402.19155 • 53 upvotes
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
• arXiv:2403.03853 • 66 upvotes
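ShortGPT's redundancy claim suggests scoring each layer by how little it changes its input, then dropping the lowest-scoring layers. A toy sketch of such a cosine-similarity-based importance score (my construction of the idea, not the paper's code):

```python
import numpy as np

def layer_influence(h_in, h_out, eps=1e-8):
    """Score a layer by how much it transforms its hidden states:
    1 minus the mean cosine similarity between the layer's input and
    output. Scores near 0 mark near-identity (redundant) layers.
    Sketch of the redundancy idea, not the paper's exact metric.
    """
    num = (h_in * h_out).sum(axis=-1)
    den = np.linalg.norm(h_in, axis=-1) * np.linalg.norm(h_out, axis=-1) + eps
    return 1.0 - (num / den).mean()

# A near-identity layer scores ~0; an unrelated output scores ~1.
h = np.random.randn(32, 64)
print(layer_influence(h, h + 0.01 * np.random.randn(32, 64)))  # ~0
print(layer_influence(h, np.random.randn(32, 64)))             # ~1
```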
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
• arXiv:2402.19479 • 35 upvotes
DoRA: Weight-Decomposed Low-Rank Adaptation
• arXiv:2402.09353 • 32 upvotes
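DoRA's decomposition splits the adapted weight into a learnable per-column magnitude and a normalized direction, with the direction carrying a LoRA-style low-rank update. A NumPy sketch under that reading (shapes and names are mine):

```python
import numpy as np

def dora_forward(x, W0, B, A, m):
    """Weight-decomposed low-rank adaptation, sketched in NumPy.

    The adapted weight is a per-column magnitude m times the column-wise
    normalized direction (W0 + B @ A), where B @ A is the usual
    LoRA-style low-rank update.
    """
    V = W0 + B @ A                                    # direction before normalization
    col_norm = np.linalg.norm(V, axis=0, keepdims=True)
    W = m * (V / col_norm)                            # magnitude * unit direction
    return x @ W

d_in, d_out, r = 16, 8, 2
W0 = np.random.randn(d_in, d_out)
B, A = np.random.randn(d_in, r), np.zeros((r, d_out))   # LoRA-style init: B @ A = 0
m = np.linalg.norm(W0, axis=0, keepdims=True)           # magnitude init = ||W0||_c
x = np.random.randn(3, d_in)
print(dora_forward(x, W0, B, A, m).shape)               # (3, 8)
```

With this initialization the adapted weight starts out exactly equal to W0, so fine-tuning begins from the pretrained model.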
Training Neural Networks from Scratch with Parallel Low-Rank Adapters
• arXiv:2402.16828 • 4 upvotes
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
• arXiv:2312.00752 • 150 upvotes
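Mamba's core is a state-space recurrence whose parameters depend on the input at each position. A deliberately naive sequential sketch of such a selective recurrence; the paper's contribution includes a much faster hardware-aware parallel scan, and all names and shapes here are mine:

```python
import numpy as np

def selective_ssm(x, A, W_B, W_C, W_delta):
    """Minimal selective state-space recurrence (sketch only).

    B, C and the step size delta are functions of the input at each
    position -- the 'selective' part -- and the state h is updated with
    a per-step discretization of the continuous dynamics.
    """
    L, d = x.shape
    n = A.shape[1]                  # state size per channel
    h = np.zeros((d, n))
    y = np.zeros((L, d))
    for t in range(L):
        delta = np.log1p(np.exp(x[t] @ W_delta))[:, None]   # softplus, (d, 1)
        A_bar = np.exp(delta * A)                           # discretized A, (d, n)
        B_t, C_t = x[t] @ W_B, x[t] @ W_C                   # input-dependent, (n,)
        h = A_bar * h + (delta * B_t[None, :]) * x[t][:, None]
        y[t] = h @ C_t
    return y

L, d, n = 10, 4, 8
x = np.random.randn(L, d)
A = -np.exp(np.random.randn(d, n))          # negative-real A for stability
W_B, W_C = np.random.randn(d, n), np.random.randn(d, n)
W_delta = np.random.randn(d, d)
print(selective_ssm(x, A, W_B, W_C, W_delta).shape)  # (10, 4)
```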
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
• arXiv:2403.09611 • 129 upvotes
Simple and Scalable Strategies to Continually Pre-train Large Language Models
• arXiv:2403.08763 • 51 upvotes
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory
• arXiv:2402.04617 • 6 upvotes
Rho-1: Not All Tokens Are What You Need
• arXiv:2404.07965 • 94 upvotes
Learn Your Reference Model for Real Good Alignment
• arXiv:2404.09656 • 90 upvotes
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
• arXiv:2404.08801 • 66 upvotes
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
• arXiv:2403.03432 • 1 upvote
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
• arXiv:2404.14219 • 259 upvotes