kaizuberbuehler's Collections: LM Training
Rho-1: Not All Tokens Are What You Need
Paper
• 2404.07965
• Published • 94
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
Paper
• 2404.10667
• Published • 24
Instruction-tuned Language Models are Better Knowledge Learners
Paper
• 2402.12847
• Published • 26
DoRA: Weight-Decomposed Low-Rank Adaptation
Paper
• 2402.09353
• Published • 32
QLoRA: Efficient Finetuning of Quantized LLMs
Paper
• 2305.14314
• Published • 60
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper
• 2403.03507
• Published • 189
Reverse Training to Nurse the Reversal Curse
Paper
• 2403.13799
• Published • 13
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper
• 2312.15166
• Published • 61
ReFT: Representation Finetuning for Language Models
Paper
• 2404.03592
• Published • 101
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper
• 2404.03715
• Published • 62
Learn Your Reference Model for Real Good Alignment
Paper
• 2404.09656
• Published • 90
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper
• 2404.08801
• Published • 66
Pre-training Small Base LMs with Fewer Tokens
Paper
• 2404.08634
• Published • 36
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Paper
• 2404.07413
• Published • 38
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Paper
• 2404.06395
• Published • 24
SambaLingo: Teaching Large Language Models New Languages
Paper
• 2404.05829
• Published • 13
Advancing LLM Reasoning Generalists with Preference Trees
Paper
• 2404.02078
• Published • 46
Poro 34B and the Blessing of Multilinguality
Paper
• 2404.01856
• Published • 15
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper
• 2404.14219
• Published • 259
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Paper
• 2404.13208
• Published • 40
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Paper
• 2404.05892
• Published • 40
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper
• 2312.00752
• Published • 150
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper
• 2404.14619
• Published • 126
Jamba: A Hybrid Transformer-Mamba Language Model
Paper
• 2403.19887
• Published • 112
Make Your LLM Fully Utilize the Context
Paper
• 2404.16811
• Published • 55
Tele-FLM Technical Report
Paper
• 2404.16645
• Published • 18
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Paper
• 2404.16994
• Published • 37
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Paper
• 2405.00732
• Published • 122
Iterative Reasoning Preference Optimization
Paper
• 2404.19733
• Published • 50
What matters when building vision-language models?
Paper
• 2405.02246
• Published • 104
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper
• 2405.12130
• Published • 50
Your Transformer is Secretly Linear
Paper
• 2405.12250
• Published • 157
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Paper
• 2405.20541
• Published • 24
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Paper
• 2406.11813
• Published • 31
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Paper
• 2406.08464
• Published • 72
The Llama 3 Herd of Models
Paper
• 2407.21783
• Published • 118
Gemma 2: Improving Open Language Models at a Practical Size
Paper
• 2408.00118
• Published • 78
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
Paper
• 2407.21770
• Published • 22
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Paper
• 2408.07055
• Published • 69
Data curation via joint example selection further accelerates multimodal learning
Paper
• 2406.17711
• Published • 3
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Paper
• 2408.12570
• Published • 32
OLMoE: Open Mixture-of-Experts Language Models
Paper
• 2409.02060
• Published • 80
Training Language Models to Self-Correct via Reinforcement Learning
Paper
• 2409.12917
• Published • 140
GRIN: GRadient-INformed MoE
Paper
• 2409.12136
• Published • 16
Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey
Paper
• 2409.11564
• Published • 20
NVLM: Open Frontier-Class Multimodal LLMs
Paper
• 2409.11402
• Published • 74
Instruction Following without Instruction Tuning
Paper
• 2409.14254
• Published • 29
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper
• 2409.17146
• Published • 121
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
Paper
• 2409.17115
• Published • 64
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
Paper
• 2406.14546
• Published • 3
Thinking LLMs: General Instruction Following with Thought Generation
Paper
• 2410.10630
• Published • 20
Paper
• 2412.08905
• Published • 122
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper
• 2412.16145
• Published • 38
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper
• 2412.14922
• Published • 88
Diving into Self-Evolving Training for Multimodal Reasoning
Paper
• 2412.17451
• Published • 42
Paper
• 2412.16720
• Published • 37
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training
Paper
• 2411.15124
• Published • 67
Natural Language Reinforcement Learning
Paper
• 2411.14251
• Published • 31
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Paper
• 2411.04905
• Published • 127
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Paper
• 2411.04996
• Published • 50
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper
• 2501.00958
• Published • 110
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Paper
• 2501.04682
• Published • 99
Scaling Laws for Floating Point Quantization Training
Paper
• 2501.02423
• Published • 26
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Paper
• 2501.01904
• Published • 33
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning
Paper
• 2501.06458
• Published • 31
Enhancing Human-Like Responses in Large Language Models
Paper
• 2501.05032
• Published • 61
Do generative video models learn physical principles from watching videos?
Paper
• 2501.09038
• Published • 34
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
• 2501.12948
• Published • 444
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Paper
• 2501.12599
• Published • 128
Demystifying Long Chain-of-Thought Reasoning in LLMs
Paper
• 2502.03373
• Published • 58
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper
• 2501.17161
• Published • 125
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
Paper
• 2501.12895
• Published • 61
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
Paper
• 2501.10799
• Published • 15
Qwen2.5-1M Technical Report
Paper
• 2501.15383
• Published • 72
Baichuan-Omni-1.5 Technical Report
Paper
• 2501.15368
• Published • 60
Optimizing Large Language Model Training Using FP4 Quantization
Paper
• 2501.17116
• Published • 36
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
Paper
• 2501.16975
• Published • 32
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
Paper
• 2501.17703
• Published • 59
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper
• 2502.02737
• Published • 257
LIMO: Less is More for Reasoning
Paper
• 2502.03387
• Published • 62
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations
Paper
• 2502.05003
• Published • 44
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
Paper
• 2502.04416
• Published • 12
Scaling Pre-training to One Hundred Billion Data for Vision Language Models
Paper
• 2502.07617
• Published • 29
Gemstones: A Model Suite for Multi-Faceted Scaling Laws
Paper
• 2502.06857
• Published • 24
Typhoon T1: An Open Thai Reasoning Model
Paper
• 2502.09042
• Published • 16
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Paper
• 2502.11089
• Published • 169
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
Paper
• 2502.14502
• Published • 92
Continuous Diffusion Model for Language Modeling
Paper
• 2502.11564
• Published • 53
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
Paper
• 2502.13533
• Published • 13
LongRoPE2: Near-Lossless LLM Context Window Scaling
Paper
• 2502.20082
• Published • 36
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Paper
• 2502.17055
• Published • 20
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper
• 2503.01785
• Published • 86
Predictive Data Selection: The Data That Predicts Is the Data That Teaches
Paper
• 2503.00808
• Published • 57
Gemini Robotics: Bringing AI into the Physical World
Paper
• 2503.20020
• Published • 31
Large-Scale Data Selection for Instruction Tuning
Paper
• 2503.01807
• Published • 14
Unified Reward Model for Multimodal Understanding and Generation
Paper
• 2503.05236
• Published • 124
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond
Paper
• 2503.10460
• Published • 30
TTRL: Test-Time Reinforcement Learning
Paper
• 2504.16084
• Published • 122
Learning from Failures in Multi-Attempt Reinforcement Learning
Paper
• 2503.04808
• Published • 18
TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation
Paper
• 2503.04872
• Published • 15
Self-Taught Self-Correction for Small Language Models
Paper
• 2503.08681
• Published • 15
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Paper
• 2503.15558
• Published • 50
SkyLadder: Better and Faster Pretraining via Context Window Scheduling
Paper
• 2503.15450
• Published • 12
Paper
• 2503.19786
• Published • 55
Modifying Large Language Model Post-Training for Diverse Creative Writing
Paper
• 2503.17126
• Published • 36
FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models
Paper
• 2503.17287
• Published • 11
ZClip: Adaptive Spike Mitigation for LLM Pre-Training
Paper
• 2504.02507
• Published • 88
RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning
Paper
• 2505.15034
• Published • 5
Improved Visual-Spatial Reasoning via R1-Zero-Like Training
Paper
• 2504.00883
• Published • 67
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper
• 2503.24290
• Published • 62
JudgeLRM: Large Reasoning Models as a Judge
Paper
• 2504.00050
• Published • 62
Inference-Time Scaling for Generalist Reward Modeling
Paper
• 2504.02495
• Published • 58
Understanding R1-Zero-Like Training: A Critical Perspective
Paper
• 2503.20783
• Published • 59
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Paper
• 2503.22230
• Published • 45
Unicorn: Text-Only Data Synthesis for Vision Language Model Training
Paper
• 2503.22655
• Published • 38
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources
Paper
• 2504.00595
• Published • 37
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
Paper
• 2504.02587
• Published • 32
Scaling Analysis of Interleaved Speech-Text Language Models
Paper
• 2504.02398
• Published • 31
Scaling Language-Free Visual Representation Learning
Paper
• 2504.01017
• Published • 33
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy
Paper
• 2503.24388
• Published • 29
Z1: Efficient Test-time Scaling with Code
Paper
• 2504.00810
• Published • 27
Command A: An Enterprise-Ready Large Language Model
Paper
• 2504.00698
• Published • 29
Expanding RL with Verifiable Rewards Across Diverse Domains
Paper
• 2503.23829
• Published • 24
ActionStudio: A Lightweight Framework for Data and Training of Large Action Models
Paper
• 2503.22673
• Published • 12
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL
Paper
• 2503.23157
• Published • 10
Paper
• 2504.07491
• Published • 138
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
Paper
• 2504.05599
• Published • 86
Rethinking Reflection in Pre-Training
Paper
• 2504.04022
• Published • 80
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens
Paper
• 2504.07096
• Published • 77
Scaling Laws for Native Multimodal Models
Paper
• 2504.07951
• Published • 30
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Paper
• 2504.05118
• Published • 26
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Paper
• 2504.07086
• Published • 21
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
Paper
• 2504.07934
• Published • 21
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
Paper
• 2504.06958
• Published • 13
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Paper
• 2504.05520
• Published • 11
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Paper
• 2504.10479
• Published • 308
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Paper
• 2504.13161
• Published • 97
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper
• 2504.10481
• Published • 85
BitNet b1.58 2B4T Technical Report
Paper
• 2504.12285
• Published • 83
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
Paper
• 2504.08672
• Published • 55
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper
• 2504.08837
• Published • 44
Heimdall: test-time scaling on the generative verification
Paper
• 2504.10337
• Published • 33
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Paper
• 2504.11468
• Published • 30
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Paper
• 2504.13055
• Published • 19
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Paper
• 2504.11343
• Published • 20
DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training
Paper
• 2504.09710
• Published • 19
DataDecide: How to Predict Best Pretraining Data with Small Experiments
Paper
• 2504.11393
• Published • 18
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization
Paper
• 2504.10127
• Published • 17
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper
• 2504.10449
• Published • 15
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL
Paper
• 2504.11455
• Published • 14
Efficient Process Reward Model Training via Active Learning
Paper
• 2504.10559
• Published • 13
Exploring Expert Failures Improves LLM Agent Tuning
Paper
• 2504.13145
• Published • 12
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models
Paper
• 2504.15271
• Published • 68
ToolRL: Reward is All Tool Learning Needs
Paper
• 2504.13958
• Published • 49
OTC: Optimal Tool Calls via Reinforcement Learning
Paper
• 2504.14870
• Published • 35
QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining
Paper
• 2504.16511
• Published • 22
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
Paper
• 2504.16078
• Published • 21
Efficient Pretraining Length Scaling
Paper
• 2504.14992
• Published • 20