Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering (arXiv:2411.11504)
Top-nσ: Not All Logits Are You Need (arXiv:2411.07641)
Adaptive Decoding via Latent Preference Optimization (arXiv:2411.09661)
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training (arXiv:2411.13476)
Hymba: A Hybrid-head Architecture for Small Language Models (arXiv:2411.13676)
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training (arXiv:2411.15124)
Star Attention: Efficient LLM Inference over Long Sequences (arXiv:2411.17116)
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? (arXiv:2411.16489)
MH-MoE: Multi-Head Mixture-of-Experts (arXiv:2411.16205)
nGPT: Normalized Transformer with Representation Learning on the Hypersphere (arXiv:2410.01131)
allenai/tulu-3-sft-mixture (dataset)
CASIA-LM/ChineseWebText2.0 (dataset)
Yi-Lightning Technical Report (arXiv:2412.01253)
Training Large Language Models to Reason in a Continuous Latent Space (arXiv:2412.06769)
Weighted-Reward Preference Optimization for Implicit Model Fusion (arXiv:2412.03187)
Phi-4 Technical Report (arXiv:2412.08905)
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models (arXiv:2412.11605)
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN (arXiv:2412.13795)
Qwen2.5 Technical Report (arXiv:2412.15115)
A Post-Training Enhanced Optimization Approach for Small Language Models (arXiv:2411.02939)
How to Synthesize Text Data without Model Collapse? (arXiv:2412.14689)
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response (arXiv:2412.14922)
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought (arXiv:2412.17498)
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners (arXiv:2412.17256)
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning (arXiv:2412.16849)
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking (arXiv:2501.04519)
MiniMax-01: Scaling Foundation Models with Lightning Attention (arXiv:2501.08313)
OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training (arXiv:2501.08197)
Infi-MM/InfiMM-WebMath-40B (dataset)
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning (arXiv:2502.06781)
Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models (arXiv:2503.11224)
The Lessons of Developing Process Reward Models in Mathematical Reasoning (arXiv:2501.07301)
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (arXiv:2504.11651)
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning (arXiv:2504.17192)
AdaptThink: Reasoning Models Can Learn When to Think (arXiv:2505.13417)
Multi-Token Prediction Needs Registers (arXiv:2505.10518)
Quartet: Native FP4 Training Can Be Optimal for Large Language Models (arXiv:2505.14669)
Distilling LLM Agent into Small Models with Retrieval and Code Tools (arXiv:2505.17612)
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning (arXiv:2505.10320)
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning (arXiv:2506.08889)
QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading (arXiv:2509.09995)
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning (arXiv:2509.09674)
Causal Attention with Lookahead Keys (arXiv:2509.07301)
A Survey of Reinforcement Learning for Large Reasoning Models (arXiv:2509.08827)
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention (arXiv:2510.04212)
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy (arXiv:2510.13778)
Direct Multi-Token Decoding (arXiv:2510.11958)
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs (arXiv:2510.11696)
Attention Is All You Need for KV Cache in Diffusion LLMs (arXiv:2510.14973)
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats (arXiv:2510.25602)
Continuous Autoregressive Language Models (arXiv:2510.27688)
Motif 2 12.7B Technical Report (arXiv:2511.07464)
TiDAR: Think in Diffusion, Talk in Autoregression (arXiv:2511.08923)