toread - a eva0071 Collection

updated 1 day ago

Why Fine-Tuning Encourages Hallucinations and How to Fix It

Paper • 2604.15574 • Published Apr 16 • 25
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

Paper • 2604.24763 • Published Apr 27 • 71
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora

Paper • 2604.24819 • Published Apr 27 • 91
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

Paper • 2604.26752 • Published Apr 29 • 112
Large Language Models Explore by Latent Distilling

Paper • 2604.24927 • Published Apr 27 • 74
Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding

Paper • 2604.26779 • Published Apr 29 • 14
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Paper • 2605.22791 • Published May 21 • 33
Unsupervised Process Reward Models

Paper • 2605.10158 • Published May 11 • 27
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

Paper • 2605.16928 • Published May 16 • 97
From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models

Paper • 2605.20177 • Published May 19 • 10
Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

Paper • 2605.19282 • Published May 19 • 9
Channel-wise Vector Quantization

Paper • 2605.26089 • Published May 25 • 15
Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models

Paper • 2605.26895 • Published May 26 • 20
Task-Focused Memorization for Multimodal Agents

Paper • 2605.31075 • Published 29 days ago • 39
LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Paper • 2605.31584 • Published 29 days ago • 43
Not All Disagreement Is Learnable: Token Teachability in On-Policy Distillation

Paper • 2605.26844 • Published May 26 • 26
ESPO: Early-Stopping Proximal Policy Optimization

Paper • 2605.29860 • Published 30 days ago • 20
NITP: Next Implicit Token Prediction for LLM Pre-training

Paper • 2605.24956 • Published May 24 • 35
Self-Distilled Policy Gradient

Paper • 2606.04036 • Published 25 days ago • 27
MemTrain: Self-Supervised Context Memory Training

Paper • 2606.03197 • Published 25 days ago • 17
Latent Reasoning with Normalizing Flows

Paper • 2606.06447 • Published 23 days ago • 8
ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning

Paper • 2606.03503 • Published 24 days ago • 25
Unified Neural Scaling Laws

Paper • 2605.26248 • Published May 25 • 7
Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

Paper • 2606.07502 • Published 22 days ago • 97
On the Geometry of On-Policy Distillation

Paper • 2606.07082 • Published 22 days ago • 73
Dynamic Linear Attention

Paper • 2606.10650 • Published 18 days ago • 5
How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs

Paper • 2606.10646 • Published 18 days ago • 6
Rethinking the Divergence Regularization in LLM RL

Paper • 2606.09821 • Published 19 days ago • 33
Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Paper • 2606.12397 • Published 17 days ago • 87
MiniMax Sparse Attention

Paper • 2606.13392 • Published 16 days ago • 146
Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Paper • 2606.15007 • Published 15 days ago • 16
Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories

Paper • 2606.11176 • Published 18 days ago • 126
Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling

Paper • 2606.12370 • Published 17 days ago • 21
Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale

Paper • 2606.15079 • Published 14 days ago • 84
ExpRL: Exploratory RL for LLM Mid-Training

Paper • 2606.17024 • Published 12 days ago • 5
Variable-Width Transformers

Paper • 2606.18246 • Published 11 days ago • 15
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

Paper • 2606.18216 • Published 11 days ago • 61
Rethinking the Role of Efficient Attention in Hybrid Architectures

Paper • 2606.15378 • Published 14 days ago • 17
Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models

Paper • 2606.19750 • Published 9 days ago • 3
Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning

Paper • 2606.18831 • Published 10 days ago • 6
The Hitchhiker's Guide to Agentic AI: From Foundations to Systems

Paper • 2606.24937 • Published 5 days ago • 13
Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning

Paper • 2606.24133 • Published 4 days ago • 8