yilong xu

sapphirex

AI & ML interests

None yet

Recent Activity

upvoted a paper 9 days ago

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

Paper • 2606.17682 • Published 12 days ago • 26

upvoted a paper 15 days ago

Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning

Paper • 2606.13106 • Published 17 days ago • 21

upvoted a paper 18 days ago

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Paper • 2606.11052 • Published 19 days ago • 16

upvoted a paper 24 days ago

MemTrain: Self-Supervised Context Memory Training

Paper • 2606.03197 • Published 26 days ago • 17

upvoted a paper 25 days ago

Trust Region On-Policy Distillation

Paper • 2606.01249 • Published 28 days ago • 45

upvoted a paper about 1 month ago

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

Paper • 2605.18703 • Published May 18 • 50

upvoted a paper 4 months ago

CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models

Paper • 2602.17684 • Published Feb 4 • 22

upvoted a paper 5 months ago

MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling

Paper • 2602.03359 • Published Feb 3 • 10

upvoted a collection 8 months ago

Annotation-Efficient Universal Honesty Alignment

Collection

Official Collections of paper "Annotation-Efficient Universal Honesty Alignment". • 5 items • Updated Oct 21, 2025 • 3

upvoted a paper 8 months ago

Annotation-Efficient Universal Honesty Alignment

Paper • 2510.17509 • Published Oct 20, 2025 • 22

upvoted 2 papers 11 months ago

Training a Utility-based Retriever Through Shared Context Attribution for Retrieval-Augmented Language Models

Paper • 2504.00573 • Published Apr 1, 2025 • 2

RAVine: Reality-Aligned Evaluation for Agentic Search

Paper • 2507.16725 • Published Jul 22, 2025 • 31

upvoted a collection 11 months ago

MiniCPM4

Collection

MiniCPM4: Ultra-Efficient LLMs on End Devices • 30 items • Updated May 24 • 84

upvoted a paper 12 months ago

RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs

Paper • 2507.03253 • Published Jul 4, 2025 • 19

upvoted a collection over 1 year ago

BGE

Collection

31 items • Updated Feb 4 • 164
