
Xiangyu

xixy

https://xixy.github.io/

AI & ML interests

None yet

Recent Activity

upvoted a paper 16 days ago

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

commented on a paper about 1 month ago

SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills

upvoted a paper about 1 month ago

SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills


Organizations

None yet

upvoted a paper 16 days ago

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

Paper • 2606.12087 • Published 18 days ago • 77

upvoted a paper about 1 month ago

SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills

Paper • 2605.24117 • Published May 22 • 22

upvoted 4 papers about 2 months ago

TMAS: Scaling Test-Time Compute via Multi-Agent Synergy

Paper • 2605.10344 • Published May 11 • 51

Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

Paper • 2605.03596 • Published May 5 • 11

HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness

Paper • 2605.02396 • Published May 4 • 24

ClawGym: A Scalable Framework for Building Effective Claw Agents

Paper • 2604.26904 • Published Apr 29 • 54

upvoted 2 papers 3 months ago

How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings

Paper • 2604.04323 • Published Apr 6 • 41

On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning

Paper • 2604.01702 • Published Apr 4 • 3

upvoted a paper 4 months ago

How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities

Paper • 2603.02578 • Published Mar 3 • 25

upvoted 2 papers 5 months ago

LongCat-Flash-Thinking-2601 Technical Report

Paper • 2601.16725 • Published Jan 23 • 181

Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text

Paper • 2601.10355 • Published Jan 15 • 39

upvoted a paper 6 months ago

Universal Reasoning Model

Paper • 2512.14693 • Published Dec 16, 2025 • 44

upvoted a paper 8 months ago

DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation

Paper • 2511.06307 • Published Nov 9, 2025 • 53

upvoted an article 8 months ago

Budget Alignment: Making Models Reason in the User’s Language

Article • shanchen • Nov 4, 2025 • 12

upvoted 2 papers 8 months ago

AMO-Bench: Large Language Models Still Struggle in High School Math Competitions

Paper • 2510.26768 • Published Oct 30, 2025 • 36

VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications

Paper • 2509.26490 • Published Sep 30, 2025 • 21

upvoted a paper 9 months ago

R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?

Paper • 2510.08189 • Published Oct 9, 2025 • 28

upvoted a collection 11 months ago

OpenReasoning-Nemotron

Collection

Collection of models for OpenReasoning-Nemotron which are trained on 5M reasoning traces for Math, Code and Science. • 6 items • Updated 16 days ago • 47

upvoted 2 papers about 1 year ago

AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy

Paper • 2506.13284 • Published Jun 16, 2025 • 26

Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective

Paper • 2505.17652 • Published May 23, 2025 • 6
