Xiangyu

xixy

https://xixy.github.io/

AI & ML interests

None yet

Recent Activity

upvoted a paper 16 days ago

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

commented on a paper about 1 month ago

SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills

upvoted a paper about 1 month ago

SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills

Organizations

None yet

upvoted a paper 16 days ago

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

Paper • 2606.12087 • Published 18 days ago • 77

commented on a paper about 1 month ago

SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills

Paper • 2605.24117 • Published May 22 • 22

upvoted a paper about 1 month ago

SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills

Paper • 2605.24117 • Published May 22 • 22

upvoted 4 papers about 2 months ago

TMAS: Scaling Test-Time Compute via Multi-Agent Synergy

Paper • 2605.10344 • Published May 11 • 51

Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

Paper • 2605.03596 • Published May 5 • 11

HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness

Paper • 2605.02396 • Published May 4 • 24

ClawGym: A Scalable Framework for Building Effective Claw Agents

Paper • 2604.26904 • Published Apr 29 • 54

authored a paper 3 months ago

On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning

Paper • 2604.01702 • Published Apr 4 • 3

upvoted 2 papers 3 months ago

How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings

Paper • 2604.04323 • Published Apr 6 • 41

On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning

Paper • 2604.01702 • Published Apr 4 • 3

commented on a paper 3 months ago

Embarrassingly Simple Self-Distillation Improves Code Generation

Paper • 2604.01193 • Published Apr 1 • 56

authored a paper 3 months ago

LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning

Paper • 2603.21065 • Published Mar 22 • 78

New activity in Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled 4 months ago

Claude distillation

#1 opened 4 months ago by gergopool

upvoted a paper 4 months ago

How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities

Paper • 2603.02578 • Published Mar 3 • 25

authored 6 papers 5 months ago
