Xiangyi Li PRO

xdotli

https://www.xiangyi.li

AI & ML interests

None yet

Recent Activity

updated a model 2 days ago

benchflow/benchflow-qwen35-9b

updated a dataset 3 days ago

benchflow/env0-experiment-trajectories

published a model 4 days ago

benchflow/benchflow-qwen35-9b

upvoted an article about 1 month ago

Article

Exploring Environments Hub: Your Language Model needs better (open) environments to learn

anakin87 • Sep 4, 2025 • 31

upvoted an article 2 months ago

Article

Context Engineering & Reuse Pattern Under the Hood of Claude Code

kobe0938 • Dec 22, 2025 • 7

upvoted 3 papers 3 months ago

Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills

Paper • 2604.05333 • Published Apr 7 • 23

ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces

Paper • 2604.05172 • Published Apr 6 • 24

RubricBench: Aligning Model-Generated Rubrics with Human Standards

Paper • 2603.01562 • Published Mar 2 • 64

upvoted 4 papers 4 months ago

MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents

Paper • 2603.09827 • Published Mar 10 • 30

Flash-KMeans: Fast and Memory-Efficient Exact K-Means

Paper • 2603.09229 • Published Mar 10 • 84

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Paper • 2601.11868 • Published Jan 17 • 37

StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?

Paper • 2510.02209 • Published Oct 2, 2025 • 57

upvoted a collection 4 months ago

SkillsBench

1 item • Updated Feb 17 • 1

upvoted a paper 4 months ago

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

Paper • 2602.12670 • Published Feb 13 • 62

upvoted 2 papers over 1 year ago

Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong

Paper • 2501.09775 • Published Jan 16, 2025 • 32

HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs

Paper • 2503.02003 • Published Mar 3, 2025 • 47