Kaixin Li

likaixin

https://likaixin2000.github.io/

likaixin2000

AI & ML interests

None yet

Recent Activity

upvoted a paper 9 days ago

GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?

Paper • 2606.17861 • Published 11 days ago • 56

upvoted a paper 15 days ago

TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders

Paper • 2606.09323 • Published 19 days ago • 51

upvoted a paper 16 days ago

Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories

Paper • 2606.11176 • Published 18 days ago • 126

New activity in likaixin/ScreenSpot-Pro about 1 month ago

testing

#4 opened about 1 month ago by amitdsxpert

upvoted a paper about 2 months ago

From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms

Paper • 2605.06716 • Published May 7 • 5

updated a dataset 2 months ago

likaixin/MMCode

Viewer • Updated Apr 16 • 263 • 140 • 13

upvoted a paper 2 months ago

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

Paper • 2604.07429 • Published Apr 8 • 123

authored a paper 3 months ago

CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents

Paper • 2603.24440 • Published Mar 25 • 99

upvoted a paper 3 months ago

CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents

Paper • 2603.24440 • Published Mar 25 • 99

New activity in likaixin/ScreenSpot-Pro 3 months ago

Upload eval.yaml

#3 opened 3 months ago by merve

upvoted a paper 3 months ago

EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings

Paper • 2603.13594 • Published Mar 13 • 149

liked a dataset 3 months ago

stepfun-ai/Step-3.5-Flash-SFT

Viewer • Updated Mar 14 • 1.62M • 5.1k • 339

upvoted a paper 4 months ago

Qwen3-Coder-Next Technical Report

Paper • 2603.00729 • Published Feb 28 • 65

updated a dataset 4 months ago

likaixin/ScreenSpot-Pro

Benchmark • Updated Mar 18 • 10.2k • 67

upvoted a collection 4 months ago

Qwen3.5

Collection • 21 items • Updated Mar 9 • 1.69k

liked a model 4 months ago

Qwen/Qwen3.5-397B-A17B

Image-Text-to-Text • 403B • Updated Apr 24 • 557k • 1.52k

upvoted a collection 4 months ago

Qwen3-VL

Collection • 37 items • Updated Dec 31, 2025 • 747

upvoted a paper 5 months ago

OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Paper • 2602.05400 • Published Feb 5 • 356

authored a paper 5 months ago

Grounding and Enhancing Informativeness and Utility in Dataset Distillation

Paper • 2601.21296 • Published Jan 29 • 21