Kaixin Li

likaixin

https://likaixin2000.github.io/

likaixin2000

AI & ML interests

None yet

Recent Activity

upvoted a paper 14 days ago

GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?

upvoted a paper 20 days ago

TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders

upvoted a paper 21 days ago

Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories

likaixin's activity

upvoted a paper 14 days ago

GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?

Paper • 2606.17861 • Published 15 days ago • 58

upvoted a paper 20 days ago

TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders

Paper • 2606.09323 • Published 23 days ago • 53

upvoted a paper 21 days ago

Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories

Paper • 2606.11176 • Published 22 days ago • 127

upvoted a paper about 2 months ago

From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms

Paper • 2605.06716 • Published May 7 • 5

upvoted 2 papers 3 months ago

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

Paper • 2604.07429 • Published Apr 8 • 123

CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents

Paper • 2603.24440 • Published Mar 25 • 99

upvoted 2 papers 4 months ago

EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings

Paper • 2603.13594 • Published Mar 13 • 149

Qwen3-Coder-Next Technical Report

Paper • 2603.00729 • Published Feb 28 • 65

upvoted a collection 4 months ago

Qwen3.5

Collection

21 items • Updated Mar 9 • 1.7k

upvoted a collection 5 months ago

Qwen3-VL

Collection

37 items • Updated Dec 31, 2025 • 750

upvoted 2 papers 5 months ago

OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Paper • 2602.05400 • Published Feb 5 • 356

Grounding and Enhancing Informativeness and Utility in Dataset Distillation

Paper • 2601.21296 • Published Jan 29 • 21

upvoted a paper 6 months ago

Step-GUI Technical Report

Paper • 2512.15431 • Published Dec 17, 2025 • 134

upvoted 2 papers 7 months ago

Qwen3-VL Technical Report

Paper • 2511.21631 • Published Nov 26, 2025 • 164

MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique

Paper • 2511.09067 • Published Nov 12, 2025 • 2

upvoted 2 papers 8 months ago

Grounding Computer Use Agents on Human Demonstrations

Paper • 2511.07332 • Published Nov 10, 2025 • 107

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

Paper • 2511.02778 • Published Nov 4, 2025 • 104

upvoted 3 papers 9 months ago

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Paper • 2510.08697 • Published Oct 9, 2025 • 40

MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems

Paper • 2404.09486 • Published Apr 15, 2024 • 2

AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness

Paper • 2507.01702 • Published Jul 2, 2025 • 4