Mian Zhang

billmianz

13 20 9

AI & ML interests

None yet

Recent Activity

updated a model about 1 month ago

billmianz/stage2_gz32_linear_base_pairwise_alpha2_qwen7b_sft_mh_100_step300

published a model about 1 month ago

billmianz/stage2_gz32_linear_base_pairwise_alpha2_qwen7b_sft_mh_100_step300

Organizations

upvoted a paper 3 days ago

OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks

Paper • 2606.29537 • Published 5 days ago • 18

upvoted a paper 3 months ago

Terminal Agents Suffice for Enterprise Automation

Paper • 2604.00073 • Published Mar 31 • 98

upvoted a paper 4 months ago

On Data Engineering for Scaling LLM Terminal Capabilities

Paper • 2602.21193 • Published Feb 24 • 103

upvoted 2 papers 5 months ago

Self-Distillation Enables Continual Learning

Paper • 2601.19897 • Published Jan 27 • 41

Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

Paper • 2601.08808 • Published Jan 13 • 39

upvoted a paper 9 months ago

Self-Improvement in Multimodal Large Language Models: A Survey

Paper • 2510.02665 • Published Oct 3, 2025 • 21

upvoted 4 papers 10 months ago

LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

Paper • 2508.15760 • Published Aug 21, 2025 • 47

upvoted a paper 11 months ago

Complex Logical Instruction Generation

Paper • 2508.09125 • Published Aug 12, 2025 • 40

upvoted 2 papers about 1 year ago

Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation

Paper • 2506.01565 • Published Jun 2, 2025 • 3

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published Apr 29, 2025 • 99

upvoted 3 papers over 1 year ago

Preference Learning Unlocks LLMs' Psycho-Counseling Skills

Paper • 2502.19731 • Published Feb 27, 2025 • 8

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24, 2025 • 77

CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy

Paper • 2410.13218 • Published Oct 17, 2024 • 5

upvoted 3 papers almost 2 years ago

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published Sep 12, 2024 • 48

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Paper • 2409.04109 • Published Sep 6, 2024 • 48

Learning to Refuse: Towards Mitigating Privacy Risks in LLMs

Paper • 2407.10058 • Published Jul 14, 2024 • 31

upvoted a paper over 2 years ago

UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations

Paper • 2311.08469 • Published Nov 14, 2023 • 11
