
Zhaokai Wang

wzk1015

https://www.wzk.plus


AI & ML interests

Computer Vision, Music Generation, Multimodal Large Language Models

Recent Activity




upvoted a paper 10 days ago

ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining

Paper • 2606.17200 • Published 13 days ago • 51

upvoted a paper 11 days ago

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

Paper • 2606.17043 • Published 13 days ago • 10

New activity in OpenGVLab/Mono-InternVL-2B 20 days ago

Fix remaining Transformers v5 crash: guard llm_config and to_dict() for None (follow-up to `e980c02`)

#13 opened 20 days ago by KBayoud

Fix KeyError in init when vision_config is empty (Transformers v5 compatibility)

#12 opened 20 days ago by KBayoud

liked 2 datasets 3 months ago

VisionXLab/GRADE

Viewer • Updated Apr 22 • 1.04k • 757 • 7

InternVL-U/ScaleEdit-12M

Viewer • Updated 4 days ago • 12.4M • 5.79k • 20

authored a paper 4 months ago

GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing

Paper • 2603.12264 • Published Mar 12 • 15

upvoted a paper 4 months ago

GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing

Paper • 2603.12264 • Published Mar 12 • 15

liked a dataset 4 months ago

opencompass/TextEdit

Viewer • Updated Mar 15 • 2.15k • 384 • 9

upvoted a paper 4 months ago

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

Paper • 2603.09877 • Published Mar 10 • 49

authored a paper 4 months ago

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

Paper • 2603.09877 • Published Mar 10 • 49

liked a model 4 months ago

InternVL-U/InternVL-U

Any-to-Any • Updated Mar 15 • 79 • 58

liked a dataset 5 months ago

simon123905/ICLR

Updated Jan 17 • 25 • 4

updated a dataset 7 months ago

wzk1015/GenExam-Gen-Images

Preview • Updated Dec 10, 2025 • 3

published a dataset 7 months ago

wzk1015/GenExam-Gen-Images

Preview • Updated Dec 10, 2025 • 3

upvoted a paper 7 months ago

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Paper • 2512.05965 • Published Dec 5, 2025 • 38

liked a model 8 months ago

Zhenxin-Lei/MetaCaptioner

9B • Updated Oct 23, 2025 • 3 • 5

upvoted a paper 8 months ago

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published Oct 17, 2025 • 93

upvoted 2 papers 9 months ago

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints

Paper • 2510.08565 • Published Oct 9, 2025 • 22

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

Paper • 2510.08540 • Published Oct 9, 2025 • 110