Hang

hhua1

AI & ML interests

None yet

Recent Activity

upvoted a paper 5 days ago

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models

Paper • 2410.09733 • Published Oct 13, 2024 • 9

upvoted a paper 7 days ago

Aligning Quantum Operators with Large Language Models

Paper • 2606.13811 • Published 14 days ago • 4

upvoted a paper 20 days ago

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

Paper • 2606.05080 • Published 22 days ago • 30

upvoted a paper 23 days ago

Agent Skills Should Go Beyond Text: The Case for Visual Skills

Paper • 2606.01414 • Published 25 days ago • 10

liked a model about 1 month ago

tifa-benchmark/promptcap-coco-vqa

Image-to-Text • Updated Dec 11, 2023 • 49 • 15

upvoted 2 papers about 1 month ago

MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents

Paper • 2605.18652 • Published May 18 • 8

Aurora: Unified Video Editing with a Tool-Using Agent

Paper • 2605.18748 • Published May 18 • 29

upvoted an article 3 months ago

Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents

Article • ibm-granite • Mar 31 • 34

authored a paper 7 months ago

MIRA: Multimodal Iterative Reasoning Agent for Image Editing

Paper • 2511.21087 • Published Nov 26, 2025 • 10

upvoted a paper 7 months ago

MIRA: Multimodal Iterative Reasoning Agent for Image Editing

Paper • 2511.21087 • Published Nov 26, 2025 • 10

upvoted a paper 8 months ago

Latent Chain-of-Thought for Visual Reasoning

Paper • 2510.23925 • Published Oct 27, 2025 • 10

liked 2 models 9 months ago

hhua2/V2Xum-LLM

Robotics • Updated Sep 18, 2025 • 2

hhua2/finecaption

Updated Jun 16, 2025 • 1

liked 3 datasets 9 months ago

upvoted a paper 9 months ago

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Paper • 2510.05034 • Published Oct 6, 2025 • 51

liked 3 models about 1 year ago

ibm-granite/granite-vision-3.3-2b

Image-to-Text • 3B • Updated Apr 2 • 146k • 85

ibm-granite/granite-vision-3.1-2b-preview

Image-Text-to-Text • 3B • Updated Jun 12, 2025 • 892 • 114

ibm-granite/granite-vision-3.2-2b

Image-Text-to-Text • 3B • Updated Apr 2 • 3.8k • 123
