
Min-Hung Chen

cmhungsteve

https://minhungchen.netlify.app/

AI & ML interests

Multimodal AI, Transfer Learning, Unsupervised Learning, Video Understanding, Vision Transformer, Computer Vision, Deep Learning

Recent Activity

upvoted a paper about 12 hours ago

One Model, Many Latencies: Universal Speech Enhancement for Diverse Real-Time Applications

Paper • 2606.25621 • Published 2 days ago • 12

upvoted a paper 9 days ago

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

Paper • 2606.18216 • Published 10 days ago • 61

upvoted a collection 13 days ago

Cosmos3

Collection

Omnimodal World Models for Physical AI • 16 items • Updated about 2 hours ago • 131

upvoted a paper 14 days ago

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Paper • 2606.13673 • Published 15 days ago • 106

upvoted 3 papers 21 days ago

FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

Paper • 2605.19846 • Published May 20 • 3

DVSM: Decoder-only View Synthesis Model Done Right

Paper • 2605.29891 • Published 29 days ago • 2

Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them

Paper • 2606.06361 • Published 22 days ago • 16

upvoted a paper 27 days ago

Why Far Looks Up: Probing Spatial Representation in Vision-Language Models

Paper • 2605.30161 • Published 29 days ago • 60

upvoted a paper 29 days ago

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning

Paper • 2605.28774 • Published 30 days ago • 93

upvoted an article about 1 month ago

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

Article • nvidia • May 18 • 21

upvoted a paper 3 months ago

ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks

Paper • 2603.27862 • Published Mar 29 • 33

upvoted a paper 4 months ago

iGRPO: Self-Feedback-Driven LLM Reasoning

Paper • 2602.09000 • Published Feb 9 • 19

upvoted a paper 5 months ago

TIPO: Text to Image with Text Presampling for Prompt Optimization

Paper • 2411.08127 • Published Nov 12, 2024 • 4

upvoted a collection 5 months ago

TIPO

Collection

Text to Image with text presampling for Prompt Optimization • 6 items • Updated Jan 22, 2025 • 6

upvoted 6 papers 5 months ago

Expected Harm: Rethinking Safety Evaluation of (Mis)Aligned LLMs

Paper • 2602.01600 • Published Feb 2 • 21

PaperBanana: Automating Academic Illustration for AI Scientists

Paper • 2601.23265 • Published Jan 30 • 229

GenRecal: Generation after Recalibration from Large to Small Vision-Language Models

Paper • 2506.15681 • Published Jun 18, 2025 • 43

Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting

Paper • 2512.20927 • Published Dec 24, 2025 • 17

OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding

Paper • 2601.09575 • Published Jan 14 • 26

Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning

Paper • 2601.09708 • Published Jan 14 • 56
