harnessRL

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

FlippyDora submitted a paper 3 days ago

Predictive Divergence Masks for LLM RL

zhouxiangxin authored a paper 5 days ago

Stale but Stable: Staleness-Adaptive Trust Regions for Stabilizing Asynchronous Reinforcement Learning

zhouxiangxin authored a paper 5 days ago

MeanFlowNFT: Bringing Forward-Process RL to Average-Velocity Generators

Jiaqi-hkust

posted an update 3 days ago

IQA-T1: Evidence‑Based Image Quality Assessment with MLLMs

Most MLLMs are blind to low-level degradations: noise, blur, and compression artifacts look the same as clean images in their internal representations. As a result, their quality scores rest on a semantic "gut feeling" rather than on real perceptual evidence.

IQA-T1 changes that. We equip the model with a toolbox of 15 perceptual tools (noise residual maps, Fourier spectra, gradient maps, etc.) that generate structured visual evidence on demand. The model learns how to use tools via supervised fine‑tuning on our Q‑Tool dataset (11k evidence‑grounded reasoning chains), and when to call them via GRPO reinforcement learning that balances accuracy, tool count, and redundancy.
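To make the idea of a perceptual tool concrete, here is a minimal sketch of a gradient map, one of the tool types named above. It is not the IQA-T1 implementation: the function names `gradient_magnitude` and `mean_gradient`, the finite-difference scheme, and the list-of-rows image format are all assumptions chosen for illustration.

```python
def gradient_magnitude(image):
    """Return a per-pixel gradient magnitude map for a 2D grayscale image.

    `image` is a list of equal-length rows of numbers (hypothetical format).
    Central differences are used in the interior, one-sided at the borders.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Clamp neighbor indices so border pixels use one-sided differences.
            x0, x1 = max(x - 1, 0), min(x + 1, width - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, height - 1)
            gx = (image[y][x1] - image[y][x0]) / max(x1 - x0, 1)
            gy = (image[y1][x] - image[y0][x]) / max(y1 - y0, 1)
            row.append((gx * gx + gy * gy) ** 0.5)
        result.append(row)
    return result


def mean_gradient(image):
    """Summarize a gradient map as a single scalar (average magnitude)."""
    grad = gradient_magnitude(image)
    return sum(map(sum, grad)) / (len(grad) * len(grad[0]))
```

A perfectly flat patch yields a map of zeros, while a horizontal ramp with unit step yields a magnitude of 1 at every pixel. A map like this is the kind of structured evidence a model could inspect instead of guessing from semantics alone.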

The result: SOTA performance across 7 benchmarks (avg PLCC 0.795), using only 2.34 tools per image on average. Every predicted score is now interpretable and backed by hard visual evidence.
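For readers unfamiliar with the metric quoted above, PLCC is the Pearson linear correlation coefficient between predicted quality scores and human ratings. The sketch below computes it in plain Python; the function name `plcc` and the sample numbers are illustrative assumptions, not data from the paper. Some IQA evaluations apply a nonlinear fitting step before computing PLCC, which this sketch omits.

```python
import math


def plcc(predicted, reference):
    """Pearson linear correlation between two equal-length score lists."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    # Covariance and variances (unnormalized; the factors of n cancel).
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_r)


# Made-up scores for illustration only.
model_scores = [1.0, 2.0, 3.0]
human_scores = [2.0, 4.0, 6.0]
print(plcc(model_scores, human_scores))
```

A value near 1 means the predictions track human ratings linearly, and a value near 0 means no linear relationship.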

All code, weights, dataset, and demo are open. Check them out and give it a spin!

📄 arxiv.org/abs/2607.12375v1
💻 github.com/zibuyu-02/IQA-T1
🤗 model/data: huggingface.co/zibuyu-02/IQA-T1
🎮 demo: huggingface.co/spaces/Jiaqi-hkust/IQA-T1

FlippyDora

submitted a paper to Daily Papers 3 days ago

Predictive Divergence Masks for LLM RL

Paper • 2607.10848 • Published 15 days ago • 9

zhouxiangxin

authored 11 papers 5 days ago

Stale but Stable: Staleness-Adaptive Trust Regions for Stabilizing Asynchronous Reinforcement Learning

Paper • 2607.18722 • Published 6 days ago • 35

MeanFlowNFT: Bringing Forward-Process RL to Average-Velocity Generators

Paper • 2607.15273 • Published 11 days ago • 17

TempAct: Advancing Temporal Plausibility in Autoregressive Video Generation via Planner-Executor RL

Paper • 2606.28016 • Published Jun 26 • 1

Modeling All-Atom Glycan Structures via Hierarchical Message Passing and Multi-Scale Pre-training

Paper • 2506.01376 • Published Jun 2, 2025

OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

Paper • 2508.04482 • Published Aug 6, 2025 • 10

DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization

Paper • 2403.13829 • Published Mar 7, 2024

Global Sparse Momentum SGD for Pruning Very Deep Neural Networks

Paper • 1909.12778 • Published Sep 27, 2019 • 1

GSLB: The Graph Structure Learning Benchmark

Paper • 2310.05174 • Published Oct 8, 2023

FlippyDora

updated a dataset 13 days ago

harnessRL/opengame-baselines

Updated 13 days ago • 18

FlippyDora

published a dataset 18 days ago

harnessRL/opengame-baselines

Updated 13 days ago • 18

Jiaqi-hkust

authored 2 papers about 1 month ago

Adaptive Debiasing Tsallis Entropy for Test-Time Adaptation

Paper • 2602.11743 • Published Feb 12

Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?

Paper • 2606.08063 • Published Jun 6 • 82

Jiaqi-hkust

posted an update about 1 month ago

🚀 Introducing Robust-U1: Teaching MLLMs to Self-Recover Corrupted Visual Content

Multimodal Large Language Models (MLLMs) have achieved impressive visual understanding, yet they remain highly brittle under real-world corruptions such as noise, blur, compression artifacts, and adverse weather.

Standard MLLMs suffer dramatic performance drops, and existing robustness solutions come with fundamental limits: black‑box feature alignment lacks interpretability, while white‑box text reasoning cannot restore the lost pixel‑level visual details. This raises a crucial question:

🧐 Can MLLMs recover corrupted visual content by themselves?

If the answer is yes, we can move beyond merely “compensating” for corruption and instead build a more intrinsic, generalizable form of resilience. Robust-U1 is our answer to that question.

💡 Paper: https://arxiv.org/abs/2606.08063
🔗 Code: github.com/jqtangust/Robust-U1
🌍 Demo: Jiaqi-hkust/Robust-U1

Jiaqi-hkust

submitted a paper to Daily Papers about 1 month ago

Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?

Paper • 2606.08063 • Published Jun 6 • 82

zhouxiangxin

authored a paper about 2 months ago

Rethinking the Divergence Regularization in LLM RL

Paper • 2606.09821 • Published Jun 8 • 34

Team members 10

harnessRL's activity