ZhangZhihao's picture

13

ZhangZhihao

Zhangzzz1

·

AI & ML interests

None yet

Recent Activity

upvoted a paper about 1 month ago

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening

upvoted a paper 5 months ago

MOVA: Towards Scalable and Synchronized Video-Audio Generation

upvoted a paper 5 months ago

Steering LLMs via Scalable Interactive Oversight

View all activity

Organizations

upvoted a paper about 1 month ago

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening

Paper • 2605.19597 • Published May 19 • 21

upvoted 3 papers 5 months ago

MOVA: Towards Scalable and Synchronized Video-Audio Generation

Paper • 2602.08794 • Published Feb 9 • 159

Steering LLMs via Scalable Interactive Oversight

Paper • 2602.04210 • Published Feb 4 • 20

Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation

Paper • 2602.03619 • Published Feb 3 • 28

authored a paper 5 months ago

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

Paper • 2601.14004 • Published Jan 20 • 49

upvoted a collection 5 months ago

AgentDoG

A Diagnostic Guardrail Framework for AI Agent Safety and Security • 12 items • Updated 6 days ago • 112

upvoted 4 papers 5 months ago

TL-GRPO: Turn-Level RL for Reasoning-Guided Iterative Optimization

Paper • 2601.16480 • Published Jan 23 • 50

HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

Paper • 2601.14724 • Published Jan 21 • 75

Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment

Paper • 2601.14249 • Published Jan 20 • 14

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

Paper • 2601.14004 • Published Jan 20 • 49

upvoted a paper 7 months ago

Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction

Paper • 2512.04987 • Published Dec 4, 2025 • 85

upvoted a paper 8 months ago

BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping

Paper • 2510.18927 • Published Oct 21, 2025 • 85

upvoted 2 papers 12 months ago

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Paper • 2507.10532 • Published Jul 14, 2025 • 90

BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset

Paper • 2507.03483 • Published Jul 4, 2025 • 24