drjiang

community

https://www.google.com/

AI & ML interests

None defined yet.

Recent Activity

zhiqiulin submitted a paper 2 months ago

Building a Precise Video Language with Human-AI Oversight

djiang04 authored a paper about 1 year ago

Towards Understanding Camera Motions in Any Video

zhiqiulin authored a paper about 1 year ago

Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers

posted an update 17 days ago

🚀 VQAScore now supports text-to-video evaluation!

VQAScore scores how well a generated image or video matches a prompt by asking a VLM "does this show {prompt}?" and using P(Yes). It became a go-to evaluation metric and reward model for image generation (2M+ downloads), and we just added text-to-video support across 20+ VLMs (GPT, Gemini, Qwen). Free and open-source, and it keeps improving as VLMs improve.
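The P(Yes) idea described above can be illustrated with a minimal sketch. This is not the t2v_metrics API: the function names `build_question` and `p_yes`, the exact question wording, and the toy logits are all assumptions made for illustration. In a real setup the logits would come from a VLM's next-token distribution after it sees the image or video together with the question.

```python
import math
from typing import Mapping


def build_question(prompt: str) -> str:
    # Hypothetical template modeled on the post's wording; the library's
    # actual template may differ.
    return f'Does this show "{prompt}"?'


def p_yes(token_logits: Mapping[str, float], yes_token: str = "Yes") -> float:
    """Softmax probability of the `yes_token` given next-token logits.

    `token_logits` stands in for the VLM's logits over its answer
    vocabulary (an assumption; here it is just a small dict).
    """
    # Subtract the max logit for numerical stability before exponentiating.
    max_logit = max(token_logits.values())
    exps = {tok: math.exp(val - max_logit) for tok, val in token_logits.items()}
    return exps[yes_token] / sum(exps.values())


# Toy values only; they are not produced by any real model.
toy_logits = {"Yes": 2.0, "No": 0.0, "Maybe": -1.0}
question = build_question("a dog chasing a ball")
score = p_yes(toy_logits)
print(question)
print(f"P(Yes) = {score:.3f}")
```

A higher P(Yes) means the model judges the generated image or video to be a better match for the prompt, which is why the same number can serve as both an evaluation metric and a reward signal.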

💻 Code: https://github.com/linzhiqiu/t2v_metrics
📄 Paper: https://arxiv.org/abs/2404.01291
🧵 Launch thread + demo video: https://x.com/ZhiqiuLin/status/2064316582461841499

submitted a paper to Daily Papers 2 months ago

Building a Precise Video Language with Human-AI Oversight

Paper • 2604.21718 • Published Apr 22 • 17

authored 2 papers about 1 year ago

Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers

Paper • 2412.00142 • Published Nov 28, 2024 • 5

Towards Understanding Camera Motions in Any Video

Paper • 2504.15376 • Published Apr 21, 2025 • 157

authored a paper over 1 year ago

NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples

Paper • 2410.14669 • Published Oct 18, 2024 • 39

authored a paper almost 2 years ago

GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation

Paper • 2406.13743 • Published Jun 19, 2024 • 2

authored 6 papers about 2 years ago

VisualGPTScore: Visio-Linguistic Reasoning with Multimodal Generative Pre-Training Scores

Paper • 2306.01879 • Published Jun 2, 2023 • 1

Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models

Paper • 2301.06267 • Published Jan 16, 2023

Language Models as Black-Box Optimizers for Vision-Language Models

Paper • 2309.05950 • Published Sep 12, 2023 • 4

The Neglected Tails of Vision-Language Models

Paper • 2401.12425 • Published Jan 23, 2024 • 3

Evaluating Text-to-Visual Generation with Image-to-Text Generation

Paper • 2404.01291 • Published Apr 1, 2024 • 6

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 91