Trust the Model: Compact VLMs as In-Context Judges for Image-Text Data Quality Paper • 2507.20156 • Published Jul 27, 2025
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection Paper • 2502.20361 • Published Feb 27, 2025 • 1
$β$-CLIP: Text-Conditioned Contrastive Learning for Multi-Granular Vision-Language Alignment Paper • 2512.12678 • Published Dec 14, 2025
FloorplanQA: A Benchmark for Spatial Reasoning in LLMs using Structured Representations Paper • 2507.07644 • Published Jul 10, 2025 • 4
Mindstorms in Natural Language-Based Societies of Mind Paper • 2305.17066 • Published May 26, 2023 • 3
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models Paper • 2404.02747 • Published Apr 3, 2024 • 13
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale Paper • 2410.20280 • Published Oct 26, 2024 • 23
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer Paper • 2509.24695 • Published Sep 29, 2025 • 45
AraLingBench A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models Paper • 2511.14295 • Published Nov 18, 2025 • 72
Mindstorms in Natural Language-Based Societies of Mind Paper • 2305.17066 • Published May 26, 2023 • 3
Mindstorms in Natural Language-Based Societies of Mind Paper • 2305.17066 • Published May 26, 2023 • 3
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention Paper • 2312.07987 • Published Dec 13, 2023 • 41