VQ-Seg: Vector-Quantized Token Perturbation for Semi-Supervised Medical Image Segmentation Paper • 2601.10124 • Published 15 days ago • 4
Making Dialogue Grounding Data Rich: A Three-Tier Data Synthesis Framework for Generalized Referring Expression Comprehension Paper • 2512.02791 • Published Dec 2, 2025 • 1
Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models Paper • 2511.11910 • Published Nov 14, 2025 • 35
Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications Paper • 2508.00669 • Published Aug 1, 2025
Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation Paper • 2405.06948 • Published May 11, 2024
Polyp-Gen: Realistic and Diverse Polyp Image Generation for Endoscopic Dataset Expansion Paper • 2501.16679 • Published Jan 28, 2025
EndoBench: A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis Paper • 2505.23601 • Published May 29, 2025
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix Paper • 2505.13032 • Published May 19, 2025 • 3
CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following Paper • 2506.12285 • Published Jun 14, 2025 • 54
$μ^2$Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation Paper • 2507.00316 • Published Jun 30, 2025 • 15
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1, 2025 • 250
MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities Paper • 2412.04106 • Published Dec 4, 2024 • 5
PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents Paper • 2303.07240 • Published Mar 13, 2023
One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts Paper • 2312.17183 • Published Dec 28, 2023
RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining Paper • 2503.04653 • Published Mar 6, 2025
Rethinking Whole-Body CT Image Interpretation: An Abnormality-Centric Approach Paper • 2506.03238 • Published Jun 3, 2025 • 1
NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization Paper • 2502.14638 • Published Feb 20, 2025 • 11
A Study on the Performance of U-Net Modifications in Retroperitoneal Tumor Segmentation Paper • 2502.00314 • Published Feb 1, 2025 • 3
Generating Synthetic Computed Tomography for Radiotherapy: SynthRAD2023 Challenge Report Paper • 2403.08447 • Published Mar 13, 2024 • 2