LMD: Faster Image Reconstruction with Latent Masking Diffusion Paper • 2312.07971 • Published Dec 13, 2023
UltraMedical: Building Specialized Generalists in Biomedicine Paper • 2406.03949 • Published Jun 6, 2024
Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding Paper • 2407.09781 • Published Jul 13, 2024
Safe-SD: Safe and Traceable Stable Diffusion with Text Prompt Trigger for Invisible Generative Watermarking Paper • 2407.13188 • Published Jul 18, 2024
Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices Paper • 2410.11795 • Published Oct 15, 2024
DH-VTON: Deep Text-Driven Virtual Try-On via Hybrid Attention Learning Paper • 2410.12501 • Published Oct 16, 2024
VideoDirector: Precise Video Editing via Text-to-Video Models Paper • 2411.17592 • Published Nov 26, 2024
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10, 2025
Dream3DAvatar: Text-Controlled 3D Avatar Reconstruction from a Single Image Paper • 2509.13013 • Published Sep 16, 2025
Towards Cross-View Point Correspondence in Vision-Language Models Paper • 2512.04686 • Published Dec 4, 2025
Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach Paper • 2412.03017 • Published Dec 4, 2024
Fine-tuning BERT for Joint Entity and Relation Extraction in Chinese Medical Text Paper • 1908.07721 • Published Aug 21, 2019
CVE-Factory: Scaling Expert-Level Agentic Tasks for Code Security Vulnerability Paper • 2602.03012 • Published Feb 3
CRAFT: Calibrated Reasoning with Answer-Faithful Traces via Reinforcement Learning for Multi-Hop Question Answering Paper • 2602.01348 • Published Feb 1
One2Scene: Geometric Consistent Explorable 3D Scene Generation from a Single Image Paper • 2602.19766 • Published Feb 23
One-Step Effective Diffusion Network for Real-World Image Super-Resolution Paper • 2406.08177 • Published Oct 24, 2024
VideoAfford: Grounding 3D Affordance from Human-Object-Interaction Videos via Multimodal Large Language Model Paper • 2602.09638 • Published Feb 10
I2E: From Image Pixels to Actionable Interactive Environments for Text-Guided Image Editing Paper • 2601.03741 • Published Apr 7
Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models Paper • 2403.17589 • Published Mar 26, 2024