cool
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
Paper
• 2504.02821
• Published
• 10
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
Paper
• 2504.17343
• Published
• 13
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting
Paper
• 2504.15921
• Published
• 7
Causal-Copilot: An Autonomous Causal Analysis Agent
Paper
• 2504.13263
• Published
• 7
Distilling semantically aware orders for autoregressive image generation
Paper
• 2504.17069
• Published
• 7
VideoDeepResearch: Long Video Understanding With Agentic Tool Using
Paper
• 2506.10821
• Published
• 19
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
Paper
• 2506.07177
• Published
• 23
lym00/Wan2.2_T2V_A14B_VACE-test
17B • Updated
• 1.29k
• 42
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation
Paper
• 2509.18824
• Published
• 23
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
Paper
• 2509.24695
• Published
• 46
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
Paper
• 2509.25182
• Published
• 39
lovis93/next-scene-qwen-image-lora-2509
Image-to-Image
• Updated
• 34.6k
• 589
Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting
Paper
• 2510.08696
• Published
• 15
TC-LoRA: Temporally Modulated Conditional LoRA for Adaptive Diffusion Control
Paper
• 2510.09561
• Published
• 9
Video-to-Video
• Updated
• 177
Video-to-Video
• Updated
• 75
meituan-longcat/LongCat-Video
Text-to-Video
• Updated
• 860
• 445
Text-to-Video
• Updated
• 447
• 275
TencentARC/RollingForcing
Text-to-Video
• Updated
• 15
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
Paper
• 2510.24821
• Published
• 41
Scaling Latent Reasoning via Looped Language Models
Paper
• 2510.25741
• Published
• 229