cool
updated
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
Paper
•
2504.02821
•
Published
•
9
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
Paper
•
2504.17343
•
Published
•
13
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting
Paper
•
2504.15921
•
Published
•
7
Causal-Copilot: An Autonomous Causal Analysis Agent
Paper
•
2504.13263
•
Published
•
7
Distilling semantically aware orders for autoregressive image generation
Paper
•
2504.17069
•
Published
•
7
VideoDeepResearch: Long Video Understanding With Agentic Tool Using
Paper
•
2506.10821
•
Published
•
19
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
Paper
•
2506.07177
•
Published
•
23
lym00/Wan2.2_T2V_A14B_VACE-test
17B
•
Updated
•
2.38k
•
41
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation
Paper
•
2509.18824
•
Published
•
23
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
Paper
•
2509.24695
•
Published
•
44
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
Paper
•
2509.25182
•
Published
•
37
lovis93/next-scene-qwen-image-lora-2509
Image-to-Image
•
Updated
•
50.7k
•
569
Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting
Paper
•
2510.08696
•
Published
•
15
TC-LoRA: Temporally Modulated Conditional LoRA for Adaptive Diffusion Control
Paper
•
2510.09561
•
Published
•
8
Video-to-Video
•
Updated
•
169
Video-to-Video
•
Updated
•
73
meituan-longcat/LongCat-Video
Text-to-Video
•
Updated
•
1.68k
•
415
Text-to-Video
•
Updated
•
309
•
266
TencentARC/RollingForcing
Text-to-Video
•
Updated
•
15
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
Paper
•
2510.24821
•
Published
•
39
Scaling Latent Reasoning via Looped Language Models
Paper
•
2510.25741
•
Published
•
223