cool - an EladofWar Collection

EladofWar's Collections

cool

updated Oct 30, 2025

Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models

Paper • 2504.02821 • Published Apr 3, 2025 • 10
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

Paper • 2504.17343 • Published Apr 24, 2025 • 13
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting

Paper • 2504.15921 • Published Apr 22, 2025 • 7
Causal-Copilot: An Autonomous Causal Analysis Agent

Paper • 2504.13263 • Published Apr 17, 2025 • 7
Distilling semantically aware orders for autoregressive image generation

Paper • 2504.17069 • Published Apr 23, 2025 • 7
VideoDeepResearch: Long Video Understanding With Agentic Tool Using

Paper • 2506.10821 • Published Jun 12, 2025 • 19
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models

Paper • 2506.07177 • Published Jun 8, 2025 • 23
lym00/Wan2.2_T2V_A14B_VACE-test

17B • Updated Jul 29, 2025 • 828 • 42
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation

Paper • 2509.18824 • Published Sep 23, 2025 • 23
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

Paper • 2509.24695 • Published Sep 29, 2025 • 54
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder

Paper • 2509.25182 • Published Sep 29, 2025 • 39
lovis93/next-scene-qwen-image-lora-2509

Image-to-Image • Updated Oct 21, 2025 • 19k • 634
Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting

Paper • 2510.08696 • Published Oct 9, 2025 • 15
TC-LoRA: Temporally Modulated Conditional LoRA for Adaptive Diffusion Control

Paper • 2510.09561 • Published Oct 10, 2025 • 9
JunhaoZhuang/FlashVSR

Video-to-Video • Updated Apr 1 • 2.52k • 189
QingyanBai/Ditto_models

Video-to-Video • Updated 9 days ago • 77
meituan-longcat/LongCat-Video

Text-to-Video • Updated Oct 29, 2025 • 2.66k • 529
krea/krea-realtime-video

Text-to-Video • Updated Nov 14, 2025 • 275 • 285
TencentARC/RollingForcing

Text-to-Video • Updated Oct 22, 2025 • 17
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation

Paper • 2510.24821 • Published Oct 28, 2025 • 43
Scaling Latent Reasoning via Looped Language Models

Paper • 2510.25741 • Published Oct 29, 2025 • 233