sanaka87
posted an update 13 days ago
🚀 Introducing VideoCoF: Unified Video Editing with a Temporal Reasoner (Chain-of-Frames)!

We're excited to introduce VideoCoF, a unified framework for instruction-based video editing that enables temporal reasoning and ~4× video length extrapolation, trained with only 50k video pairs. 🔥

πŸ” What makes VideoCoF different?
🧠 Chain-of-Frames reasoning: mimics the human process of Seeing → Reasoning → Editing to apply edits accurately over time without external masks, ensuring physically plausible results.
📈 Strong length generalization: trained on 33-frame clips, yet supports multi-shot editing and long-video extrapolation (~4×).
🎯 Unified fine-grained editing: Object Removal, Addition, Swap, and Local Style Transfer, with instance-level and part-level, spatially aware control.

⚑ Fast inference update
🚀 On an H100: ~20 s per video with 4-step inference, making high-quality video editing far more practical for real-world use.

🔗 Links
📄 Paper: https://arxiv.org/abs/2512.07469
💻 Code: https://github.com/knightyxp/VideoCoF
🤗 Demo: XiangpengYang/VideoCoF
🧩 Models: XiangpengYang/VideoCoF
🌐 Project Page: https://videocof.github.io/

#VideoEditing #DiffusionModels #GenerativeAI #ComputerVision #AI

Appreciate the restraint here, especially avoiding hidden correction mechanisms. In my experience, temporal systems become trustworthy only when they're allowed to visibly fail rather than silently compensate.

Looking forward to seeing how this behaves under adversarial or edge-case temporal edits.

This is really impressive work: the Chain-of-Frames approach sounds like a big step forward for keeping edits consistent over time, especially without relying on masks. The length generalization is huge too; long-video editing is where a lot of tools still struggle. It actually reminds me of how platforms like Avanquest https://avanquest.pissedconsumer.com/review.html have been pushing toward more unified, user-friendly creative tools, but this feels much more research-driven and next-level. Excited to see how VideoCoF evolves and gets adopted.