@sanaka87 on Hugging Face: "🚀 Introducing VideoCoF: Unified Video Editing with a Temporal Reasoner…"

posted an update Dec 13, 2025

🚀 Introducing VideoCoF: Unified Video Editing with a Temporal Reasoner (Chain-of-Frames)!

We’re excited to introduce VideoCoF, a unified framework for instruction-based video editing that enables temporal reasoning and ~4× video length extrapolation, trained with only 50k video pairs. 🔥

🔍 What makes VideoCoF different?
🧠 Chain-of-Frames reasoning: mimics the human thinking process of Seeing → Reasoning → Editing to apply edits accurately over time without external masks, ensuring physically plausible results.
📈 Strong length generalization — trained on 33-frame clips, yet supports multi-shot editing and long-video extrapolation (~4×).
🎯 Unified fine-grained editing — Object Removal, Addition, Swap, and Local Style Transfer, with instance-level & part-level, spatial-aware control.
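
To put the length claim in concrete terms, here is a back-of-the-envelope sketch in Python. It only multiplies the two figures quoted above (33 training frames, ~4× extrapolation); the helper names and the 24 fps playback rate are illustrative assumptions, not part of VideoCoF or its codebase, and the exact frame counts the model accepts may differ.

```python
# Rough arithmetic only: these helpers are hypothetical and do not call VideoCoF.
TRAIN_FRAMES = 33          # training clip length stated in the post
EXTRAPOLATION_FACTOR = 4   # the post says "~4x"; treated as exactly 4 here


def extrapolated_frames(train_frames: int, factor: float) -> int:
    """Approximate number of frames reachable at inference time."""
    return round(train_frames * factor)


def duration_seconds(frames: int, fps: float) -> float:
    """Clip duration for a given frame count and playback rate."""
    return frames / fps


frames = extrapolated_frames(TRAIN_FRAMES, EXTRAPOLATION_FACTOR)
# The fps value is an assumption chosen for illustration; the post does not state one.
seconds = duration_seconds(frames, fps=24)
print(f"about {frames} frames, about {seconds:.1f} s at an assumed 24 fps")
```

Under these assumptions, the extrapolated length comes out to roughly 130 frames; treat that as an order-of-magnitude estimate and check the paper for the lengths actually evaluated.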

⚡ Fast inference update
🚀 H100: ~20s / video with 4-step inference, making high-quality video editing far more practical for real-world use.

🔗 Links
📄 Paper: https://arxiv.org/abs/2512.07469
💻 Code: https://github.com/knightyxp/VideoCoF
🤗 Demo: XiangpengYang/VideoCoF
🧩 Models: XiangpengYang/VideoCoF
🌐 Project Page: https://videocof.github.io/

#VideoEditing #DiffusionModels #GenerativeAI #ComputerVision #AI

AmosTipton

Dec 14, 2025

Appreciate the restraint here — especially avoiding hidden correction mechanisms. In my experience, temporal systems become trustworthy only when they’re allowed to visibly fail rather than silently compensate.

Looking forward to seeing how this behaves under adversarial or edge-case temporal edits.

karenny

Dec 19, 2025

edited Dec 21, 2025

This is really impressive work — the Chain-of-Frames approach sounds like a big step forward for keeping edits consistent over time, especially without relying on masks. The length generalization is huge too; long-video editing is where a lot of tools still struggle. It actually reminds me of how platforms like Avanquest https://avanquest.pissedconsumer.com/review.html have been pushing toward more unified, user-friendly creative tools, but this feels much more research-driven and next-level. Excited to see how VideoCoF evolves and gets adopted.
