XiangpengYang posted an update 13 days ago
🚀 Introducing VideoCoF: Unified Video Editing with a Temporal Reasoner (Chain-of-Frames)!

We’re excited to introduce VideoCoF, a unified framework for instruction-based video editing that enables temporal reasoning and ~4× video length extrapolation, trained with only 50k video pairs. 🔥

🔍 What makes VideoCoF different?
🧠 Chain-of-Frames reasoning: mimics the human process of Seeing → Reasoning → Editing to apply edits accurately over time without external masks, ensuring physically plausible results.
📈 Strong length generalization — trained on 33-frame clips, yet supports multi-shot editing and long-video extrapolation (~4×).
🎯 Unified fine-grained editing — Object Removal, Addition, Swap, and Local Style Transfer, with spatially aware control at both the instance and part level.
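For a rough sense of scale, here is a back-of-the-envelope calculation of what "trained on 33-frame clips, ~4× extrapolation" means in seconds. It assumes the 8 fps playback rate mentioned for the demos and treats "~4×" as exactly 4. Both are simplifications, and the real supported lengths may differ.

```python
# Back-of-the-envelope clip durations.
# Assumptions: FPS = 8 (the demo frame rate) and an exact 4x factor;
# the post itself only says "~4x", so these figures are approximate.
TRAIN_FRAMES = 33          # training clip length stated in the post
EXTRAPOLATION_FACTOR = 4   # "~4x" length extrapolation, rounded to exactly 4
FPS = 8                    # assumed playback rate

def seconds(frames: int, fps: int = FPS) -> float:
    """Convert a frame count to a duration in seconds at the given fps."""
    return frames / fps

train_s = seconds(TRAIN_FRAMES)
long_frames = TRAIN_FRAMES * EXTRAPOLATION_FACTOR
long_s = seconds(long_frames)

print(f"training clip: {TRAIN_FRAMES} frames ~ {train_s:.3f} s")
print(f"~4x extrapolation: ~{long_frames} frames ~ {long_s:.1f} s")
```

Under these assumptions, a training clip lasts roughly 4 s and an extrapolated video about 16 s.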

⚡ Fast inference update
🚀 H100: ~20s / video with 4-step inference, making high-quality video editing far more practical for real-world use.

🔗 Links
📄 Paper: https://arxiv.org/abs/2512.07469
💻 Code: https://github.com/knightyxp/VideoCoF
🤗 Demo: XiangpengYang/VideoCoF
🧩 Models: XiangpengYang/VideoCoF
🌐 Project Page: https://videocof.github.io/

#VideoEditing #DiffusionModels #GenerativeAI #ComputerVision #AI


Really impressive work — the Chain-of-Frames framing is especially clean. Treating temporal consistency as a first-class constraint instead of a post-hoc fix feels like the right abstraction.

One thing this made me think about is how much trust in systems depends on refusing to “hallucinate success” early — whether that’s in video editing, verification, or evidence pipelines.

I’ve been working on a verification system where proof is intentionally withheld until the underlying storage is fully durable, specifically to avoid the kind of silent failure modes you’re guarding against here.

Different domain, same philosophy: correctness > convenience.

Thanks for sharing this — great read.


Thanks! I really love your phrasing of 'refusing to hallucinate success' — that's exactly the mindset we aimed for. Glad the philosophy resonates!

Why are all the examples low-fidelity and low frame rate?


Thanks for the feedback! I've just updated the input video, so the examples should now match the quality shown in the teaser. The current frame rate is set to 8 fps, which is standard for these demos.

Regarding the resolution (fidelity), VideoCoF actually supports arbitrary resolution and arbitrary length. You are welcome to upload your own high-resolution videos to test the performance!