@XiangpengYang on Hugging Face: "🚀 Introducing VideoCoF: Unified Video Editing with a Temporal Reasoner…"

posted an update Dec 13, 2025

🚀 Introducing VideoCoF: Unified Video Editing with a Temporal Reasoner (Chain-of-Frames)!

We’re excited to introduce VideoCoF, a unified framework for instruction-based video editing that enables temporal reasoning and ~4× video length extrapolation, trained with only 50k video pairs. 🔥

🔍 What makes VideoCoF different?
🧠 Chain-of-Frames reasoning: mimics the human thinking process (Seeing → Reasoning → Editing) to apply edits accurately over time without external masks, ensuring physically plausible results.
📈 Strong length generalization — trained on 33-frame clips, yet supports multi-shot editing and long-video extrapolation (~4×).
🎯 Unified fine-grained editing — Object Removal, Addition, Swap, and Local Style Transfer, with instance-level & part-level, spatial-aware control.
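
For a rough sense of scale, the length numbers above can be turned into clip durations. This is a back-of-the-envelope sketch, not part of the VideoCoF code: it assumes the 8 fps demo frame rate mentioned in the comments and treats "~4×" as exactly 4, and the helper name `clip_seconds` is hypothetical.

```python
def clip_seconds(num_frames: int, fps: float = 8.0) -> float:
    """Duration in seconds of a clip with `num_frames` frames at `fps`.

    The 8 fps default is the demo frame rate quoted in the comments (assumption).
    """
    return num_frames / fps


TRAIN_FRAMES = 33   # training clip length stated in the post
EXTRAPOLATION = 4   # "~4x" treated as exactly 4 (assumption)

# Frame count reached if the training length is extrapolated 4x.
long_frames = TRAIN_FRAMES * EXTRAPOLATION

print(f"training clip: {TRAIN_FRAMES} frames, {clip_seconds(TRAIN_FRAMES):.1f} s")
print(f"extrapolated:  {long_frames} frames, {clip_seconds(long_frames):.1f} s")
```

Under these assumptions, a 33-frame training clip covers about four seconds of video, and the extrapolated length is roughly a quarter of a minute.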

⚡ Fast inference update
🚀 H100: ~20s / video with 4-step inference, making high-quality video editing far more practical for real-world use.

🔗 Links
📄 Paper: https://arxiv.org/abs/2512.07469
💻 Code: https://github.com/knightyxp/VideoCoF
🤗 Demo: XiangpengYang/VideoCoF
🧩 Models: XiangpengYang/VideoCoF
🌐 Project Page: https://videocof.github.io/

#VideoEditing #DiffusionModels #GenerativeAI #ComputerVision #AI

attacker05

Dec 13, 2025

test

AmosTipton

Dec 14, 2025

Really impressive work — the Chain-of-Frames framing is especially clean. Treating temporal consistency as a first-class constraint instead of a post-hoc fix feels like the right abstraction.

One thing this made me think about is how much trust in systems depends on refusing to “hallucinate success” early — whether that’s in video editing, verification, or evidence pipelines.

I’ve been working on a verification system where proof is intentionally withheld until the underlying storage is fully durable, specifically to avoid the kind of silent failure modes you’re guarding against here.

Different domain, same philosophy: correctness > convenience.

Thanks for sharing this — great read.

XiangpengYang

Dec 15, 2025

Thanks! I really love your phrasing of 'refusing to hallucinate success' — that's exactly the mindset we aimed for. Glad the philosophy resonates!

jameshuntercarter

Dec 15, 2025

Why are all the examples low-fidelity / low-frame rate?

XiangpengYang

Dec 15, 2025

Thanks for the feedback! I've just updated the input video, so the examples should now match the quality shown in the teaser. The current frame rate is set to 8 fps, which is standard for these demos.

Regarding the resolution (fidelity), VideoCoF actually supports arbitrary resolution and arbitrary length. You are welcome to upload your own high-resolution videos to test the performance!