saraghznfri/CoF-models

Add model card for Chain-of-Frames

by nielsr HF Staff - opened Apr 7

base: refs/heads/main ← from: refs/pr/1

README.md +44 -0

README.md ADDED Viewed

	@@ -0,0 +1,44 @@

+---
+pipeline_tag: video-text-to-text
+library_name: transformers
+---
+# Chain-of-Frames
+Chain-of-Frames (CoF) is a framework to obtain video LLMs whose reasoning steps are grounded in, and explicitly refer to, relevant frames. It employs a single-stage reasoning approach with explicit references to frame IDs, which helps reduce temporal inconsistencies in the reasoning process without relying on auxiliary modules for frame selection or caption generation.
+- **Paper:** [Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning](https://huggingface.co/papers/2506.00318)
+- **Repository:** [https://github.com/SaraGhazanfari/CoF](https://github.com/SaraGhazanfari/CoF)
+## Sample Usage
+The model loading and evaluation procedures are similar to those used in the InternVL repository. You can load the model using the `transformers` library as follows:
+```python
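+import torch
+from transformers import AutoTokenizer, AutoModel
+
+model_path = "path/to/CoF-model" # replace with specific checkpoint path or repo ID
+
+# trust_remote_code=True runs the modeling code shipped with the checkpoint;
+# use_flash_attn is handled by that custom code and needs flash-attn installed.
+model = AutoModel.from_pretrained(
+    model_path,
+    torch_dtype=torch.bfloat16,
+    low_cpu_mem_usage=True,
+    use_flash_attn=True,
+    trust_remote_code=True).eval().cuda()
+tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True, use_fast=False)
+
+# Greedy decoding settings, to be passed to the model's generation call.
+generation_config = dict(max_new_tokens=2048, do_sample=False)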
+```
+## Citation
+If you use this work, please consider citing:
+```bibtex
+@article{ghazanfari2025chainofframes,
+      title={Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning},
+      author={Sara Ghazanfari and Francesco Croce and Nicolas Flammarion and Prashanth Krishnamurthy and Farshad Khorrami and Siddharth Garg},
+      year={2025},
+      journal={arXiv preprint arXiv:2506.00318}
+}
+```
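
Reviewer note (not part of the diff): the Sample Usage section stops at model loading, so a reader may wonder how a video would reach the model. The sketch below is one possible way to prepare the text side of a query, assuming the checkpoints follow the InternVL video convention of prefixing the question with one `FrameN: <image>` line per sampled frame. The helpers `sample_frame_indices` and `build_video_prompt` are hypothetical names introduced for illustration, and the prompt format should be checked against the CoF repository before relying on it.

```python
def sample_frame_indices(total_frames: int, num_segments: int) -> list[int]:
    """Pick the centre frame of each of `num_segments` equal-length segments.

    Hypothetical helper: uniform sampling is an assumption, not necessarily
    the strategy used to train or evaluate the CoF checkpoints.
    """
    if total_frames <= 0 or num_segments <= 0:
        raise ValueError("total_frames and num_segments must be positive")
    segment_length = total_frames / num_segments
    return [
        min(int(segment_length * (i + 0.5)), total_frames - 1)
        for i in range(num_segments)
    ]


def build_video_prompt(num_frames: int, question: str) -> str:
    """Prefix the question with one numbered image placeholder per frame.

    Assumes the InternVL-style "FrameN: <image>" layout, which gives the
    model explicit frame IDs it can refer to in its reasoning.
    """
    prefix = "".join(f"Frame{i + 1}: <image>\n" for i in range(num_frames))
    return prefix + question


indices = sample_frame_indices(total_frames=100, num_segments=4)
prompt = build_video_prompt(len(indices), "What happens after the ball is thrown?")
print(indices)
print(prompt)
```

In InternVL's remote code, a prompt like this is passed to the model's chat method together with the stacked frame tensors and `generation_config`. Whether the CoF checkpoints expose the same interface is an assumption worth confirming in the model card.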