---
license: apache-2.0
pipeline_tag: video-text-to-text
library_name: transformers
---
This repository contains `Videollama3Qwen2ForCausalLM`, a reward model presented in the paper [Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs](https://huggingface.co/papers/2509.22646).
The model detects human-perceived deepfake traces in AI-generated videos. Given multimodal input, it produces natural-language explanations, bounding-box regions for spatial grounding, and onset/offset timestamps for temporal localization. It was trained on DeeptraceReward, the first fine-grained, spatially- and temporally-aware benchmark annotating human-perceived fake traces.
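The description above lists three output modalities (explanation text, bounding boxes, and onset/offset timestamps). As a purely illustrative sketch of how one might post-process a generated response, the snippet below parses those fields from a text output. The serialization format (`bbox=[...]`, `time=[...]`), the `FakeTrace` container, and the coordinate/time units are all assumptions for illustration, not the model's documented output format — consult the paper and code repository for the actual schema.

```python
import re
from dataclasses import dataclass

@dataclass
class FakeTrace:
    explanation: str            # natural-language description of the fake trace
    bbox: tuple                 # (x1, y1, x2, y2); pixel coordinates assumed
    onset: float                # start of the trace; seconds assumed
    offset: float               # end of the trace; seconds assumed

def parse_trace(text: str) -> FakeTrace:
    # Hypothetical response format: "<explanation> bbox=[x1,y1,x2,y2] time=[t0,t1]"
    bbox_match = re.search(r"bbox=\[([\d.,\s]+)\]", text)
    time_match = re.search(r"time=\[([\d.,\s]+)\]", text)
    if bbox_match is None or time_match is None:
        raise ValueError("response does not match the assumed trace format")
    coords = tuple(float(v) for v in bbox_match.group(1).split(","))
    onset, offset = (float(v) for v in time_match.group(1).split(","))
    explanation = text.split("bbox=")[0].strip()
    return FakeTrace(explanation, coords, onset, offset)
```

For example, `parse_trace("Flickering fingers bbox=[10,20,110,220] time=[1.5,3.0]")` yields the explanation string, the four box coordinates, and the temporal span as structured fields.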
* **Paper**: [Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs](https://huggingface.co/papers/2509.22646)
* **Project Page**: https://deeptracereward.github.io/
| * **Code**: https://github.com/deeptracereward/deeptracereward |