|
|
--- |
|
|
license: mit |
|
|
pipeline_tag: video-text-to-text |
|
|
library_name: transformers |
|
|
--- |
|
|
|
|
|
# PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models |
|
|
|
|
|
This repository contains the PhyDetEx model, designed for detecting and explaining physically implausible content in videos generated by Text-to-Video (T2V) models. PhyDetEx introduces a lightweight fine-tuning approach, enabling Vision-Language Models (VLMs) to not only detect physically implausible events but also generate textual explanations on the violated physical principles. |
|
|
|
|
|
This work was presented in the paper: |
|
|
[PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models](https://huggingface.co/papers/2512.01843) |
|
|
|
|
|
- π **Paper**: [PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models](https://huggingface.co/papers/2512.01843) |
|
|
- π» **Code**: [https://github.com/Zeqing-Wang/PhyDetEx](https://github.com/Zeqing-Wang/PhyDetEx) |
|
|
- π€ **PID Dataset**: [https://huggingface.co/datasets/NNaptmn/PhyDetExDatasets](https://huggingface.co/datasets/NNaptmn/PhyDetExDatasets) |
|
|
|
|
|
<img src="https://github.com/Zeqing-Wang/PhyDetEx/raw/main/assets/overall_figs.png" width="100%" alt="Overall Figure" /> |
|
|
|
|
|
## π₯ News |
|
|
- **[2025.12.01]** π₯ We release the PID Dataset and the PhyDetEx Model! |
|
|
|
|
|
## Introduction |
|
|
|
|
|
PhyDetEx is a model designed for detecting physical implausible content. Additionally, to better address and test physical implausible content detection, we provide the PID Physical Implausibility Detection dataset. |
|
|
|
|
|
## π§ How to Start |
|
|
|
|
|
### Download the PID Test split |
|
|
|
|
|
Download `PID_Test_split.zip` from [π€ PID Dataset](https://huggingface.co/datasets/NNaptmn/PhyDetExDatasets), place it in the `Data/PID_test` directory, and organize it as follows: |
|
|
PID_test/ |
|
|
pos/ |
|
|
video_xxx.mp4 |
|
|
...... |
|
|
neg/ |
|
|
video_xxx.mp4 |
|
|
...... |
|
|
anno_file.json |
|
|
``` |
|
|
|
|
|
### Download the PhyDetEx |
|
|
|
|
|
Download PhyDetEx from [π€ PhyDetEx Model](https://huggingface.co/NNaptmn/PhyDetEx). |
|
|
|
|
|
### Prepare the Environment |
|
|
|
|
|
```bash |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
Please note that the version of transformers may affect specific metrics, so it is recommended to use the version specified in requirements.txt. |
|
|
|
|
|
### Set variables |
|
|
In benchmark_on_pid_test_split.py, set the corresponding path for PhyDetEx, then run: |
|
|
``` |
|
|
python benchmark_on_pid_test_split.py |
|
|
``` |
|
|
The resulting ./res/res_on_pid_test.json will contain the F1 Score, Acc Plausible, and Acc Implausible. |
|
|
|
|
|
### Get the reasoning score |
|
|
Deploy any LLM using [lmdeploy](https://github.com/InternLM/lmdeploy). In the paper, we report results using LLaMa3 8B. |
|
|
|
|
|
In infer_llm_score_for_pid_test_lmdeploy.py, set the corresponding port and evaluation file path, then run: |
|
|
|
|
|
``` |
|
|
python infer_llm_score_for_pid_test_lmdeploy.py |
|
|
``` |
|
|
|
|
|
### π§ͺ Test on ImpossibleVideos |
|
|
|
|
|
You can download and process the Physical Law-related data from [Impossible-Videos](https://github.com/showlab/Impossible-Videos). Alternatively, we recommend directly downloading our preprocessed data: [π€ PID Dataset](https://huggingface.co/datasets/NNaptmn/PhyDetExDatasets) "ImpossibleVideos_Physical_Law_Only.zip", and placing it in `Data/PID_test`. The remaining steps are the same as for the PID Test. |
|
|
|
|
|
Please note that the scripts for running ImpossibleVideos are `benchmark_on_impossible_videos.py` and `infer_llm_score_for_impossible_video_lmdeploy.py`. |
|
|
|
|
|
## π§ Train the PhyDetEx |
|
|
|
|
|
In the [π€ PID Dataset](https://huggingface.co/datasets/NNaptmn/PhyDetExDatasets), we also provide the PID Train Split. For training PhyDetEx, we use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory). |
|
|
|
|
|
## Acknowledgement |
|
|
We heavily borrow the data and code from ImpossibleVideos, and LLaMA-Factory. Thanks for sharing their code. |
|
|
|
|
|
## π Citation |
|
|
|
|
|
If you find the code useful for your work, please star this repo and consider citing: |
|
|
|
|
|
```bibtex |
|
|
@article{wang2025phydetex, |
|
|
title={PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models}, |
|
|
author={}, |
|
|
journal={arXiv preprint arXiv:2512.01843}, |
|
|
year={2025} |
|
|
} |
|
|
``` |