| --- |
| base_model: |
| - lmsys/vicuna-7b-v1.1 |
| datasets: |
| - MovieCORE/MovieCORE |
| - Enxin/MovieChat-1K-test |
| license: mit |
| pipeline_tag: video-text-to-text |
| --- |
| |
| <div align="center"> |
| <img src="https://github.com/joslefaure/MovieCORE/raw/main/assets/moviecore_icon.png" alt="MovieCORE Icon" width="150"/> |
| |
| # MovieCORE: COgnitive REasoning in Movies |
| |
| **A Video Question Answering Dataset for Probing Deeper Cognitive Understanding of Movie Content** |
| |
| [arXiv](https://arxiv.org/abs/2508.19026) |
| [Hugging Face Papers](https://huggingface.co/papers/2508.19026) |
| [Dataset](https://huggingface.co/datasets/MovieCORE/MovieCORE) |
| [Code](https://github.com/joslefaure/moviecore) |
| [Project Page](https://joslefaure.github.io/assets/html/moviecore.html) |
| [License](https://github.com/joslefaure/MovieCORE/blob/main/LICENSE) |
| |
|  |
| </div> |
|
|
| ## Overview |
|
|
| MovieCORE is a video question answering (VQA) dataset designed to probe deeper cognitive understanding of movie content. Unlike traditional VQA datasets that focus on surface-level visual understanding, MovieCORE requires models to reason about narrative structure, character development, thematic elements, and complex temporal relationships in cinematic content. |
|
|
| ## Data Preparation |
|
|
| The MovieCORE dataset builds upon video content from MovieChat. To get started: |
|
|
| ### Video Data |
| Download the video files from MovieChat's HuggingFace repositories: |
| - **Training Data**: [MovieChat-1K Train](https://huggingface.co/datasets/Enxin/MovieChat-1K_train) |
| - **Test Data**: [MovieChat-1K Test](https://huggingface.co/datasets/Enxin/MovieChat-1K-test) |
|
|
| ### Annotations |
| Access our annotations on HuggingFace: |
| - **MovieCORE Annotations**: [🤗 HuggingFace Dataset](https://huggingface.co/datasets/MovieCORE/MovieCORE/tree/main) |
|
|
| Extract and organize the data according to your model's requirements, then use our annotations for evaluation. |
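The downloads listed above can also be scripted. The sketch below is one possible approach, not part of the MovieCORE tooling: it assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function. The local folder names (`data/videos/train` and so on) are hypothetical placeholders.

```python
from pathlib import Path

# Dataset repositories named in this README. The local sub-directory names
# on the right are hypothetical; choose whatever layout your model expects.
REPOS = {
    "Enxin/MovieChat-1K_train": "videos/train",
    "Enxin/MovieChat-1K-test": "videos/test",
    "MovieCORE/MovieCORE": "annotations",
}


def download_all(root: str = "data") -> dict:
    """Download each dataset repo into its own folder under `root`."""
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    paths = {}
    for repo_id, subdir in REPOS.items():
        target = Path(root) / subdir
        # repo_type="dataset" is required because these are dataset repos.
        snapshot_download(repo_id=repo_id, repo_type="dataset", local_dir=target)
        paths[repo_id] = target
    return paths
```

Calling `download_all()` fetches all three repositories; pass a different `root` to change where they are stored.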
|
|
| ## Quick Start |
|
|
| ### Installation |
| ```bash |
| git clone https://github.com/joslefaure/MovieCORE.git |
| cd MovieCORE |
| ``` |
|
|
| ## Baselines |
| - We provide a script to run [HERMES](https://github.com/joslefaure/HERMES) (ICCV'25) on MovieCORE; see the linked project for details. |
|
|
| ## Evaluation Dimensions |
|
|
| MovieCORE scores model responses along five dimensions, each targeting a different aspect of cognitive understanding: |
|
|
| | Dimension | Description | |
| |-----------|-------------| |
| | **Accuracy** | Measures semantic similarity between predicted and ground truth answers | |
| | **Comprehensiveness** | Assesses coverage of all key aspects mentioned in the ground truth | |
| | **Depth** | Evaluates level of reasoning and insight demonstrated in predictions | |
| | **Evidence** | Checks quality and relevance of supporting evidence provided | |
| | **Coherence** | Measures logical flow, organization, and clarity of responses | |
|
|
| Each dimension provides unique insights into different cognitive capabilities required for deep video understanding. |
|
|
| ## Usage |
|
|
| ### Evaluation Script |
|
|
| Evaluate your model's performance on MovieCORE using our evaluation script: |
|
|
| ```bash |
| export OPENAI_API_KEY='your_openai_api_key' |
| python evaluate_moviecore.py --pred_path path/to/your/predictions.json |
| ``` |
|
|
| ### Input Format |
|
|
| Your predictions should follow this JSON structure: |
|
|
| ```json |
| { |
|   "video_1.mp4": [ |
|     { |
|       "question": "How does the video depict the unique adaptations of the species in the Sahara Desert, and what roles do these species play in their ecosystem?", |
|       "answer": "The ground truth answer.", |
|       "pred": "Your model's prediction.", |
|       "classification": "the question classification" |
|     }, |
|     { |
|       "question": "The second question for video 1?", |
|       "answer": "The ground truth answer.", |
|       "pred": "Your model's prediction.", |
|       "classification": "the question classification" |
|     } |
|   ], |
|   "video_2.mp4": [ |
|     { |
|       "question": "The only question for video 2", |
|       "answer": "The ground truth answer.", |
|       "pred": "Your model's prediction.", |
|       "classification": "the question classification" |
|     } |
|   ] |
| } |
| ``` |
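Before running the evaluation, it can help to check that a predictions file matches the structure above. The following is a minimal sketch of such a check. The function names `validate_predictions` and `load_and_validate` are hypothetical helpers written for illustration and are not part of `evaluate_moviecore.py`.

```python
import json

# The four fields each entry carries in the format shown above.
REQUIRED_KEYS = {"question", "answer", "pred", "classification"}


def validate_predictions(data) -> int:
    """Check the structure and return the total number of QA entries."""
    if not isinstance(data, dict):
        raise ValueError("top level must be an object keyed by video file name")
    total = 0
    for video, entries in data.items():
        if not isinstance(entries, list):
            raise ValueError(f"{video}: expected a list of entries")
        for i, entry in enumerate(entries):
            if not isinstance(entry, dict):
                raise ValueError(f"{video}[{i}]: entry must be an object")
            missing = REQUIRED_KEYS - entry.keys()
            if missing:
                raise ValueError(f"{video}[{i}]: missing keys {sorted(missing)}")
            total += 1
    return total


def load_and_validate(path: str) -> int:
    """Read a predictions JSON file from disk and validate it."""
    with open(path, encoding="utf-8") as f:
        return validate_predictions(json.load(f))


# Small in-memory example with the same shape as the JSON above.
example = {
    "video_1.mp4": [
        {"question": "Q1?", "answer": "GT 1", "pred": "P1", "classification": "c"},
        {"question": "Q2?", "answer": "GT 2", "pred": "P2", "classification": "c"},
    ],
    "video_2.mp4": [
        {"question": "Q3?", "answer": "GT 3", "pred": "P3", "classification": "c"},
    ],
}
print(validate_predictions(example))  # prints 3
```

For a file on disk, `load_and_validate("path/to/your/predictions.json")` performs the same check.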
|
|
| ### Output |
|
|
| The evaluation script provides: |
| - Overall scores across all dimensions |
| - Classification-specific performance metrics |
| - Detailed breakdowns for comprehensive analysis |
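How overall and classification-specific scores relate can be illustrated with a small aggregation sketch. It assumes each judged answer already carries one numeric score per dimension. That record layout, the lowercase dimension keys, the class labels, and the sample numbers are all hypothetical and do not describe the actual output of `evaluate_moviecore.py`.

```python
from collections import defaultdict
from statistics import mean

# The five evaluation dimensions, as lowercase keys (an assumed naming).
DIMENSIONS = ("accuracy", "comprehensiveness", "depth", "evidence", "coherence")


def summarize(records):
    """Return (overall, per_class) mean scores for each dimension."""
    overall = {d: mean(r["scores"][d] for r in records) for d in DIMENSIONS}

    # Group records by their question classification.
    grouped = defaultdict(list)
    for r in records:
        grouped[r["classification"]].append(r)

    per_class = {
        label: {d: mean(r["scores"][d] for r in rows) for d in DIMENSIONS}
        for label, rows in grouped.items()
    }
    return overall, per_class


# Made-up scores for two hypothetical question classes.
records = [
    {"classification": "class_a", "scores": dict.fromkeys(DIMENSIONS, 5)},
    {"classification": "class_a", "scores": dict.fromkeys(DIMENSIONS, 3)},
    {"classification": "class_b", "scores": dict.fromkeys(DIMENSIONS, 1)},
]
overall, per_class = summarize(records)
print(overall["depth"])               # mean over all three records
print(per_class["class_a"]["depth"])  # mean over the two class_a records
```

Averaging per dimension keeps the five scores separate, so a model that is strong on one dimension and weak on another is not hidden behind a single number.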
|
|
| ## Citation |
|
|
| If you use MovieCORE in your research, please cite our paper: |
|
|
| ```bibtex |
| @misc{faure2025moviecorecognitivereasoningmovies, |
| title={MovieCORE: COgnitive REasoning in Movies}, |
| author={Gueter Josmy Faure and Min-Hung Chen and Jia-Fong Yeh and Ying Cheng and Hung-Ting Su and Yung-Hao Tang and Shang-Hong Lai and Winston H. Hsu}, |
| year={2025}, |
| eprint={2508.19026}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.CL}, |
| url={https://arxiv.org/abs/2508.19026}, |
| } |
| ``` |
|
|
| ## Contributing |
|
|
| We welcome contributions to MovieCORE! Please feel free to: |
| - Report issues or bugs |
| - Suggest improvements or new features |
| - Submit baseline implementations |
| - Provide feedback on the evaluation framework |
|
|
| ## License |
|
|
| This dataset is provided under the MIT License. See [LICENSE](https://github.com/joslefaure/MovieCORE/blob/main/LICENSE) for more details. |
|
|
| --- |
|
|
| <div align="center"> |
| <p><strong>Advancing Video Understanding Through Cognitive Evaluation</strong></p> |
| |
| **[📖 Paper](https://arxiv.org/abs/2508.19026v1) | [🤗 Dataset](https://huggingface.co/datasets/MovieCORE/MovieCORE) | [💻 Code](https://github.com/joslefaure/moviecore)** |
| </div> |