---
license: apache-2.0
language:
- en
- zh
library_name: transformers
tags:
- avbench
- audio-text
- video-text
- audio-video
base_model:
- Qwen/Qwen2-Audio-7B-Instruct
- Qwen/Qwen2.5-Omni-7B
---
# AVBench Models

This repository hosts the evaluator models used in **AVBench**, a benchmark for text-to-audio-video generation quality and cross-modal consistency.

[AVBench dataset](https://huggingface.co/datasets/iiiiii123/AVBench)
[AVBench models](https://huggingface.co/iiiiii123/AVBench_model)
## AVBench in brief

AVBench evaluates generated content on two splits:

- **Normal split**: common, easier samples.
- **Hard split**: challenging samples with stronger cross-modal requirements.

It covers cross-modal alignment (Audio-Text / Video-Text / Audio-Video) and generation quality dimensions.

Dataset link:

- https://huggingface.co/datasets/iiiiii123/AVBench
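
As an illustration of how per-sample results might be organised by split and alignment dimension, here is a small sketch. The record layout, the helper name `mean_by_split`, and the use of a plain arithmetic mean are assumptions made for illustration only; this card does not describe how AVBench aggregates scores, and the numbers below are placeholders, not benchmark results.

```python
from collections import defaultdict
from statistics import mean


def mean_by_split(records):
    """Average scores for each (split, dimension) pair.

    `records` is an iterable of (split, dimension, score) tuples.
    This layout is hypothetical, not an AVBench file format.
    """
    grouped = defaultdict(list)
    for split, dimension, score in records:
        grouped[(split, dimension)].append(score)
    return {key: mean(values) for key, values in grouped.items()}


# Placeholder scores, not real evaluation output.
records = [
    ("normal", "AT", 0.5),
    ("normal", "AT", 1.0),
    ("hard", "AV", 0.25),
]
summary = mean_by_split(records)
```

Keeping the split in the grouping key means Normal and Hard results are never averaged together by accident.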
## Model zoo used by AVBench

| Model | Use in AVBench | Trained / merged from |
|---|---|---|
| `Qwen2-Audio-7B-AudioTextMatching-Merged` | Audio-Text consistency scoring (AT) | `Qwen/Qwen2-Audio-7B-Instruct` |
| `Qwen2.5-Omni-7B-VideoTextMatching-Merged` | Video-Text consistency scoring (VT) | `Qwen/Qwen2.5-Omni-7B` |
| `Qwen2.5-Omni-7B-AudioVideoMatching-Merged` | Audio-Video consistency scoring (AV) | `Qwen/Qwen2.5-Omni-7B` |
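
The table maps each consistency dimension to one evaluator checkpoint. A minimal sketch of selecting and fetching a single checkpoint follows. It assumes each checkpoint is stored in a top-level folder of this repository named after the model, which this card does not state, so check the repository file listing first. `EVALUATORS`, `evaluator_for`, and `download_evaluator` are hypothetical helper names; `snapshot_download` and its `allow_patterns` argument come from `huggingface_hub`.

```python
import os
from dataclasses import dataclass

REPO_ID = "iiiiii123/AVBench_model"


@dataclass(frozen=True)
class Evaluator:
    name: str        # checkpoint name from the model zoo table
    base_model: str  # model it was trained / merged from


# Dimension code -> evaluator, copied from the model zoo table.
EVALUATORS = {
    "AT": Evaluator("Qwen2-Audio-7B-AudioTextMatching-Merged",
                    "Qwen/Qwen2-Audio-7B-Instruct"),
    "VT": Evaluator("Qwen2.5-Omni-7B-VideoTextMatching-Merged",
                    "Qwen/Qwen2.5-Omni-7B"),
    "AV": Evaluator("Qwen2.5-Omni-7B-AudioVideoMatching-Merged",
                    "Qwen/Qwen2.5-Omni-7B"),
}


def evaluator_for(dimension: str) -> Evaluator:
    """Look up the evaluator for a dimension code (case-insensitive)."""
    key = dimension.upper()
    if key not in EVALUATORS:
        raise ValueError(f"unknown dimension {dimension!r}; expected one of {sorted(EVALUATORS)}")
    return EVALUATORS[key]


def download_evaluator(dimension: str) -> str:
    """Download only one evaluator's files and return its local folder.

    Assumption: the checkpoint lives in a repo subfolder named after the model.
    """
    # Imported lazily so the lookup helpers work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    evaluator = evaluator_for(dimension)
    root = snapshot_download(repo_id=REPO_ID, allow_patterns=[f"{evaluator.name}/*"])
    return os.path.join(root, evaluator.name)
```

Calling `download_evaluator("AT")` would fetch only the Audio-Text evaluator rather than all three 7B checkpoints, which matters for disk space and download time. The returned folder can then be loaded with the classes documented for the corresponding base model.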
## Notes

These models are released for AVBench evaluation and analysis.