IHP-Lab/Qwen2-Audio_PCLM_DPO

JiachengPang committed 1 day ago

Commit b63adb3 · verified · Parent: fc5ae18

License: link to in-repo file

Files changed (1)

README.md +20 -3

@@ -1,7 +1,7 @@
 ---
 license: other
 license_name: usc-research
-license_link: https://github.com/ihp-lab/VoxParadox/blob/main/LICENSE
 language:
 - en
 library_name: transformers
@@ -19,6 +19,14 @@ pipeline_tag: audio-text-to-text
 # Qwen2-Audio + PCLM + DPO
 PCLM- and DPO-finetuned [Qwen2-Audio-7B-Instruct](https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct) from
 *Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox*
 (ICML 2026).
@@ -59,6 +67,16 @@ python eval.py --predictions runs/eval/qwen2audio_pclm_dpo/predictions.jsonl
 The loader auto-detects `use_pclm=True` from `config.json` and activates PCLM with
 `expose_layers=[5, 15, 25, 30]` over the audio encoder.
 ## Citation
 ```bibtex
@@ -72,8 +90,7 @@ The loader auto-detects `use_pclm=True` from `config.json` and activates PCLM wi
 ## License
-USC Research License (research / non-profit only). See the
-[license file](https://github.com/ihp-lab/VoxParadox/blob/main/LICENSE).
 The base model (`Qwen/Qwen2-Audio-7B-Instruct`) carries its own Tongyi Qianwen license terms,
 which continue to apply to the inherited weights.

 ---
 license: other
 license_name: usc-research
+license_link: LICENSE
 language:
 - en
 library_name: transformers
 # Qwen2-Audio + PCLM + DPO
+[![ICML 2026](https://img.shields.io/badge/ICML-2026-1d4ed8.svg)](https://icml.cc/Conferences/2026)
+[![Paper](https://img.shields.io/badge/Paper-arXiv-AD1C18.svg)](https://arxiv.org/abs/2605.27772)
+[![Project Page](https://img.shields.io/badge/Project-Page-0EA5E9.svg)](https://voxparadox.github.io/)
+[![Code](https://img.shields.io/badge/GitHub-ihp--lab%2FVoxParadox-181717.svg?logo=github)](https://github.com/ihp-lab/VoxParadox)
+[![Dataset](https://img.shields.io/badge/🤗%20Dataset-IHP--Lab%2FVoxParadox-FFD21E.svg)](https://huggingface.co/datasets/IHP-Lab/VoxParadox)
+[![AF3 + PCLM + DPO](https://img.shields.io/badge/🤗%20Sibling%20model-AF3+PCLM+DPO-FFD21E.svg)](https://huggingface.co/IHP-Lab/AF3_PCLM_DPO)
+[![License](https://img.shields.io/badge/License-USC%20Research-228B22.svg)](LICENSE)
 PCLM- and DPO-finetuned [Qwen2-Audio-7B-Instruct](https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct) from
 *Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox*
 (ICML 2026).
 The loader auto-detects `use_pclm=True` from `config.json` and activates PCLM with
 `expose_layers=[5, 15, 25, 30]` over the audio encoder.
+## Project resources
+| Resource | Link |
+|---|---|
+| Paper (arXiv) | <https://arxiv.org/abs/2605.27772> |
+| Project page | <https://voxparadox.github.io/> |
+| Code | <https://github.com/ihp-lab/VoxParadox> |
+| Benchmark | <https://huggingface.co/datasets/IHP-Lab/VoxParadox> |
+| Sibling model (AF3) | <https://huggingface.co/IHP-Lab/AF3_PCLM_DPO> |
 ## Citation
 ```bibtex
 ## License
+USC Research License (research / non-profit only). See [`LICENSE`](LICENSE).
 The base model (`Qwen/Qwen2-Audio-7B-Instruct`) carries its own Tongyi Qianwen license terms,
 which continue to apply to the inherited weights.