Audio-Text-to-Text
Transformers
Safetensors
English
qwen2_audio
text2text-generation
audio
speech
audio-llm
paralinguistic
pclm
dpo
voxparadox
Instructions to use IHP-Lab/Qwen2-Audio_PCLM_DPO with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use IHP-Lab/Qwen2-Audio_PCLM_DPO with Transformers:
    # Load model directly
    from transformers import AutoProcessor, AutoModelForSeq2SeqLM

    processor = AutoProcessor.from_pretrained("IHP-Lab/Qwen2-Audio_PCLM_DPO")
    model = AutoModelForSeq2SeqLM.from_pretrained("IHP-Lab/Qwen2-Audio_PCLM_DPO")
- Notebooks
- Google Colab
- Kaggle
License: link to in-repo file
README.md CHANGED
@@ -1,7 +1,7 @@
 ---
 license: other
 license_name: usc-research
-license_link:
 language:
 - en
 library_name: transformers
@@ -19,6 +19,14 @@ pipeline_tag: audio-text-to-text
 
 # Qwen2-Audio + PCLM + DPO
 
 PCLM- and DPO-finetuned [Qwen2-Audio-7B-Instruct](https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct) from
 *Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox*
 (ICML 2026).
@@ -59,6 +67,16 @@ python eval.py --predictions runs/eval/qwen2audio_pclm_dpo/predictions.jsonl
 The loader auto-detects `use_pclm=True` from `config.json` and activates PCLM with
 `expose_layers=[5, 15, 25, 30]` over the audio encoder.
 
 ## Citation
 
 ```bibtex
@@ -72,8 +90,7 @@ The loader auto-detects `use_pclm=True` from `config.json` and activates PCLM wi
 
 ## License
 
-USC Research License (research / non-profit only). See
-[license file](https://github.com/ihp-lab/VoxParadox/blob/main/LICENSE).
 
 The base model (`Qwen/Qwen2-Audio-7B-Instruct`) carries its own Tongyi Qianwen license terms,
 which continue to apply to the inherited weights.
@@ -1,7 +1,7 @@
 ---
 license: other
 license_name: usc-research
+license_link: LICENSE
 language:
 - en
 library_name: transformers
@@ -19,6 +19,14 @@ pipeline_tag: audio-text-to-text
 
 # Qwen2-Audio + PCLM + DPO
 
+[](https://icml.cc/Conferences/2026)
+[](https://arxiv.org/abs/2605.27772)
+[](https://voxparadox.github.io/)
+[](https://github.com/ihp-lab/VoxParadox)
+[](https://huggingface.co/datasets/IHP-Lab/VoxParadox)
+[](https://huggingface.co/IHP-Lab/AF3_PCLM_DPO)
+[](LICENSE)
+
 PCLM- and DPO-finetuned [Qwen2-Audio-7B-Instruct](https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct) from
 *Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox*
 (ICML 2026).
@@ -59,6 +67,16 @@ python eval.py --predictions runs/eval/qwen2audio_pclm_dpo/predictions.jsonl
 The loader auto-detects `use_pclm=True` from `config.json` and activates PCLM with
 `expose_layers=[5, 15, 25, 30]` over the audio encoder.
 
+## Project resources
+
+| Resource | Link |
+|---|---|
+| Paper (arXiv) | <https://arxiv.org/abs/2605.27772> |
+| Project page | <https://voxparadox.github.io/> |
+| Code | <https://github.com/ihp-lab/VoxParadox> |
+| Benchmark | <https://huggingface.co/datasets/IHP-Lab/VoxParadox> |
+| Sibling model (AF3) | <https://huggingface.co/IHP-Lab/AF3_PCLM_DPO> |
+
 ## Citation
 
 ```bibtex
@@ -72,8 +90,7 @@ The loader auto-detects `use_pclm=True` from `config.json` and activates PCLM wi
 
 ## License
 
+USC Research License (research / non-profit only). See [`LICENSE`](LICENSE).
 
 The base model (`Qwen/Qwen2-Audio-7B-Instruct`) carries its own Tongyi Qianwen license terms,
 which continue to apply to the inherited weights.
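
The README text in this diff says the loader auto-detects `use_pclm=True` from `config.json` and activates PCLM with `expose_layers=[5, 15, 25, 30]`. The following is a minimal Python sketch of that kind of check, not the project's actual loader. Only the `use_pclm` flag and the layer list come from the README; the function names `pclm_settings` and `read_pclm_settings`, and the idea that `config.json` may carry an `expose_layers` key, are assumptions made for illustration.

```python
import json
from pathlib import Path

# Layer list quoted in the README. Storing it under an "expose_layers" key in
# config.json is an assumption of this sketch, not something the README confirms.
README_EXPOSE_LAYERS = [5, 15, 25, 30]


def pclm_settings(config):
    """Return (use_pclm, expose_layers) from an already-parsed config dict."""
    use_pclm = bool(config.get("use_pclm", False))
    if not use_pclm:
        # Without the flag, no audio-encoder layers are exposed.
        return False, []
    # Hypothetical key; fall back to the README's list when it is absent.
    layers = [int(i) for i in config.get("expose_layers", README_EXPOSE_LAYERS)]
    return True, layers


def read_pclm_settings(config_path):
    """Load a config.json file from disk and extract the PCLM settings."""
    text = Path(config_path).read_text(encoding="utf-8")
    return pclm_settings(json.loads(text))


if __name__ == "__main__":
    # A config with the flag set, and one without it.
    print(pclm_settings({"use_pclm": True}))
    print(pclm_settings({"model_type": "qwen2_audio"}))
```

Running `read_pclm_settings` on a locally downloaded `config.json` is a quick way to confirm whether a checkpoint is expected to load with PCLM enabled before starting an evaluation run.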