License file
README.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 license: other
 license_name: usc-research
-license_link:
+license_link: LICENSE
 language:
 - en
 base_model: nvidia/audio-flamingo-3
@@ -18,6 +18,14 @@ pipeline_tag: audio-text-to-text
 
 # Audio Flamingo 3 + PCLM + DPO
 
+[](https://icml.cc/Conferences/2026)
+[](https://arxiv.org/abs/2605.27772)
+[](https://voxparadox.github.io/)
+[](https://github.com/ihp-lab/VoxParadox)
+[](https://huggingface.co/datasets/IHP-Lab/VoxParadox)
+[](https://huggingface.co/IHP-Lab/Qwen2-Audio_PCLM_DPO)
+[](LICENSE)
+
 PCLM- and DPO-finetuned [Audio Flamingo 3](https://huggingface.co/nvidia/audio-flamingo-3) from
 *Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox*
 (ICML 2026).
@@ -73,6 +81,16 @@ python eval.py --predictions runs/eval/af3_pclm_dpo/predictions.jsonl
 PCLM activation is read from this checkpoint's `config.json`
 (`expose_layers=[5, 15, 25, 30]`, `use_sound_pclm=true`).
 
+## Project resources
+
+| Resource | Link |
+|---|---|
+| Paper (arXiv) | <https://arxiv.org/abs/2605.27772> |
+| Project page | <https://voxparadox.github.io/> |
+| Code | <https://github.com/ihp-lab/VoxParadox> |
+| Benchmark | <https://huggingface.co/datasets/IHP-Lab/VoxParadox> |
+| Sibling model (Qwen2-Audio) | <https://huggingface.co/IHP-Lab/Qwen2-Audio_PCLM_DPO> |
+
 ## Citation
 
 ```bibtex
@@ -86,8 +104,7 @@ PCLM activation is read from this checkpoint's `config.json`
 
 ## License
 
-USC Research License (research / non-profit only). See
-[license file](https://github.com/ihp-lab/VoxParadox/blob/main/LICENSE).
+USC Research License (research / non-profit only). See [`LICENSE`](LICENSE).
 
 The base model (`nvidia/audio-flamingo-3`) carries the NVIDIA non-commercial license
 terms, which continue to apply to the inherited weights.
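
The README context in this diff says PCLM activation is read from the checkpoint's `config.json` (`expose_layers=[5, 15, 25, 30]`, `use_sound_pclm=true`). A minimal sketch of how one might inspect those two fields in a downloaded checkpoint follows. It assumes both keys sit at the top level of `config.json`, which the README does not confirm, and `read_pclm_settings` is a hypothetical helper name, not part of the VoxParadox code.

```python
import json
from pathlib import Path


def read_pclm_settings(config_path):
    """Return the two PCLM-related fields named in the README.

    Assumption: both keys are top-level entries in config.json.
    A missing key is reported as None rather than raising.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return {
        "use_sound_pclm": config.get("use_sound_pclm"),
        "expose_layers": config.get("expose_layers"),
    }
```

Calling `read_pclm_settings("path/to/checkpoint/config.json")` on a local copy returns a small dictionary; a `None` value would suggest the key is nested elsewhere or absent, in which case the repository's own loading code is the reference.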