Add pipeline tag, library name, and improve model card
#1
by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,34 +1,43 @@
 ---
-license: apache-2.0
 base_model: Qwen/Qwen2.5-Omni-7B
+datasets:
+- ASU-GSL/AHA
+library_name: peft
+license: apache-2.0
+pipeline_tag: audio-text-to-text
 tags:
 - lora
 - qwen2.5-omni
 - multimodal
 - audio
-datasets:
-- ASU-GSL/AHA
 ---

-#
+# Qwen-Audio-AHA (LoRA Adapter)
+
+This repository contains the official LoRA adapter for **Qwen2.5-Omni-7B** (Thinker), fine-tuned using the **AHA (Audio Hallucination Alignment)** framework.

 ## Model Description
-
+AHA is a framework designed to mitigate hallucinations in Large Audio-Language Models (LALMs) by focusing on fine-grained temporal reasoning and counterfactual alignment. By leveraging counterfactual hard negative mining, the pipeline constructs high-quality preference data that forces models to distinguish strict acoustic evidence from linguistically plausible fabrications.

-
+- **Paper:** [AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives](https://huggingface.co/papers/2512.24052)
+- **GitHub Repository:** [https://github.com/LLM-VLM-GSL/AHA](https://github.com/LLM-VLM-GSL/AHA)
+- **Base Model:** [Qwen/Qwen2.5-Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B)

 ## Intended Use
-- **Primary Task:** Audio reasoning.
-- **Languages Supported:** All languages supported by Qwen2.5-Omni-7B.
+- **Primary Task:** Audio reasoning and reducing hallucinations in audio-to-text tasks.
+- **Languages Supported:** All languages supported by the base Qwen2.5-Omni-7B model.

-##
-
+## Sample Usage
+
+You can load this model using the `peft` and `transformers` libraries. Note that `librosa` is required for audio loading in this example.

 ```python
+import torch
+import librosa
 from transformers import Qwen2_5OmniThinkerForConditionalGeneration, Qwen2_5OmniProcessor
 from peft import PeftModel
-import torch

+device = "cuda" if torch.cuda.is_available() else "cpu"
 model_id = "Qwen/Qwen2.5-Omni-7B"
 adapter_id = "ASU-GSL/Qwen-Audio-AHA"

@@ -40,9 +49,21 @@ model = Qwen2_5OmniThinkerForConditionalGeneration.from_pretrained(

 # Load LoRA adapter
 model = PeftModel.from_pretrained(model, adapter_id)
-```

+# Load Audio
+# Replace "example.wav" with the path to your audio file
+audio, _ = librosa.load("example.wav", sr=processor.feature_extractor.sampling_rate)
+prompt = "<|audio|>\nDescribe the temporal order of events in this audio."
+inputs = processor(text=prompt, audios=audio, return_tensors="pt").to(device)
+
+# Generate
+generate_ids = model.generate(**inputs, max_new_tokens=256)
+print(processor.batch_decode(generate_ids, skip_special_tokens=True)[0])
 ```
+
+## Citation
+```bibtex
 @article{chen2025aha,
  title={AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives},
  author={Chen, Yanxi and Zhu, Wenhui and Chen, Xiwen and Wang, Zhipeng and Li, Xin and Qiu, Peijie and Wang, Hao and Dong, Xuanzhao and Xiong, Yujian and Schneider, Anderson and others},
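The second hunk header above references the base-model loading call, but its arguments fall outside both hunks, so the usage snippet shown in the diff is not fully self-contained. A minimal sketch of that elided loading step is given below; the `torch_dtype` and `device_map` arguments are assumptions for illustration, not values taken from the model card.

```python
# Minimal sketch of the base-model loading step that the diff elides.
# The dtype and device_map choices below are assumptions, not from the card.
import torch
from transformers import Qwen2_5OmniThinkerForConditionalGeneration, Qwen2_5OmniProcessor
from peft import PeftModel

model_id = "Qwen/Qwen2.5-Omni-7B"
adapter_id = "ASU-GSL/Qwen-Audio-AHA"

# Processor handles text tokenization and audio feature extraction.
processor = Qwen2_5OmniProcessor.from_pretrained(model_id)

# Load the Thinker (text-generating) component of Qwen2.5-Omni.
model = Qwen2_5OmniThinkerForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision
    device_map="auto",           # assumed placement; requires `accelerate`
)

# Apply the AHA LoRA adapter on top of the base weights, as in the card.
model = PeftModel.from_pretrained(model, adapter_id)
```

For inference-only use, the adapter weights can also be folded into the base model with `peft`'s `merge_and_unload()`, which removes the LoRA indirection at generation time.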