Add pipeline tag, library name, and improve model card
#1
by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,34 +1,43 @@
 ---
-license: apache-2.0
 base_model: Qwen/Qwen2.5-Omni-7B
+datasets:
+- ASU-GSL/AHA
+library_name: peft
+license: apache-2.0
+pipeline_tag: audio-text-to-text
 tags:
 - lora
 - qwen2.5-omni
 - multimodal
 - audio
-datasets:
-- ASU-GSL/AHA
 ---

-#
+# Qwen-Audio-AHA (LoRA Adapter)
+
+This repository contains the official LoRA adapter for **Qwen2.5-Omni-7B** (Thinker), fine-tuned using the **AHA (Audio Hallucination Alignment)** framework.

 ## Model Description
-
+AHA is a framework designed to mitigate hallucinations in Large Audio-Language Models (LALMs) by focusing on fine-grained temporal reasoning and counterfactual alignment. By leveraging counterfactual hard negative mining, the pipeline constructs high-quality preference data that forces models to distinguish strict acoustic evidence from linguistically plausible fabrications.

-
+- **Paper:** [AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives](https://huggingface.co/papers/2512.24052)
+- **GitHub Repository:** [https://github.com/LLM-VLM-GSL/AHA](https://github.com/LLM-VLM-GSL/AHA)
+- **Base Model:** [Qwen/Qwen2.5-Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B)

 ## Intended Use
-- **Primary Task:** Audio reasoning.
-- **Languages Supported:** All languages supported by Qwen2.5-Omni-7B.
+- **Primary Task:** Audio reasoning and reducing hallucinations in audio-to-text tasks.
+- **Languages Supported:** All languages supported by the base Qwen2.5-Omni-7B model.

-##
-
+## Sample Usage
+
+You can load this model using the `peft` and `transformers` libraries. Note that `librosa` is required for audio loading in this example.

 ```python
+import torch
+import librosa
 from transformers import Qwen2_5OmniThinkerForConditionalGeneration, Qwen2_5OmniProcessor
 from peft import PeftModel
-import torch

+device = "cuda" if torch.cuda.is_available() else "cpu"
 model_id = "Qwen/Qwen2.5-Omni-7B"
 adapter_id = "ASU-GSL/Qwen-Audio-AHA"

@@ -40,9 +49,21 @@ model = Qwen2_5OmniThinkerForConditionalGeneration.from_pretrained(

 # Load LoRA adapter
 model = PeftModel.from_pretrained(model, adapter_id)
-```

+# Load Audio
+# Replace "example.wav" with the path to your audio file
+audio, _ = librosa.load("example.wav", sr=processor.feature_extractor.sampling_rate)
+prompt = "<|audio|>\nDescribe the temporal order of events in this audio."
+inputs = processor(text=prompt, audios=audio, return_tensors="pt").to(device)
+
+# Generate
+generate_ids = model.generate(**inputs, max_new_tokens=256)
+print(processor.batch_decode(generate_ids, skip_special_tokens=True)[0])
 ```
+
+## Citation
+```bibtex
 @article{chen2025aha,
  title={AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives},
  author={Chen, Yanxi and Zhu, Wenhui and Chen, Xiwen and Wang, Zhipeng and Li, Xin and Qiu, Peijie and Wang, Hao and Dong, Xuanzhao and Xiong, Yujian and Schneider, Anderson and others},
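The second hunk header above references the base-model loading call, but its arguments fall outside both hunks, so the usage snippet shown in the diff is not fully self-contained. A minimal sketch of that elided loading step is given below; the `torch_dtype` and `device_map` arguments are assumptions for illustration, not values taken from the model card.

```python
# Minimal sketch of the base-model loading step that the diff elides.
# The dtype and device_map choices below are assumptions, not from the card.
import torch
from transformers import Qwen2_5OmniThinkerForConditionalGeneration, Qwen2_5OmniProcessor
from peft import PeftModel

model_id = "Qwen/Qwen2.5-Omni-7B"
adapter_id = "ASU-GSL/Qwen-Audio-AHA"

# Processor handles text tokenization and audio feature extraction.
processor = Qwen2_5OmniProcessor.from_pretrained(model_id)

# Load the Thinker (text-generating) component of Qwen2.5-Omni.
model = Qwen2_5OmniThinkerForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision
    device_map="auto",           # assumed placement; requires `accelerate`
)

# Apply the AHA LoRA adapter on top of the base weights, as in the card.
model = PeftModel.from_pretrained(model, adapter_id)
```

For inference-only use, the adapter weights can also be folded into the base model with `peft`'s `merge_and_unload()`, which removes the LoRA indirection at generation time.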