---
base_model: Qwen/Qwen2.5-Omni-7B
datasets:
- ASU-GSL/AHA
library_name: peft
license: apache-2.0
pipeline_tag: audio-text-to-text
tags:
- lora
- qwen2.5-omni
- multimodal
- audio
---

# Qwen-Audio-AHA (LoRA Adapter)

This repository contains the official LoRA adapter for **Qwen2.5-Omni-7B** (Thinker), fine-tuned using the **AHA (Audio Hallucination Alignment)** framework.

## Model Description

AHA is a framework designed to mitigate hallucinations in Large Audio-Language Models (LALMs) by focusing on fine-grained temporal reasoning and counterfactual alignment. By leveraging counterfactual hard negative mining, the pipeline constructs high-quality preference data that forces models to distinguish strict acoustic evidence from linguistically plausible fabrications.

- **Paper:** [AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives](https://huggingface.co/papers/2512.24052)
- **GitHub Repository:** [https://github.com/LLM-VLM-GSL/AHA](https://github.com/LLM-VLM-GSL/AHA)
- **Base Model:** [Qwen/Qwen2.5-Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B)

## Intended Use

- **Primary Task:** Audio reasoning and reducing hallucinations in audio-to-text tasks.
- **Languages Supported:** All languages supported by the base Qwen2.5-Omni-7B model.

## Sample Usage

You can load this model using the `peft` and `transformers` libraries. Note that `librosa` is required for audio loading in this example.

```python
import torch
import librosa
from transformers import Qwen2_5OmniThinkerForConditionalGeneration, Qwen2_5OmniProcessor
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "Qwen/Qwen2.5-Omni-7B"
adapter_id = "ASU-GSL/Qwen-Audio-AHA"

# Load base model and processor
processor = Qwen2_5OmniProcessor.from_pretrained(model_id)
model = Qwen2_5OmniThinkerForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)

# Load audio
# Replace "example.wav" with the path to your audio file
audio, _ = librosa.load("example.wav", sr=processor.feature_extractor.sampling_rate)

prompt = "<|audio|> Describe the temporal order of events in this audio."
inputs = processor(text=prompt, audios=audio, return_tensors="pt").to(device)

# Generate
generate_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(generate_ids, skip_special_tokens=True)[0])
```

## Citation

```bibtex
@article{chen2025aha,
  title={AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives},
  author={Chen, Yanxi and Zhu, Wenhui and Chen, Xiwen and Wang, Zhipeng and Li, Xin and Qiu, Peijie and Wang, Hao and Dong, Xuanzhao and Xiong, Yujian and Schneider, Anderson and others},
  journal={arXiv preprint arXiv:2512.24052},
  year={2025}
}
```
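
## Merging the Adapter (Optional)

If you prefer to serve a standalone checkpoint without a `peft` dependency at inference time, the LoRA weights can be folded into the base model. The sketch below is a minimal example using standard `peft` APIs; the output directory name `Qwen-Audio-AHA-merged` is only illustrative.

```python
import torch
from transformers import Qwen2_5OmniThinkerForConditionalGeneration, Qwen2_5OmniProcessor
from peft import PeftModel

model_id = "Qwen/Qwen2.5-Omni-7B"
adapter_id = "ASU-GSL/Qwen-Audio-AHA"

# Load the base Thinker model and attach the AHA LoRA adapter
base = Qwen2_5OmniThinkerForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# Fold the LoRA weights into the base weights and drop the PEFT wrappers
merged = model.merge_and_unload()

# Save a standalone checkpoint (directory name is illustrative)
merged.save_pretrained("Qwen-Audio-AHA-merged")
Qwen2_5OmniProcessor.from_pretrained(model_id).save_pretrained("Qwen-Audio-AHA-merged")
```

The merged directory can then be loaded directly with `Qwen2_5OmniThinkerForConditionalGeneration.from_pretrained`, exactly as in the Sample Usage snippet but without the `PeftModel.from_pretrained` step.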