AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives
Paper
•
2512.24052
•
Published
This repository contains the official LoRA adapter for Qwen2.5-Omni-7B (Thinker), fine-tuned using the AHA (Audio Hallucination Alignment) framework.
AHA is a framework designed to mitigate hallucinations in Large Audio-Language Models (LALMs) by focusing on fine-grained temporal reasoning and counterfactual alignment. By leveraging counterfactual hard negative mining, the pipeline constructs high-quality preference data that forces models to distinguish strict acoustic evidence from linguistically plausible fabrications.
You can load this model using the peft and transformers libraries. Note that librosa is required for audio loading in this example.
import torch
import librosa
from transformers import Qwen2_5OmniThinkerForConditionalGeneration, Qwen2_5OmniProcessor
from peft import PeftModel
device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "Qwen/Qwen2.5-Omni-7B"
adapter_id = "ASU-GSL/Qwen-Audio-AHA"
# Load base model and processor
processor = Qwen2_5OmniProcessor.from_pretrained(model_id)
model = Qwen2_5OmniThinkerForConditionalGeneration.from_pretrained(
model_id, torch_dtype="auto", device_map="auto"
)
# Load LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)
# Load Audio
# Replace "example.wav" with the path to your audio file
audio, _ = librosa.load("example.wav", sr=processor.feature_extractor.sampling_rate)
prompt = "<|audio|>
Describe the temporal order of events in this audio."
inputs = processor(text=prompt, audios=audio, return_tensors="pt").to(device)
# Generate
generate_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(generate_ids, skip_special_tokens=True)[0])
@article{chen2025aha,
title={AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives},
author={Chen, Yanxi and Zhu, Wenhui and Chen, Xiwen and Wang, Zhipeng and Li, Xin and Qiu, Peijie and Wang, Hao and Dong, Xuanzhao and Xiong, Yujian and Schneider, Anderson and others},
journal={arXiv preprint arXiv:2512.24052},
year={2025}
}
Base model
Qwen/Qwen2.5-Omni-7B