Qwen-Audio-AHA (LoRA Adapter)

This repository contains the official LoRA adapter for Qwen2.5-Omni-7B (Thinker), fine-tuned using the AHA (Audio Hallucination Alignment) framework.

Model Description

AHA is a framework designed to mitigate hallucinations in Large Audio-Language Models (LALMs) by focusing on fine-grained temporal reasoning and counterfactual alignment. By leveraging counterfactual hard negative mining, the pipeline constructs high-quality preference data that forces models to distinguish strict acoustic evidence from linguistically plausible fabrications.
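
For intuition, preference data of this kind is typically organized as (prompt, chosen, rejected) triples, where the rejected answer is a counterfactual hard negative: fluent, plausible language that contradicts the audio. Below is a minimal, hypothetical sketch of such a record; the field names and example content are illustrative and not the actual AHA data schema.

# Hypothetical preference record for counterfactual alignment.
# Field names and contents are illustrative, not the actual AHA schema.
preference_example = {
    "audio": "door_then_bark.wav",  # hypothetical file
    "prompt": "Describe the temporal order of events in this audio.",
    # Chosen: grounded in the acoustic evidence.
    "chosen": "A door slams first, followed by a dog barking.",
    # Rejected: a hard negative that is linguistically plausible
    # but reverses the actual order of events.
    "rejected": "A dog barks first, and then a door slams.",
}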

Intended Use

  • Primary Task: Audio reasoning and reducing hallucinations in audio-to-text tasks.
  • Languages Supported: All languages supported by the base Qwen2.5-Omni-7B model.

Sample Usage

You can load this model using the peft and transformers libraries. Note that librosa is required for audio loading in this example.
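
If the dependencies are not yet installed, the following is a typical setup (Qwen2.5-Omni support requires a recent transformers release; exact version pins are left unspecified here):

pip install torch transformers peft librosa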

import torch
import librosa
from transformers import Qwen2_5OmniThinkerForConditionalGeneration, Qwen2_5OmniProcessor
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "Qwen/Qwen2.5-Omni-7B"
adapter_id = "ASU-GSL/Qwen-Audio-AHA"

# Load base model and processor
processor = Qwen2_5OmniProcessor.from_pretrained(model_id)
model = Qwen2_5OmniThinkerForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)

# Load Audio
# Replace "example.wav" with the path to your audio file
audio, _ = librosa.load("example.wav", sr=processor.feature_extractor.sampling_rate)
prompt = "<|audio|>
Describe the temporal order of events in this audio."
inputs = processor(text=prompt, audios=audio, return_tensors="pt").to(device)

# Generate and decode only the newly generated tokens
generate_ids = model.generate(**inputs, max_new_tokens=256)
generate_ids = generate_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(generate_ids, skip_special_tokens=True)[0])
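
For deployment, you can optionally fold the LoRA weights into the base model with peft's merge_and_unload, which removes the adapter indirection at inference time:

# Optional: merge the adapter into the base weights for faster inference.
# After merging, the model behaves like a plain transformers model.
model = model.merge_and_unload()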

Citation

@article{chen2025aha,
  title={AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives},
  author={Chen, Yanxi and Zhu, Wenhui and Chen, Xiwen and Wang, Zhipeng and Li, Xin and Qiu, Peijie and Wang, Hao and Dong, Xuanzhao and Xiong, Yujian and Schneider, Anderson and others},
  journal={arXiv preprint arXiv:2512.24052},
  year={2025}
}