---
base_model: Qwen/Qwen2.5-Omni-7B
datasets:
- ASU-GSL/AHA
library_name: peft
license: apache-2.0
pipeline_tag: audio-text-to-text
tags:
- lora
- qwen2.5-omni
- multimodal
- audio
---

# Qwen-Audio-AHA (LoRA Adapter)

This repository contains the official LoRA adapter for **Qwen2.5-Omni-7B** (Thinker), fine-tuned using the **AHA (Audio Hallucination Alignment)** framework.

## Model Description

AHA is a framework designed to mitigate hallucinations in Large Audio-Language Models (LALMs) by focusing on fine-grained temporal reasoning and counterfactual alignment. By leveraging counterfactual hard negative mining, the pipeline constructs high-quality preference data that forces models to distinguish strict acoustic evidence from linguistically plausible fabrications.

- **Paper:** [AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives](https://huggingface.co/papers/2512.24052)
- **GitHub Repository:** [https://github.com/LLM-VLM-GSL/AHA](https://github.com/LLM-VLM-GSL/AHA)
- **Base Model:** [Qwen/Qwen2.5-Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B)

## Intended Use

- **Primary Task:** Audio reasoning and reducing hallucinations in audio-to-text tasks.
- **Languages Supported:** All languages supported by the base Qwen2.5-Omni-7B model.

## Sample Usage

You can load this model using the `peft` and `transformers` libraries. Note that `librosa` is required for audio loading in this example.

```python
import torch
import librosa
from transformers import Qwen2_5OmniThinkerForConditionalGeneration, Qwen2_5OmniProcessor
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "Qwen/Qwen2.5-Omni-7B"
adapter_id = "ASU-GSL/Qwen-Audio-AHA"

# Load base model and processor
processor = Qwen2_5OmniProcessor.from_pretrained(model_id)
model = Qwen2_5OmniThinkerForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)

# Load audio
# Replace "example.wav" with the path to your audio file
audio, _ = librosa.load("example.wav", sr=processor.feature_extractor.sampling_rate)

prompt = "<|audio|> Describe the temporal order of events in this audio."
inputs = processor(text=prompt, audios=audio, return_tensors="pt").to(device)

# Generate
generate_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(generate_ids, skip_special_tokens=True)[0])
```

## Citation

```bibtex
@article{chen2025aha,
  title={AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives},
  author={Chen, Yanxi and Zhu, Wenhui and Chen, Xiwen and Wang, Zhipeng and Li, Xin and Qiu, Peijie and Wang, Hao and Dong, Xuanzhao and Xiong, Yujian and Schneider, Anderson and others},
  journal={arXiv preprint arXiv:2512.24052},
  year={2025}
}
```
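
## Merging the Adapter (Optional)

If you prefer to serve a standalone checkpoint without a `peft` dependency at inference time, the LoRA weights can be folded into the base model. The sketch below is a minimal example using standard `peft` APIs; the output directory name `Qwen-Audio-AHA-merged` is only illustrative.

```python
import torch
from transformers import Qwen2_5OmniThinkerForConditionalGeneration, Qwen2_5OmniProcessor
from peft import PeftModel

model_id = "Qwen/Qwen2.5-Omni-7B"
adapter_id = "ASU-GSL/Qwen-Audio-AHA"

# Load the base Thinker model and attach the AHA LoRA adapter
base = Qwen2_5OmniThinkerForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# Fold the LoRA weights into the base weights and drop the PEFT wrappers
merged = model.merge_and_unload()

# Save a standalone checkpoint (directory name is illustrative)
merged.save_pretrained("Qwen-Audio-AHA-merged")
Qwen2_5OmniProcessor.from_pretrained(model_id).save_pretrained("Qwen-Audio-AHA-merged")
```

The merged directory can then be loaded directly with `Qwen2_5OmniThinkerForConditionalGeneration.from_pretrained`, exactly as in the Sample Usage snippet but without the `PeftModel.from_pretrained` step.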