Add pipeline tag, library name, and improve model card

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +33 -12
README.md CHANGED
@@ -1,34 +1,43 @@
  ---
- license: apache-2.0
  base_model: Qwen/Qwen2.5-Omni-7B
  tags:
  - lora
  - qwen2.5-omni
  - multimodal
  - audio
- datasets:
- - ASU-GSL/AHA
  ---

- # Qwen2.5-Omni LoRA Adapter

  ## Model Description
- This is a LoRA adapter for **Qwen2.5-Omni-7B** **Thinker**, fine-tuned to reduce audio hallucination.

- Qwen2.5-Omni is a foundational multimodal model capable of seamless audio-to-audio and audio-to-text interactions. This adapter enhances the model's audio reasoning capability by reducing model hallucination.

  ## Intended Use
- - **Primary Task:** Audio reasoning.
- - **Languages Supported:** All languages supported by Qwen2.5-Omni-7B.

- ## How to Load
- You can load this model using the `peft` and `transformers` libraries:

  ```python
  from transformers import Qwen2_5OmniThinkerForConditionalGeneration, Qwen2_5OmniProcessor
  from peft import PeftModel
- import torch

  model_id = "Qwen/Qwen2.5-Omni-7B"
  adapter_id = "ASU-GSL/Qwen-Audio-AHA"

@@ -40,9 +49,21 @@ model = Qwen2_5OmniThinkerForConditionalGeneration.from_pretrained(

  # Load LoRA adapter
  model = PeftModel.from_pretrained(model, adapter_id)
- ```

  ```
  @article{chen2025aha,
  title={AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives},
  author={Chen, Yanxi and Zhu, Wenhui and Chen, Xiwen and Wang, Zhipeng and Li, Xin and Qiu, Peijie and Wang, Hao and Dong, Xuanzhao and Xiong, Yujian and Schneider, Anderson and others},
 
  ---
  base_model: Qwen/Qwen2.5-Omni-7B
+ datasets:
+ - ASU-GSL/AHA
+ library_name: peft
+ license: apache-2.0
+ pipeline_tag: audio-text-to-text
  tags:
  - lora
  - qwen2.5-omni
  - multimodal
  - audio
  ---

+ # Qwen-Audio-AHA (LoRA Adapter)
+
+ This repository contains the official LoRA adapter for **Qwen2.5-Omni-7B** (Thinker), fine-tuned using the **AHA (Audio Hallucination Alignment)** framework.

  ## Model Description
+ AHA is a framework designed to mitigate hallucinations in Large Audio-Language Models (LALMs) by focusing on fine-grained temporal reasoning and counterfactual alignment. By leveraging counterfactual hard negative mining, the pipeline constructs high-quality preference data that forces models to distinguish strict acoustic evidence from linguistically plausible fabrications.

+ - **Paper:** [AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives](https://huggingface.co/papers/2512.24052)
+ - **GitHub Repository:** [https://github.com/LLM-VLM-GSL/AHA](https://github.com/LLM-VLM-GSL/AHA)
+ - **Base Model:** [Qwen/Qwen2.5-Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B)
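+
+ To give a concrete sense of the alignment data, the sketch below shows what a counterfactual hard-negative preference pair could look like. This is purely illustrative: the field names and example texts are assumptions for explanation only, not the actual schema of the ASU-GSL/AHA dataset.
+
+ ```python
+ # Hypothetical illustration of a counterfactual hard-negative preference pair.
+ # The audio and question are held fixed; the "rejected" answer is linguistically
+ # plausible but contradicts the acoustic evidence, while the "chosen" answer
+ # follows what is actually audible.
+ preference_pair = {
+     "audio": "example.wav",  # placeholder path, not a real dataset file
+     "question": "Which happens first: the door closing or the dog barking?",
+     "chosen": "The door closes first; the barking starts only afterwards.",
+     "rejected": "The dog barks first and the door closes in response.",
+ }
+ ```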

  ## Intended Use
+ - **Primary Task:** Audio reasoning and reducing hallucinations in audio-to-text tasks.
+ - **Languages Supported:** All languages supported by the base Qwen2.5-Omni-7B model.

+ ## Sample Usage
+
+ You can load this model using the `peft` and `transformers` libraries. Note that `librosa` is required for audio loading in this example.

  ```python
+ import torch
+ import librosa
  from transformers import Qwen2_5OmniThinkerForConditionalGeneration, Qwen2_5OmniProcessor
  from peft import PeftModel

+ device = "cuda" if torch.cuda.is_available() else "cpu"
  model_id = "Qwen/Qwen2.5-Omni-7B"
  adapter_id = "ASU-GSL/Qwen-Audio-AHA"

  # Load LoRA adapter
  model = PeftModel.from_pretrained(model, adapter_id)

+ # Load audio
+ # Replace "example.wav" with the path to your audio file
+ audio, _ = librosa.load("example.wav", sr=processor.feature_extractor.sampling_rate)
+ prompt = "<|audio|>\nDescribe the temporal order of events in this audio."
+ inputs = processor(text=prompt, audios=audio, return_tensors="pt").to(device)
+
+ # Generate
+ generate_ids = model.generate(**inputs, max_new_tokens=256)
+ print(processor.batch_decode(generate_ids, skip_special_tokens=True)[0])
  ```
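+
+ If you want to run inference without a `peft` dependency, you can optionally merge the LoRA weights into the base model first. The sketch below uses the standard `peft` `merge_and_unload()` API; the output directory name is only an example.
+
+ ```python
+ # Optionally merge the LoRA weights into the base model and save a standalone copy.
+ # "Qwen2.5-Omni-7B-AHA-merged" is just an example output path.
+ merged_model = model.merge_and_unload()
+ merged_model.save_pretrained("Qwen2.5-Omni-7B-AHA-merged")
+ processor.save_pretrained("Qwen2.5-Omni-7B-AHA-merged")
+ ```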
+
+ ## Citation
+ ```bibtex
  @article{chen2025aha,
  title={AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives},
  author={Chen, Yanxi and Zhu, Wenhui and Chen, Xiwen and Wang, Zhipeng and Li, Xin and Qiu, Peijie and Wang, Hao and Dong, Xuanzhao and Xiong, Yujian and Schneider, Anderson and others},