lysanderism / TimeAudio

Large Audio Language Models


chukewang committed on Nov 13, 2025

Commit 4511f87 · 1 parent: 3901199

Init: add images via LFS

Files changed (1)

README.md +4 -4

README.md CHANGED

@@ -15,7 +15,7 @@ metrics:
 - accuracy
 ---
-## 🚀🚀TimeAudio: Bridging Temporal Gaps in Large Audio-Language
+## 🚀🚀 TimeAudio: Bridging Temporal Gaps in Large Audio-Language
 <div style='display:flex; gap: 0.25rem; '>
 <a href='https://arxiv.org/pdf/.pdf'><img src='https://img.shields.io/badge/paper-PDF-green'></a>
@@ -47,8 +47,8 @@ You need to use the following dependencies:
 2. Download [whisper large v2](https://huggingface.co/openai/whisper-large-v2/tree/main) to ```whisper_path```.
 3. Download [Fine-tuned BEATs_iter3+ (AS2M) (cpt2)](https://valle.blob.core.windows.net/share/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt?sv=2020-08-04&st=2023-03-01T07%3A51%3A05Z&se=2033-03-02T07%3A51%3A00Z&sr=c&sp=rl&sig=QJXmSJG9DbMKf48UDIU1MfzIro8HQOf3sqlNXiflY1I%3D) to `beats_path`.
 4. Download [vicuna 7B v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5/tree/main) to ```vicuna_path```.
-5. Download [salmonn-7b v0](https://huggingface.co/tsinghua-ee/SALMONN-7B/blob/main/salmonn_7b_v0.pth) to ```ckpt_path```.
-6. Running with ```python3 cli_inference.py --ckpt_path xxx --whisper_path xxx --beats_path xxx --vicuna_path xxx``` to start cli inference. Please make sure your GPU has more than 40G of memory. If your GPU does not have enough memory (e.g. only 24G), you can quantize the model using the `--low_resource` parameter to reduce the memory usage, and can reduce the LoRA scaling factor to maintain the model's emergent abilities, e.g. `--lora_alpha=28`.
+5. Download [timeaudio](https://huggingface.co/lysanderism/TimeAudio/timeaudio.pth) to ```ckpt_path```.
+6. Running with ```python3 cli_inference.py --ckpt_path xxx --whisper_path xxx --beats_path xxx --vicuna_path xxx``` to start cli inference. Please make sure your GPU has more than 40G of memory. If your GPU does not have enough memory (e.g. only 24G), you can quantize the model using the `--low_resource` parameter to reduce the memory usage.
 ## Launch a QA
@@ -57,7 +57,7 @@ You need to use the following dependencies:
 ## Citation
-If you find SALMONN great and useful, please cite our paper:
+If you find TimeAudio great and useful, please cite our paper:
 ```
 @article{,
   title={TimeAudio: Bridging Temporal Gaps in Large Audio-Language Models},
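Step 6 of the updated README gives the CLI entry point together with a memory rule of thumb. You need more than 40G of GPU memory to run at full precision. On smaller cards you should pass `--low_resource`. The sketch below encodes that decision in Python. It assumes that the four path flags are the only required arguments and that `--low_resource` is a bare on/off switch. The README does not confirm either point. The helper name `build_inference_command` and the example paths are hypothetical. The script only builds the command and does not launch `cli_inference.py`.

```python
import shlex

# Path flags named in step 6 of the README; the values passed in are placeholders.
REQUIRED_PATHS = ("ckpt_path", "whisper_path", "beats_path", "vicuna_path")
# The README asks for "more than 40G" of GPU memory for the unquantized model.
FULL_PRECISION_MIN_GB = 40


def build_inference_command(paths: dict, gpu_mem_gb: float) -> list:
    """Hypothetical helper: assemble the argv for cli_inference.py.

    Appends --low_resource (assumed to be a boolean switch) when the GPU
    does not have more than 40 GB of memory, per the README's guidance.
    """
    missing = [name for name in REQUIRED_PATHS if not paths.get(name)]
    if missing:
        raise ValueError(f"missing paths: {', '.join(missing)}")

    argv = ["python3", "cli_inference.py"]
    for name in REQUIRED_PATHS:
        argv += [f"--{name}", paths[name]]
    if gpu_mem_gb <= FULL_PRECISION_MIN_GB:
        argv.append("--low_resource")
    return argv


if __name__ == "__main__":
    # Hypothetical local paths for the four downloaded checkpoints.
    example_paths = {
        "ckpt_path": "/models/timeaudio.pth",
        "whisper_path": "/models/whisper-large-v2",
        "beats_path": "/models/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt",
        "vicuna_path": "/models/vicuna-7b-v1.5",
    }
    # A 24 GB card, the README's low-memory example.
    print(shlex.join(build_inference_command(example_paths, gpu_mem_gb=24)))
```

In practice, you could take `gpu_mem_gb` from `torch.cuda.mem_get_info()`, which returns the free and total device memory in bytes. The sketch accepts it as a parameter instead, so it runs on machines without a GPU.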