chukewang committed · Commit 4511f87 · 1 Parent(s): 3901199
Init: add images via LFS
README.md CHANGED
@@ -15,7 +15,7 @@ metrics:
 - accuracy
 ---

-## 🚀🚀TimeAudio: Bridging Temporal Gaps in Large Audio-Language

 <div style='display:flex; gap: 0.25rem; '>
 <a href='https://arxiv.org/pdf/.pdf'><img src='https://img.shields.io/badge/paper-PDF-green'></a>
@@ -47,8 +47,8 @@ You need to use the following dependencies:
 2. Download [whisper large v2](https://huggingface.co/openai/whisper-large-v2/tree/main) to ```whisper_path```.
 3. Download [Fine-tuned BEATs_iter3+ (AS2M) (cpt2)](https://valle.blob.core.windows.net/share/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt?sv=2020-08-04&st=2023-03-01T07%3A51%3A05Z&se=2033-03-02T07%3A51%3A00Z&sr=c&sp=rl&sig=QJXmSJG9DbMKf48UDIU1MfzIro8HQOf3sqlNXiflY1I%3D) to `beats_path`.
 4. Download [vicuna 7B v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5/tree/main) to ```vicuna_path```.
-5. Download [
-6. Running with ```python3 cli_inference.py --ckpt_path xxx --whisper_path xxx --beats_path xxx --vicuna_path xxx``` to start cli inference. Please make sure your GPU has more than 40G of memory. If your GPU does not have enough memory (e.g. only 24G), you can quantize the model using the `--low_resource` parameter to reduce the memory usage

 ## Launch a QA

@@ -57,7 +57,7 @@ You need to use the following dependencies:


 ## Citation
-If you find
 ```
 @article{,
 title={TimeAudio: Bridging Temporal Gaps in Large Audio-Language Models},
@@ -15,7 +15,7 @@ metrics:
 - accuracy
 ---

+## 🚀🚀 TimeAudio: Bridging Temporal Gaps in Large Audio-Language

 <div style='display:flex; gap: 0.25rem; '>
 <a href='https://arxiv.org/pdf/.pdf'><img src='https://img.shields.io/badge/paper-PDF-green'></a>
@@ -47,8 +47,8 @@ You need to use the following dependencies:
 2. Download [whisper large v2](https://huggingface.co/openai/whisper-large-v2/tree/main) to ```whisper_path```.
 3. Download [Fine-tuned BEATs_iter3+ (AS2M) (cpt2)](https://valle.blob.core.windows.net/share/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt?sv=2020-08-04&st=2023-03-01T07%3A51%3A05Z&se=2033-03-02T07%3A51%3A00Z&sr=c&sp=rl&sig=QJXmSJG9DbMKf48UDIU1MfzIro8HQOf3sqlNXiflY1I%3D) to `beats_path`.
 4. Download [vicuna 7B v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5/tree/main) to ```vicuna_path```.
+5. Download [timeaudio](https://huggingface.co/lysanderism/TimeAudio/timeaudio.pth) to ```ckpt_path```.
+6. Running with ```python3 cli_inference.py --ckpt_path xxx --whisper_path xxx --beats_path xxx --vicuna_path xxx``` to start cli inference. Please make sure your GPU has more than 40G of memory. If your GPU does not have enough memory (e.g. only 24G), you can quantize the model using the `--low_resource` parameter to reduce the memory usage.
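
The download-and-run steps above can be sketched as a small helper that assembles the `cli_inference.py` command line from the four download locations. This is a minimal sketch under stated assumptions: the function name `build_inference_command`, its `gpu_memory_gb` parameter, and all local paths are hypothetical, and `--low_resource` is assumed to be a bare switch that takes no value (the README only names the parameter). The 40G threshold follows the README's "more than 40G" wording.

```python
import shlex


def build_inference_command(ckpt_path, whisper_path, beats_path, vicuna_path,
                            gpu_memory_gb=None):
    """Assemble the argv list for cli_inference.py from the four download paths."""
    command = [
        "python3", "cli_inference.py",
        "--ckpt_path", str(ckpt_path),
        "--whisper_path", str(whisper_path),
        "--beats_path", str(beats_path),
        "--vicuna_path", str(vicuna_path),
    ]
    # The README asks for more than 40G of GPU memory and suggests
    # --low_resource otherwise. Treating it as a bare switch is an assumption.
    if gpu_memory_gb is not None and gpu_memory_gb <= 40:
        command.append("--low_resource")
    return command


if __name__ == "__main__":
    # Hypothetical local paths; substitute wherever the files were downloaded.
    command = build_inference_command(
        ckpt_path="checkpoints/timeaudio.pth",
        whisper_path="checkpoints/whisper-large-v2",
        beats_path="checkpoints/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt",
        vicuna_path="checkpoints/vicuna-7b-v1.5",
        gpu_memory_gb=24,
    )
    # Print a shell-safe command string instead of launching the model.
    print(shlex.join(command))
```

Printing the command rather than executing it keeps the sketch free of any GPU or model dependency; the resulting list could be handed to `subprocess.run` once the paths point at real files.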

 ## Launch a QA

@@ -57,7 +57,7 @@ You need to use the following dependencies:


 ## Citation
+If you find TimeAudio great and useful, please cite our paper:
 ```
 @article{,
 title={TimeAudio: Bridging Temporal Gaps in Large Audio-Language Models},