chukewang committed
Commit 2d5c348 · 1 Parent(s): 8673bf6
Files changed (1)
README.md +2 -2
README.md CHANGED
@@ -37,7 +37,7 @@ TimeAudio is based on the fundamental architecture of SALMONN. Specifically, Tim
 
 Compared with traditional speech and audio processing tasks such as speech recognition and audio captioning, fine-grained tasks require both semantics and timestamps as output. The figure below shows examples of failed cases by Qwen2-Audio and Qwen2-Audio-R1 on such tasks.
 
-<div align=center><img src="img/case.png" height="100%" width="90%"/></div>
+<div align=center><img src="img/case.png" height="100%" width="80%"/></div>
 
 ## How to run inference in CLI
 
@@ -47,7 +47,7 @@ You need to use the following dependencies:
 2. Download [whisper large v2](https://huggingface.co/openai/whisper-large-v2/tree/main) to ```whisper_path```.
 3. Download [Fine-tuned BEATs_iter3+ (AS2M) (cpt2)](https://valle.blob.core.windows.net/share/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt?sv=2020-08-04&st=2023-03-01T07%3A51%3A05Z&se=2033-03-02T07%3A51%3A00Z&sr=c&sp=rl&sig=QJXmSJG9DbMKf48UDIU1MfzIro8HQOf3sqlNXiflY1I%3D) to `beats_path`.
 4. Download [vicuna 7B v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5/tree/main) to ```vicuna_path```.
-5. Download [timeaudio](https://huggingface.co/lysanderism/TimeAudio/blob/main) to ```ckpt_path```.
+5. Download [timeaudio](https://huggingface.co/lysanderism/TimeAudio/tree/main) to ```ckpt_path```.
 6. Run ```python3 cli_inference.py --ckpt_path xxx --whisper_path xxx --beats_path xxx --vicuna_path xxx``` to start CLI inference. Please make sure your GPU has more than 40 GB of memory. If your GPU does not have enough memory (e.g. only 24 GB), you can quantize the model using the `--low_resource` parameter to reduce memory usage.
 
 ## Launch a QA
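
The inference step in the README hunk above can be sketched as a small shell snippet that assembles the `cli_inference.py` command from the downloaded checkpoint paths. All paths and the `GPU_MEM_GB` value are illustrative placeholders, not part of the repository; the README only documents the flags themselves, including `--low_resource` for low-memory GPUs:

```shell
# Sketch: assemble the TimeAudio CLI inference command (paths are placeholders
# for the checkpoints downloaded in steps 1-5 of the README).
WHISPER_PATH=./whisper-large-v2
BEATS_PATH=./BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt
VICUNA_PATH=./vicuna-7b-v1.5
CKPT_PATH=./TimeAudio

CMD="python3 cli_inference.py --ckpt_path $CKPT_PATH --whisper_path $WHISPER_PATH --beats_path $BEATS_PATH --vicuna_path $VICUNA_PATH"

# Per the README, GPUs under ~40 GB (e.g. 24 GB) should add the quantization flag.
# GPU_MEM_GB is an assumed example value, not something the script detects.
GPU_MEM_GB=24
if [ "$GPU_MEM_GB" -lt 40 ]; then
  CMD="$CMD --low_resource"
fi

echo "$CMD"
```

Printing the command before running it makes it easy to verify the checkpoint paths resolve correctly on your machine.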