chukewang committed on
Commit
574bfcc
·
1 Parent(s): 2d5c348
Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -31,13 +31,13 @@ To address this, we introduce TimeAudio, a novel method that empowers LALMs to c
 
  TimeAudio is based on the fundamental architecture of SALMONN. Specifically, TimeAudio is consists of four components: a sliding audio encoder, a window Q-former, a segment-level token merging module, and an LLM to process raw audio.
 
- <div align=center><img src="img/overview.png" height="100%" width="90%"/></div>
+ <div align=center><img src="img/overview.png" height="100%" width="92%"/></div>
 
  ## Compare
 
  Compared with traditional speech and audio processing tasks such as speech recognition and audio caption, Example of failed cases by Qwen2-Audio and Qwen2-Audio-R1 on fine-grained tasks that require both semantics and timestamps as output.
 
- <div align=center><img src="img/case.png" height="100%" width="80%"/></div>
+ <div align=center><img src="img/case.png" height="100%" width="75%"/></div>
 
  ## How to inference in CLI
 