Improve model card: add project page, specify task, add library name

#4
opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +16 -10
README.md CHANGED
@@ -1,13 +1,14 @@
 ---
-pipeline_tag: image-text-to-text
-license: apache-2.0
 base_model:
 - Qwen/Qwen2.5-7B-Instruct
+datasets:
+- HuggingFaceFV/finevideo
 language:
 - en
 - zh
-datasets:
-- HuggingFaceFV/finevideo
+license: apache-2.0
+pipeline_tag: multi-modality
+library_name: transformers
 ---
 
 # Ola-7B
@@ -19,9 +20,10 @@ Based on Qwen2.5 language model, it is trained on text, image, video and audio d
 
 Ola offers an on-demand solution to seamlessly and efficiently process visual inputs with arbitrary spatial sizes and temporal lengths.
 
-- **Repository:** https://github.com/Ola-Omni/Ola
-- **Languages:** English, Chinese
-- **Paper:** https://huggingface.co/papers/2502.04328
+- **Project Page:** https://ola-omni.github.io/
+- **Repository:** https://github.com/Ola-Omni/Ola
+- **Languages:** English, Chinese
+- **Paper:** https://huggingface.co/papers/2502.04328
 
 ## Use
 
@@ -177,11 +179,11 @@ def ola_inference(multimodal, audio_path):
     else:
         qs = ''
     if USE_SPEECH and audio_path:
-        qs = DEFAULT_IMAGE_TOKEN + "\n" + "User's question in speech: " + DEFAULT_SPEECH_TOKEN + '\n'
+        qs = DEFAULT_IMAGE_TOKEN + "\n" + "User's question in speech: " + DEFAULT_SPEECH_TOKEN + '\n'
     elif USE_SPEECH:
-        qs = DEFAULT_SPEECH_TOKEN + DEFAULT_IMAGE_TOKEN + "\n" + qs
+        qs = DEFAULT_SPEECH_TOKEN + DEFAULT_IMAGE_TOKEN + "\n" + qs
     else:
-        qs = DEFAULT_IMAGE_TOKEN + "\n" + qs
+        qs = DEFAULT_IMAGE_TOKEN + "\n" + qs
 
     conv = conv_templates[conv_mode].copy()
     conv.append_message(conv.roles[0], qs)