Add pipeline tag, library name and paper link

#1
opened by nielsr (HF Staff)

Files changed (1): README.md (+12 -6)
```diff
@@ -1,25 +1,29 @@
 ---
-license: apache-2.0
+base_model:
+- Qwen/Qwen2-VL-7B
 datasets:
 - chenjoya/Live-CC-5M
 language:
 - en
-base_model:
-- Qwen/Qwen2-VL-7B
+license: apache-2.0
 tags:
 - qwen_vl
 - video
 - real-time
 - multimodal
 - LLM
+pipeline_tag: video-text-to-text
+library_name: transformers
 ---
+
 # LiveCC-7B-Base
 
 ## Introduction
 
-We introduce LiveCC, the first video LLM capable of real-time commentary, trained with a novel video-ASR streaming method, SOTA on both streaming and offline benchmarks.
+We introduce LiveCC, the first video LLM capable of real-time commentary, trained with a novel video-ASR streaming method, achieving SOTA on both streaming and offline benchmarks. The model takes video and text as input and generates text as output.
 
 - Project Page: https://showlab.github.io/livecc
+- Paper: [LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale](https://huggingface.co/papers/2504.16030)
 
 > [!Important]
 > This is the Base model. The base model is at [LiveCC-7B-Instruct](https://huggingface.co/chenjoya/LiveCC-7B-Instruct).
@@ -152,7 +156,8 @@ class LiveCCDemoInfer:
         texts = self.processor.apply_chat_template([message], tokenize=False, add_generation_prompt=True, return_tensors='pt')
         past_ids = state.get('past_ids', None)
         if past_ids is not None:
-            texts = '<|im_end|>\n' + texts[self.system_prompt_offset:]
+            texts = '<|im_end|>
+' + texts[self.system_prompt_offset:]
         inputs = self.processor(
             text=texts,
             images=None,
@@ -274,7 +279,8 @@ class LiveCCDemoInfer:
         image_inputs, video_inputs = process_vision_info(conversation)
         texts = self.processor.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True, return_tensors='pt')
         if past_ids is not None:
-            texts = '<|im_end|>\n' + texts[self.system_prompt_offset:]
+            texts = '<|im_end|>
+' + texts[self.system_prompt_offset:]
         inputs = self.processor(
             text=texts,
             images=image_inputs,
```