chenjoya committed · verified
Commit f942dd9 · 1 Parent(s): 3c5ce67

Update README.md

Files changed (1): README.md +3 -3
README.md CHANGED
@@ -207,9 +207,9 @@ for t in range(31):
 ## Limitations
 
 - This model is finetuned on LiveCC-7B-Base, which starts from Qwen2-VL-7B-Base, so it may share the limitations noted at https://huggingface.co/Qwen/Qwen2-VL-7B.
-- This model is trained only with the streaming frame-words paradigm, so it may only be capable of real-time video commentary.
-- The training ASR data is from YouTube CC, which has well-known low quality, so its formatting is not good (e.g. it cannot output punctuation).
-
+- When performing real-time video commentary, the model may collapse (e.g., fall into a repeating pattern). If you encounter this, try adjusting repetition_penalty, streaming_eos_base_threshold, and streaming_eos_threshold_step.
+- This model only has a context window of 32768 tokens. Using more visual tokens per frame (e.g. 768 * 28 * 28) gives the best performance but shortens the working duration.
+
 These limitations serve as ongoing directions for model optimization and improvement, and we are committed to continually enhancing the model's performance and scope of application.
 
 ## Citation
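The new limitation mentions streaming_eos_base_threshold and streaming_eos_threshold_step as knobs against collapse. The sketch below is purely illustrative and is not the model's actual implementation: it shows one plausible reading of a base-plus-step EOS gate, where the stopping threshold is relaxed a little at each decoding step so long utterances are increasingly encouraged to end. The function name, default values, and the exact rule are all assumptions.

```python
# Illustrative sketch only -- NOT LiveCC's actual decoding logic.
# Assumed rule: stop once p(EOS) exceeds a threshold that starts at
# `base_threshold` and is lowered by `threshold_step` every step.

def should_stop_speaking(p_eos: float, step_idx: int,
                         base_threshold: float = 0.6,
                         threshold_step: float = 0.05) -> bool:
    """Return True when the EOS probability clears the per-step threshold."""
    threshold = max(0.0, base_threshold - threshold_step * step_idx)
    return p_eos > threshold

# With a fixed EOS probability, later steps are more likely to stop:
assert should_stop_speaking(0.5, step_idx=0) is False  # 0.5 <= 0.60
assert should_stop_speaking(0.5, step_idx=3) is True   # 0.5 >  0.45
```

Under this reading, raising the base threshold or shrinking the step makes the commentator speak longer before yielding, which is the kind of adjustment the limitation suggests when output collapses.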
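The context-window limitation implies a simple trade-off: a richer per-frame visual-token budget leaves room for fewer frames inside the 32768-token window. The back-of-envelope sketch below makes that arithmetic explicit; the per-frame token counts and the text-token reserve are assumptions for illustration, not Qwen2-VL's exact token accounting.

```python
# Back-of-envelope sketch (assumed numbers, not exact Qwen2-VL accounting):
# how many frames fit in the context window for a given per-frame budget.

CONTEXT_WINDOW = 32768  # stated model context length

def max_frames(tokens_per_frame: int, reserved_for_text: int = 2048) -> int:
    """Frames that fit after reserving some budget for text tokens."""
    return (CONTEXT_WINDOW - reserved_for_text) // tokens_per_frame

# A richer per-frame budget shortens the working duration:
assert max_frames(64) > max_frames(192)
```

So at a fixed frame rate, increasing visual tokens per frame divides the maximum commentary duration proportionally, which matches the "best performance but shorter working duration" trade-off above.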