Update README.md
Browse files
README.md
CHANGED
|
@@ -398,14 +398,9 @@ messages = [
|
|
| 398 |
|
| 399 |
## Limitations
|
| 400 |
|
| 401 |
-
|
| 402 |
-
|
| 403 |
-
|
| 404 |
-
2. Data timeliness: Our image dataset is **updated until June 2023**, and information subsequent to this date may not be covered.
|
| 405 |
-
3. Constraints in Individuals and Intellectual Property (IP): The model's capacity to recognize specific individuals or IPs is limited, potentially failing to comprehensively cover all well-known personalities or brands.
|
| 406 |
-
4. Limited Capacity for Complex Instruction: When faced with intricate multi-step instructions, the model's understanding and execution capabilities require enhancement.
|
| 407 |
-
5. Insufficient Counting Accuracy: Particularly in complex scenes, the accuracy of object counting is not high, necessitating further improvements.
|
| 408 |
-
6. Weak Spatial Reasoning Skills: Especially in 3D spaces, the model's inference of object positional relationships is inadequate, making it difficult to precisely judge the relative positions of objects.
|
| 409 |
|
| 410 |
These limitations serve as ongoing directions for model optimization and improvement, and we are committed to continually enhancing the model's performance and scope of application.
|
| 411 |
|
|
|
|
| 398 |
|
| 399 |
## Limitations
|
| 400 |
|
| 401 |
+
- This model is starting from Qwen2-VL-7B-Base, so it may have limitations mentioned in https://huggingface.co/Qwen/Qwen2-VL-7B.
|
| 402 |
+
- This model is trained only with streaming frame-words paradigm, thus it may be only capable for real-time video commentary.
|
| 403 |
+
- The training ASR data is from YouTube CC, which has well-known low quality, so its formatting is not good (e.g. cannot output punctuation).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 404 |
|
| 405 |
These limitations serve as ongoing directions for model optimization and improvement, and we are committed to continually enhancing the model's performance and scope of application.
|
| 406 |
|