Update README.md
README.md
CHANGED
@@ -23,4 +23,21 @@ pipeline_tag: image-text-to-text
 | **Precision** | bfloat16 |
 
 > [!note]
-> The open dataset image-text response will be updated soon.
+> The open dataset image-text response will be updated soon.
+
+## References
+
+- **DocVLM: Make Your VLM an Efficient Reader**
+[https://arxiv.org/pdf/2412.08746v1](https://arxiv.org/pdf/2412.08746v1)
+
+- **YaRN: Efficient Context Window Extension of Large Language Models**
+[https://arxiv.org/pdf/2309.00071](https://arxiv.org/pdf/2309.00071)
+
+- **Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution**
+[https://arxiv.org/pdf/2409.12191](https://arxiv.org/pdf/2409.12191)
+
+- **Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond**
+[https://arxiv.org/pdf/2308.12966](https://arxiv.org/pdf/2308.12966)
+
+- **A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy**
+[https://arxiv.org/pdf/2412.02210](https://arxiv.org/pdf/2412.02210)