Commit 41e7a27
1 Parent(s): 337fe46
Update README.md
README.md CHANGED
@@ -4,7 +4,9 @@ language:
 - en
 ---
 
-**VLE** (**V**isual-**L**anguage **E**ncoder) is an image-text multimodal understanding model built on the pre-trained text and image encoders.
+**VLE** (**V**isual-**L**anguage **E**ncoder) is an image-text multimodal understanding model built on the pre-trained text and image encoders.
+It can be used for multimodal discriminative tasks such as visual question answering and image-text retrieval.
+Especially on the visual commonsense reasoning (VCR) task, which requires high-level language understanding and reasoning skills, VLE achieves significant improvements.
 
 For more details see [https://github.com/iflytek/VLE](https://github.com/iflytek/VLE).
 