---
license: llama2
datasets:
- wangyueqian/HawkEye-IT
- wangyueqian/InternVid-G
- OpenGVLab/VideoChat2-IT
language:
- en
pipeline_tag: visual-question-answering
---

# <div style="display: flex; align-items: center;"> <img src="https://github.com/yellow-binary-tree/HawkEye/blob/main/assets/hawk.png?raw=True" alt="logo" width="50" height="50" style="margin: 0 10;"> <span style="margin: 10 10;"> HawkEye: Training Video-Text LLMs for Grounding Text in Videos</span> </div>

This repo provides the checkpoint of HawkEye, as well as our implementation of VideoChat2.

`videochat2-stage3-our_impl.pth` is the checkpoint of our reproduction of VideoChat2. You can use it as a substitute for `hawkeye.pth`.
- Difference from HawkEye: it is not trained with data from [InternVid-G](https://github.com/yellow-binary-tree/HawkEye/blob/main/internvid_g/README.md).
- Difference from the original implementation of VideoChat2: the visual encoder is frozen, and it is not trained with image data from [VideoChat2-IT](https://github.com/OpenGVLab/Ask-Anything/blob/main/video_chat2/DATA.md).
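Since the `.pth` files are ordinary PyTorch checkpoints, you can inspect one after downloading it. This is only an illustrative sketch (the dummy file name and the key name `visual_encoder.weight` are placeholders, not the checkpoint's actual keys); for building and loading the model itself, use the HawkEye codebase linked below.

```python
import torch

# Sketch: open a checkpoint such as `videochat2-stage3-our_impl.pth` or
# `hawkeye.pth` and look at its keys. We assume the file is a state dict
# saved with torch.save; actual model construction is done by the HawkEye code.
def load_checkpoint(path: str):
    # map_location="cpu" lets you inspect the weights without a GPU
    return torch.load(path, map_location="cpu")

# Demo with a tiny dummy state dict standing in for the real (multi-GB) file:
dummy = {"visual_encoder.weight": torch.zeros(2, 2)}
torch.save(dummy, "demo_ckpt.pth")
state = load_checkpoint("demo_ckpt.pth")
print(sorted(state.keys()))
```
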

For more details, please refer to our [paper](https://arxiv.org/abs/2403.10228) and [GitHub repo](https://github.com/yellow-binary-tree/HawkEye).