jaronfei committed on
Commit · 1444aeb
Parent(s): b558c0f
fix typos
README.md CHANGED
@@ -4,7 +4,7 @@ license: mit
 
 ## Model Summary
 
-Video-CCAM-4B is a lightweight Video-MLLM built on [Phi-3-
+Video-CCAM-4B is a lightweight Video-MLLM built on [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) and [SigLIP SO400M](https://huggingface.co/google/siglip-so400m-patch14-384). **Note**: Here [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) refers to the previous version, which requires `git commit id ff07dc01615f8113924aed013115ab2abd32115b` to get the checkpoint.
 
 ## Usage
 
@@ -27,12 +27,12 @@ Please refer to [Video-CCAM](https://github.com/QQ-MM/Video-CCAM) on inference a
 |w/o subs|48.2|49.6|
 |w subs|51.7|53.0|
 
-### MVBench: 57.78 (
+### MVBench: 57.78 (16 frames)
 
 ## Acknowledgement
 
-* [xtuner](https://github.com/InternLM/xtuner): Video-CCAM-
-* [Phi-3-
+* [xtuner](https://github.com/InternLM/xtuner): Video-CCAM-4B is trained using the xtuner framework. Thanks for their excellent works!
+* [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct): Powerful language models developed by Microsoft.
 * [SigLIP SO400M](https://huggingface.co/google/siglip-so400m-patch14-384): Outstanding vision encoder developed by Google.
 
 ## License
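
The note added to the Model Summary pins Phi-3-mini-4k-instruct to an earlier revision, identified by commit id `ff07dc01615f8113924aed013115ab2abd32115b`. The README does not say which tool to use for fetching it. The following is a minimal sketch of one possible approach, assuming the `huggingface_hub` package is installed and that its `snapshot_download` function with the `revision` argument is an acceptable way to get the checkpoint. The names `PHI3_REPO_ID`, `PHI3_REVISION`, and `download_pinned_phi3` are illustrative and do not come from the Video-CCAM repository.

```python
from typing import Optional

# Repository and commit id taken from the README note in this commit.
PHI3_REPO_ID = "microsoft/Phi-3-mini-4k-instruct"
PHI3_REVISION = "ff07dc01615f8113924aed013115ab2abd32115b"


def download_pinned_phi3(local_dir: Optional[str] = None) -> str:
    """Download the pinned Phi-3-mini-4k-instruct revision.

    Returns the local path of the downloaded snapshot. This is a
    hypothetical helper, not part of Video-CCAM.
    """
    # Imported lazily so the constants above can be reused without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=PHI3_REPO_ID,
        revision=PHI3_REVISION,  # pin to the earlier commit instead of the latest
        local_dir=local_dir,     # None falls back to the default Hugging Face cache
    )
```

Calling `download_pinned_phi3("./Phi-3-mini-4k-instruct")` would place the files in that directory. Pinning by full commit id rather than by branch name keeps the download stable if the upstream repository is updated again.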