This repository is the official PyTorch implementation for the paper [**OneRef: Unified One-tower Expression Grounding
and Segmentation with Mask Referring Modeling**](https://openreview.net/pdf?id=siPdcro6uD)
([Publication](https://proceedings.neurips.cc/paper_files/paper/2024/file/fcd812a51b8f8d05cfea22e3c9c4b369-Paper-Conference.pdf),
[GitHub Code](https://github.com/linhuixiao/OneRef), [HuggingFace model](https://huggingface.co/linhuixiao/OneRef)), which is an advanced version
of our preliminary works **HiVG** ([Publication](https://dl.acm.org/doi/abs/10.1145/3664647.3681071), [Paper](https://openreview.net/pdf?id=NMMyGy1kKZ),
[Code](https://github.com/linhuixiao/HiVG)) and **CLIP-VG** ([Publication](https://ieeexplore.ieee.org/abstract/document/10269126),
[Paper](https://arxiv.org/pdf/2305.08685), [Code](https://github.com/linhuixiao/CLIP-VG)).
:exclamation: During the code tidying process, some bugs may arise due to changes in variable names. If any issues occur, please raise them on the [issues page](https://github.com/linhuixiao/OneRef/issues), and I will try to resolve them promptly.

- :fire: **Update on 2024/12/28: We conducted a survey of visual grounding over the past decade, entitled "Towards Visual Grounding: A Survey" ([Paper](https://arxiv.org/pdf/2412.20206), [Project](https://github.com/linhuixiao/Awesome-Visual-Grounding)). Comments are welcome!**
- :fire: **Update on 2024/10/10: Our grounding work OneRef ([Paper](https://arxiv.org/abs/2410.08021), [Code](https://github.com/linhuixiao/OneRef), [Model](https://huggingface.co/linhuixiao/OneRef)) has been accepted by the top conference NeurIPS 2024!**
- **Update on 2024/07/16: Our grounding work HiVG ([Publication](https://dl.acm.org/doi/abs/10.1145/3664647.3681071), [Paper](https://openreview.net/pdf?id=NMMyGy1kKZ), [Code](https://github.com/linhuixiao/HiVG)) has been accepted by the top conference ACM MM 2024!**
- **Update on 2023/09/25: Our grounding work CLIP-VG ([Publication](https://ieeexplore.ieee.org/abstract/document/10269126), [Code](https://github.com/linhuixiao/CLIP-VG)) has been accepted by the top journal IEEE Transactions on Multimedia (2023)!**
The labels in the fully supervised scenario are consistent with previous works such as [CLIP-VG](https://github.com/linhuixiao/CLIP-VG).

:star: As we need to conduct pre-training with mixed datasets, we have shuffled the order of the datasets and unified
some of the dataset formats. You need to download our text annotation files from the [HuggingFace homepage](https://huggingface.co/linhuixiao/OneRef/tree/main/text_box_annotation).

### Fully supervised setting
<table>
</tr>
<tr> <!-- line 2 -->
    <th style="text-align:center" rowspan="1"> url, size </th> <!-- table head -->
    <th style="text-align:center" colspan="8"> <a href="https://huggingface.co/linhuixiao/OneRef/tree/main/text_box_annotation">All of six datasets</a>, ~400.0 MB </th> <!-- table head -->
</tr>
</table>
We will check and upload the correct models. This might be due to model upload errors or model corruption
during disk storage. After all, we trained nearly a hundred models during the research course of this work.**

<a href="https://huggingface.co/linhuixiao/OneRef/tree/main"><picture><source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/lobehub/lobe-icons/refs/heads/master/packages/static-png/dark/huggingface-color.png" /><img height="36px" width="36px" src="https://raw.githubusercontent.com/lobehub/lobe-icons/refs/heads/master/packages/static-png/light/huggingface-color.png" /></picture></a><br/>HuggingFace:
All the models are publicly available on the [**OneRef Huggingface homepage**](https://huggingface.co/linhuixiao/OneRef/tree/main). You can freely download the corresponding models from this page.

### REC task: Single-dataset fine-tuning checkpoints download
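For scripted downloads, direct file URLs on HuggingFace follow the standard `resolve/main` convention. A minimal sketch (the helper name `hf_file_url` is ours, not part of this repository):

```python
def hf_file_url(repo_id: str, filename: str) -> str:
    """Build a direct-download URL for a file hosted in a HuggingFace repo."""
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"

# e.g. the BEiT-3 tokenizer listed further below:
print(hf_file_url("linhuixiao/OneRef", "beit3_checkpoints/beit3.spm"))
```

Such URLs can be fetched with `wget` or `curl -L`; alternatively, the `huggingface-cli download linhuixiao/OneRef <filename>` command from `huggingface_hub` retrieves files from the same repository.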
</tr>
<tr> <!-- line 2 -->
    <th style="text-align:center" rowspan="1"> Base model </th> <!-- table head -->
    <th style="text-align:center" colspan="6"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/rec_single_dataset_finetuning_base.zip"> Hugging Face, rec_single_dataset_finetuning_base.zip (for all), ~9.0 GB </a> </th> <!-- table head -->
</tr>
<tr> <!-- line 3 -->
    <th style="text-align:center" rowspan="1"> Large model </th> <!-- table head -->
    <th style="text-align:center" colspan="1"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/rec_single_dataset_finetuning_large_unc.pth">finetuning_large_unc, ~8.0 GB </a> </th> <!-- table head -->
    <th style="text-align:center" colspan="1"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/rec_single_dataset_finetuning_large_unc%2B.pth">finetuning_large_unc+, ~8.0 GB </a> </th> <!-- table head -->
    <th style="text-align:center" colspan="1"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/rec_single_dataset_finetuning_large_gref_umd.pth">finetuning_large_gref_umd, ~8.0 GB </a> </th> <!-- table head -->
    <th style="text-align:center" colspan="1"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/rec_single_dataset_finetuning_large_referit.pth">finetuning_large_referit, ~8.0 GB </a> </th> <!-- table head -->
    <th style="text-align:center" colspan="1"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/rec_single_dataset_finetuning_large_flickr.pth">finetuning_large_flickr, ~8.0 GB </a> </th> <!-- table head -->
</tr>
</table>
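Note that the links for the `unc+` checkpoints percent-encode the `+` in the filename as `%2B`. If you build download URLs yourself, encode the filename first; a small illustration with the standard library:

```python
from urllib.parse import quote

filename = "rec_single_dataset_finetuning_large_unc+.pth"
# quote() leaves letters, digits, '_', '.', '-' intact but encodes '+' as '%2B'
print(quote(filename, safe=""))  # rec_single_dataset_finetuning_large_unc%2B.pth
```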
</tr>
<tr> <!-- line 2 -->
    <th style="text-align:center" rowspan="1"> base model </th> <!-- table head -->
    <th style="text-align:center" colspan="3"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/rec_mixup_grounding_pretraining_base.zip">rec_mixup_grounding_pretraining_base.zip, ~6.0 GB </a> </th> <!-- table head -->
</tr>
<tr> <!-- line 3 -->
    <th style="text-align:center" > Large model </th>
    <th style="text-align:center" > <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/rec_mixup_grounding_pretraining_large_unc%2Bg.pth">mixup_pretraining_large_unc+g, ~8.0 GB</a> </th>
    <th style="text-align:center" > <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/rec_mixup_grounding_pretraining_large_referit.pth">mixup_pretraining_large_referit, ~8.0 GB</a> </th>
    <th style="text-align:center" > <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/rec_mixup_grounding_pretraining_large_flickr.pth">mixup_pretraining_large_flickr, ~8.0 GB</a> </th>
</tr>
</table>
</tr>
<tr> <!-- line 2 -->
    <th style="text-align:center" rowspan="1"> base model </th> <!-- table head -->
    <th style="text-align:center" colspan="1"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/rec_mixup_grounding_ultimate_performance_base_in_the_survey.zip">rec_mixup_grounding_ultimate_performance_base.zip, ~6.0 GB </a> </th> <!-- table head -->
</tr>
<tr> <!-- line 3 -->
    <th style="text-align:center" > Large model </th>
</tr>
<tr> <!-- line 2 -->
    <th style="text-align:center" rowspan="1"> base model </th> <!-- table head -->
    <th style="text-align:center" colspan="3"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/res_single_dataset_finetuning_base.zip"> res_single_dataset_finetuning_base.zip, ~6.0 GB </a> </th> <!-- table head -->
</tr>
<tr> <!-- line 3 -->
    <th style="text-align:center" rowspan="1"> Large model </th> <!-- table head -->
    <th style="text-align:center" colspan="1"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/res_single_dataset_finetuning_large_unc.pth">finetuning_large_unc, ~8.0 GB </a> </th> <!-- table head -->
    <th style="text-align:center" colspan="1"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/res_single_dataset_finetuning_large_unc%2B.pth">finetuning_large_unc+, ~8.0 GB </a> </th> <!-- table head -->
    <th style="text-align:center" colspan="1"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/res_single_dataset_finetuning_large_gref_umd.pth">finetuning_large_gref_umd, ~8.0 GB </a> </th> <!-- table head -->
</tr>
</table>
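The base-model checkpoints above ship as `.zip` archives, which can be unpacked with Python's standard `zipfile` module. The sketch below demonstrates the steps on a throwaway archive, since the real archive must be downloaded first (the member filename inside it is a hypothetical placeholder):

```python
import tempfile
import zipfile
from pathlib import Path

def extract_checkpoints(zip_path, dest_dir):
    """Unpack a checkpoint archive into dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()

# Demonstration with a dummy archive; a downloaded
# res_single_dataset_finetuning_base.zip is handled the same way.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("finetuning_base_unc.pth", b"dummy weights")
    members = extract_checkpoints(zip_path, tmp)
    print(members)  # ['finetuning_base_unc.pth']
```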
</tr>
<tr> <!-- line 2 -->
    <th style="text-align:center" rowspan="1"> base model </th> <!-- table head -->
    <th style="text-align:center" colspan="1"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/res_mixup_grounding_pretraining_base.zip">res_mixup_pretraining_base.zip, ~1.0 GB </a> </th> <!-- table head -->
</tr>
<tr> <!-- line 3 -->
    <th style="text-align:center" > Large model </th>
    <th style="text-align:center" > <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/res_mixup_grounding_pretraining_large_unc_%2B_g.pth">res_mixup_pretraining_large, ~2.0 GB</a> </th>
</tr>
</table>
the MRefM pre-training **for the RES task** is mainly carried out through a mixture of the RefC datasets.

For MRefM pre-training, the base model took 15 hours on 32 NVIDIA A100 GPUs, while the large model took 50 hours on
the same number of GPUs. We provide the MRefM pre-trained checkpoints below; all models are available on the [HuggingFace page](https://huggingface.co/linhuixiao/OneRef/tree/main).

<table>
<tr> <!-- line 2 -->
    <th style="text-align:center" rowspan="1"> Base model </th> <!-- table head -->
    <th style="text-align:center" rowspan="1"> RefC, ReferIt </th> <!-- table head -->
    <th style="text-align:center" colspan="1"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/mrefm_pretrain_patch16_384/rec_mrefm_pretrain_base_patch16_384.pth">rec_mrefm_base_patch16_384, ~2 GB </a> </th> <!-- table head -->
</tr>
<tr> <!-- line 3 -->
    <th style="text-align:center" > Large model </th>
    <th style="text-align:center" rowspan="1"> RefC, ReferIt </th> <!-- table head -->
    <th style="text-align:center" > <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/mrefm_pretrain_patch16_384/rec_mrefm_pretrain_large_patch16_384.pth">rec_mrefm_large_patch16_384, ~7 GB</a> </th>
</tr>
</table>
<tr> <!-- line 2 -->
    <th style="text-align:center" rowspan="1"> Base model </th> <!-- table head -->
    <th style="text-align:center" > RefC </th>
    <th style="text-align:center" colspan="1"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/mrefm_pretrain_patch16_384/res_mrefm_pretrain_base_patch16_384.pth">res_mrefm_base_patch16_384, ~2 GB </a> </th> <!-- table head -->
</tr>
<tr> <!-- line 3 -->
    <th style="text-align:center" > Large model </th>
    <th style="text-align:center" > RefC </th>
    <th style="text-align:center" > <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/mrefm_pretrain_patch16_384/res_mrefm_pretrain_large_patch16_384.pth">res_mrefm_large_patch16_384, ~7 GB</a> </th>
</tr>
</table>
</tr>
<tr> <!-- line 2 -->
    <th style="text-align:center" rowspan="1"> Sentencepiece model (Tokenizer) </th> <!-- table head -->
    <th style="text-align:center" colspan="1"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/beit3_checkpoints/beit3.spm">beit3.spm SentencePiece model, 1 MB </a> </th> <!-- table head -->
</tr>
<tr> <!-- line 3 -->
    <th style="text-align:center" rowspan="1"> MIM VQKD model </th> <!-- table head -->
    <th style="text-align:center" colspan="1"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/beit3_checkpoints/vqkd_encoder_base_decoder_3x768x12_clip-d5036aa7.pth">vqkd model, 438 MB </a> </th> <!-- table head -->
</tr>
<tr> <!-- line 4 -->
    <th style="text-align:center" rowspan="1"> BEiT-3 Base model </th> <!-- table head -->
    <th style="text-align:center" colspan="1"> <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/beit3_checkpoints/beit3_base_indomain_patch16_224.pth">beit3_base_indomain_patch16_224, 554 MB </a> </th> <!-- table head -->
</tr>
<tr> <!-- line 5 -->
    <th style="text-align:center" > BEiT-3 Large model </th>
    <th style="text-align:center" > <a href="https://huggingface.co/linhuixiao/OneRef/blob/main/beit3_checkpoints/beit3_large_indomain_patch16_224.pth">beit3_large_indomain_patch16_224, 1.5 GB</a> </th>
</tr>
</table>