Commit 901bc96
Parent(s): ec67cfd
add zero dataset and achieve better result
- README.md +4 -4
- pytorch_model.bin +1 -1
README.md
CHANGED
@@ -15,7 +15,7 @@ tags:
 
 # Model Details
 
-This model is a Chinese CLIP model trained on [Noah-Wukong Dataset](https://wukong-dataset.github.io/wukong-dataset/)
+This model is a Chinese CLIP model trained on [Noah-Wukong Dataset(100M)](https://wukong-dataset.github.io/wukong-dataset/) and [Zero(23M)](https://zero.so.com/). We use ViT-B-32 from [openAI](https://github.com/openai/CLIP) as image encoder and Chinese pre-trained language model [chinese-roberta-wwm](https://huggingface.co/hfl/chinese-roberta-wwm-ext) as text encoder. We freeze the image encoder and only finetune the text encoder. The model was trained for 24 epochs and it takes about 10 days with 16 A100 GPUs.
 
 # Taiyi (太乙)
 Taiyi models are a branch of the Fengshenbang (封神榜) series of models. The models in Taiyi are pre-trained with multimodal pre-training strategies. We will release more image-text model trained on Chinese dataset and benefit the Chinese community.
@@ -65,14 +65,14 @@ with torch.no_grad():
 ### Zero-Shot Classification
 | model | dataset | Top1 | Top5 |
 | ---- | ---- | ---- | ---- |
-| Taiyi-CLIP-Roberta-102M-Chinese | ImageNet1k-CN |
+| Taiyi-CLIP-Roberta-102M-Chinese | ImageNet1k-CN | 42.85% | 71.48% |
 
 ### Zero-Shot Text-to-Image Retrieval
 
 | model | dataset | Top1 | Top5 | Top10 |
 | ---- | ---- | ---- | ---- | ---- |
-| Taiyi-CLIP-Roberta-102M-Chinese | Flickr30k-CNA-test |
-| Taiyi-CLIP-Roberta-102M-Chinese | COCO-CN-test |
+| Taiyi-CLIP-Roberta-102M-Chinese | Flickr30k-CNA-test | 46.32% | 74.58% | 83.44% |
+| Taiyi-CLIP-Roberta-102M-Chinese | COCO-CN-test | 47.10% | 78.53% | 87.84% |
 | Taiyi-CLIP-Roberta-102M-Chinese | wukong50k | 48.67% | 81.77% | 90.09% |
 
 
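The retrieval table in this README change reports Top1/Top5/Top10 figures, but the commit does not say how they were computed. As a rough illustration only, the sketch below shows one common way to obtain such numbers from a text-to-image similarity matrix. The function name `top_k_retrieval_accuracy`, the toy scores, and the assumption that text `i` has exactly one matching image at index `i` are all hypothetical and are not taken from the authors' evaluation code.

```python
def top_k_retrieval_accuracy(similarity, ks=(1, 5, 10)):
    """Fraction of text queries whose matching image is ranked in the top k.

    similarity[i][j] is the score of text i against image j.
    Assumption (hypothetical): the correct image for text i is image i.
    """
    hits = {k: 0 for k in ks}
    for i, row in enumerate(similarity):
        # Image indices sorted from highest to lowest score for this text.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        for k in ks:
            if i in ranked[:k]:
                hits[k] += 1
    n = len(similarity)
    return {k: hits[k] / n for k in ks}


# Toy 3x3 score matrix (made-up values, not model output).
toy_scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.2, 0.7],
]
toy_result = top_k_retrieval_accuracy(toy_scores, ks=(1, 2))
print(toy_result)
```

With real embeddings, the matrix would typically come from the dot products of normalized text and image features; datasets with several captions per image would need a different ground-truth mapping than the one assumed here.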
pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:d679dcce5801d600bce716e1fa3e13508812b9cb4ff0ff6101d12a96b3a4eae9
 size 410713709
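The `pytorch_model.bin` entry is a Git LFS pointer: it records only the SHA-256 `oid` and the byte `size` of the real weights file. A minimal sketch of checking a downloaded file against such a pointer follows. The helper names `parse_lfs_pointer` and `verify_stream` are hypothetical, and the demo uses a tiny in-memory payload rather than the actual checkpoint.

```python
import hashlib
import io


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def verify_stream(stream, pointer_text, chunk_size=1 << 20):
    """Return True if the stream's size and SHA-256 match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError("unsupported oid algorithm: " + algo)
    digest = hashlib.sha256()
    total = 0
    # Read in chunks so a large checkpoint is never held in memory at once.
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        total += len(chunk)
    return total == int(fields["size"]) and digest.hexdigest() == expected


# Demo with a made-up payload standing in for the real weights file.
demo_payload = b"toy weights"
demo_pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:" + hashlib.sha256(demo_payload).hexdigest() + "\n"
    "size " + str(len(demo_payload)) + "\n"
)
demo_ok = verify_stream(io.BytesIO(demo_payload), demo_pointer)
print(demo_ok)
```

For the real file, one would open the downloaded `pytorch_model.bin` in binary mode and pass the file object together with the pointer text shown in this diff.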