Amirhossein75 committed
Commit edaa030 (verified) · 1 parent: c5483dd

Update README.md

Files changed (1): README.md (+4 −7)
README.md CHANGED

@@ -33,20 +33,17 @@ This repository provides a clean, reproducible **training recipe** to fine‑tun
 ### Model Description
 <!-- Provide a longer summary of what this model is. -->
 - **Developed by:** Amirhossein Yousefi (repo maintainer)
-- **Funded by [optional]:** Not specified
-- **Shared by [optional]:** Public, open-source repository
 - **Model type:** **Dual‑encoder** (vision transformer + text transformer) trained with **contrastive objectives** (CLIP softmax contrastive loss or SigLIP sigmoid loss)
 - **Language(s) (NLP):** English captions (Flickr8k/Flickr30k)
 - **License:** *No explicit license file in the repo at authoring time; respect base model licenses.*
 - **Finetuned from model [optional]:** Typical backbones are `openai/clip-vit-base-patch16` and `google/siglip-base-patch16-224`
 
-### Model Sources [optional]
+### Model Sources
 <!-- Provide the basic links for the model. -->
 - **Repository:** https://github.com/amirhossein-yousefi/Image-Contrastive-CLIP
-- **Paper [optional]:**
+- **Paper:**
   - CLIP: Radford et al., 2021 – https://arxiv.org/abs/2103.00020
   - SigLIP: Zhai et al., 2023 – https://arxiv.org/abs/2303.15343
-- **Demo [optional]:** (add a Colab/Space link if you publish one)
 
 ## Uses
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

@@ -56,7 +53,7 @@ This repository provides a clean, reproducible **training recipe** to fine‑tun
 - **Task:** Image–text retrieval (image→text and text→image) on English-captioned datasets, using CLIP/SigLIP encoders fine‑tuned via this repo.
 - **Artifacts:** Training entrypoint (`src/main_training.py`), scripted evaluator (`src/evaluate_.py`), and index/metric utilities (`src/index_utils.py`, `src/retrieval_metrics.py`).
 
-### Downstream Use [optional]
+### Downstream Use
 <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
 - **Semantic search** over image collections (export embeddings and index with FAISS).
 - **Zero‑shot classification** via text prompts (CLIP‑style) as a quick sanity check.

@@ -124,7 +121,7 @@ The evaluator builds an index and writes retrieval metrics (R@1/5/10, MedR, and
 
 ### Training Procedure
 
-#### Preprocessing [optional]
+#### Preprocessing
 - Uses `AutoProcessor`/`image_processor` + tokenizer.
 - For **SigLIP**, text padding is set to `max_length`; **CLIP** can use dynamic padding.
 - **Random caption per image** is sampled per step to keep batches well‑mixed.
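The **semantic search** bullet in the diff above (export embeddings, index with FAISS) reduces to L2‑normalizing the embeddings and ranking by inner product, which is the same search a FAISS `IndexFlatIP` performs at scale. A minimal NumPy sketch of that idea (illustrative only, not code from this repo; array shapes and names are assumptions):

```python
import numpy as np

def build_index(embeddings: np.ndarray) -> np.ndarray:
    """L2-normalize embedding rows so inner product equals cosine similarity.

    With FAISS you would add these normalized rows to an IndexFlatIP instead
    of keeping them in a plain array.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)

def search(index: np.ndarray, query: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the top-k most similar gallery items for one query."""
    q = query / max(np.linalg.norm(query), 1e-12)  # normalize the query too
    scores = index @ q                             # cosine similarities
    return np.argsort(-scores)[:k]                 # best first

# Toy gallery of 100 random 8-d "embeddings"; a scaled copy of item 42
# should be retrieved first because cosine similarity ignores magnitude.
rng = np.random.default_rng(0)
gallery = build_index(rng.normal(size=(100, 8)))
hits = search(gallery, gallery[42] * 5.0, k=3)
```

Real image/text embeddings from the fine‑tuned encoders would replace the random gallery; the normalize‑then‑inner‑product step is the same.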
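The retrieval metrics the evaluator writes (R@1/5/10 and MedR) can all be read off a query‑by‑gallery similarity matrix. A small self‑contained sketch, assuming the ground‑truth match for query *i* is gallery item *i* (as when a single random caption per image is kept); this is an illustration, not the repo's `src/retrieval_metrics.py`:

```python
import numpy as np

def retrieval_metrics(sim: np.ndarray, ks=(1, 5, 10)) -> dict:
    """Recall@K and median rank from a similarity matrix.

    sim[i, j] = similarity between query i and gallery item j; the true
    match for query i is assumed to sit at gallery index i.
    """
    n = sim.shape[0]
    order = np.argsort(-sim, axis=1)  # gallery indices, best match first
    ranks = np.empty(n, dtype=np.int64)
    for i in range(n):
        # 0-based position of the true match in the ranked list
        ranks[i] = np.where(order[i] == i)[0][0]
    metrics = {f"R@{k}": float(np.mean(ranks < k)) for k in ks}
    metrics["MedR"] = float(np.median(ranks) + 1)  # 1-based median rank
    return metrics

# Identity similarity: every query ranks its own match first.
perfect = retrieval_metrics(np.eye(4))
```

With real scores, `sim` would be the text‑embedding × image‑embedding matrix (or its transpose for the other retrieval direction).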