Add link to paper and GitHub repo

#1 opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +15 -10
README.md CHANGED
@@ -1,6 +1,12 @@
 ---
-library_name: transformers
 base_model: Salesforce/blip2-opt-2.7b
+datasets:
+- Dataseeds/DataSeeds-Sample-Dataset-DSD
+language:
+- en
+library_name: transformers
+license: mit
+pipeline_tag: image-to-text
 tags:
 - vision-language
 - multimodal
@@ -10,12 +16,6 @@ tags:
 - photography
 - image-captioning
 - scene-analysis
-license: mit
-datasets:
-- Dataseeds/DataSeeds-Sample-Dataset-DSD
-language:
-- en
-pipeline_tag: image-to-text
 model-index:
 - name: BLIP2-OPT-2.7b-DSD-FineTune
   results:
@@ -23,8 +23,8 @@ model-index:
       type: image-captioning
       name: Image Captioning
     dataset:
-      type: Dataseeds/DataSeeds-Sample-Dataset-DSD
       name: DataSeeds.AI Sample Dataset
+      type: Dataseeds/DataSeeds-Sample-Dataset-DSD
     metrics:
     - type: bleu-4
       value: 0.047
@@ -42,8 +42,12 @@ model-index:
 
 # BLIP2-OPT-2.7B Fine-tuned on DataSeeds.AI Dataset
 
+Code: https://github.com/DataSeedAI/Sample-DSD-Finetune
+
 This model is a fine-tuned version of [Salesforce/blip2-opt-2.7b](https://huggingface.co/Salesforce/blip2-opt-2.7b) specialized for photography scene analysis and technical description generation. The model was fine-tuned on the [DataSeeds.AI Sample Dataset (DSD)](https://huggingface.co/datasets/Dataseeds/DataSeeds-Sample-Dataset-DSD) to enhance its capabilities in generating detailed photographic descriptions with focus on composition, lighting, and technical aspects.
 
+The model was presented in the paper [Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds' Annotated Imagery](https://huggingface.co/papers/2506.05673).
+
 ## Model Description
 
 - **Base Model**: [BLIP2-OPT-2.7B](https://huggingface.co/Salesforce/blip2-opt-2.7b)
@@ -176,7 +180,8 @@ The model maintains the BLIP-2 architecture with the following components:
 ### Core Architecture
 - **Vision Encoder**: EVA-CLIP ViT-g/14 (unfrozen during fine-tuning)
 - **Q-Former**: 32-layer transformer bridging vision and language modalities
-- **Language Model**: OPT-2.7B decoder-only transformer
+- **Language Model**: OPT-2.7B (2.7 billion parameters)
+- **Architecture**: BLIP-2 with Q-Former bridging vision and language
 - **Bootstrapping**: Two-stage pre-training methodology preserved
 
 ### Technical Specifications
@@ -277,4 +282,4 @@ This repository includes comprehensive training artifacts:
 
 ---
 
-*For questions, issues, or collaboration opportunities, please visit the [model repository](https://huggingface.co/Dataseeds/BLIP2-opt-2.7b-DSD-FineTune) or contact the DataSeeds.AI team.*
+*For questions, issues, or collaboration opportunities, please visit the [model repository](https://huggingface.co/Dataseeds/BLIP2-opt-2.7b-DSD-FineTune) or contact the DataSeeds.AI team.*
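The model-index in the diff above reports a BLEU-4 of 0.047 for the fine-tuned captioner. As a rough illustration of what that metric measures, here is a minimal, dependency-free corpus BLEU-4 sketch (uniform 1–4-gram precisions plus a brevity penalty). The example sentences are made up for demonstration, and the implementation is a plain textbook BLEU, not the exact evaluation pipeline used for this model; real evaluations typically use `sacrebleu` or `nltk`, which add smoothing options.

```python
# Illustrative BLEU-4: geometric mean of modified 1..4-gram precisions,
# scaled by a brevity penalty. One reference per hypothesis, pre-tokenized.
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(references, hypotheses):
    """Corpus-level BLEU-4 over parallel lists of token lists."""
    matches = [0] * 4   # clipped n-gram matches, per order
    totals = [0] * 4    # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for ref, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, 5):
            hyp_ngrams = ngrams(hyp, n)
            ref_ngrams = ngrams(ref, n)
            # clip each hypothesis n-gram count by its count in the reference
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    if any(m == 0 for m in matches):
        return 0.0  # unsmoothed BLEU is zero if any precision is zero
    log_prec = sum(0.25 * math.log(m / t) for m, t in zip(matches, totals))
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)

# Made-up caption pair for demonstration.
ref = "a close up photo of a red flower in soft morning light".split()
hyp = "a close up photo of a red flower in soft light".split()
score = bleu4([ref], [hyp])
print(round(score, 3))  # → 0.835
```

Scores like the reported 0.047 are common for free-form caption generation, where many valid descriptions share few exact n-grams with the single reference.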