Add link to paper and GitHub repo (#1)
by nielsr (HF Staff), opened

README.md (CHANGED):
```diff
@@ -1,6 +1,12 @@
 ---
-library_name: transformers
 base_model: Salesforce/blip2-opt-2.7b
+datasets:
+- Dataseeds/DataSeeds-Sample-Dataset-DSD
+language:
+- en
+library_name: transformers
+license: mit
+pipeline_tag: image-to-text
 tags:
 - vision-language
 - multimodal
@@ -10,12 +16,6 @@ tags:
 - photography
 - image-captioning
 - scene-analysis
-license: mit
-datasets:
-- Dataseeds/DataSeeds-Sample-Dataset-DSD
-language:
-- en
-pipeline_tag: image-to-text
 model-index:
 - name: BLIP2-OPT-2.7b-DSD-FineTune
   results:
@@ -23,8 +23,8 @@ model-index:
       type: image-captioning
       name: Image Captioning
     dataset:
-      type: Dataseeds/DataSeeds-Sample-Dataset-DSD
       name: DataSeeds.AI Sample Dataset
+      type: Dataseeds/DataSeeds-Sample-Dataset-DSD
     metrics:
     - type: bleu-4
       value: 0.047
@@ -42,8 +42,12 @@
 
 # BLIP2-OPT-2.7B Fine-tuned on DataSeeds.AI Dataset
 
+Code: https://github.com/DataSeedAI/Sample-DSD-Finetune
+
 This model is a fine-tuned version of [Salesforce/blip2-opt-2.7b](https://huggingface.co/Salesforce/blip2-opt-2.7b) specialized for photography scene analysis and technical description generation. The model was fine-tuned on the [DataSeeds.AI Sample Dataset (DSD)](https://huggingface.co/datasets/Dataseeds/DataSeeds-Sample-Dataset-DSD) to enhance its capabilities in generating detailed photographic descriptions with focus on composition, lighting, and technical aspects.
 
+The model was presented in the paper [Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds' Annotated Imagery](https://huggingface.co/papers/2506.05673).
+
 ## Model Description
 
 - **Base Model**: [BLIP2-OPT-2.7B](https://huggingface.co/Salesforce/blip2-opt-2.7b)
@@ -176,7 +180,8 @@ The model maintains the BLIP-2 architecture with the following components:
 ### Core Architecture
 - **Vision Encoder**: EVA-CLIP ViT-g/14 (unfrozen during fine-tuning)
 - **Q-Former**: 32-layer transformer bridging vision and language modalities
-- **Language Model**: OPT-2.7B
+- **Language Model**: OPT-2.7B (2.7 billion parameters)
+- **Architecture**: BLIP-2 with Q-Former bridging vision and language
 - **Bootstrapping**: Two-stage pre-training methodology preserved
 
 ### Technical Specifications
@@ -277,4 +282,4 @@ This repository includes comprehensive training artifacts:
 
 ---
 
-*For questions, issues, or collaboration opportunities, please visit the [model repository](https://huggingface.co/Dataseeds/BLIP2-opt-2.7b-DSD-FineTune) or contact the DataSeeds.AI team.*
+*For questions, issues, or collaboration opportunities, please visit the [model repository](https://huggingface.co/Dataseeds/BLIP2-opt-2.7b-DSD-FineTune) or contact the DataSeeds.AI team.*
```
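Since the card's metadata declares `pipeline_tag: image-to-text` on a BLIP-2 base, inference should follow the standard `transformers` BLIP-2 API. A minimal sketch, assuming `transformers`, `torch`, and `Pillow` are installed (the model id is taken from the repository link above; generation arguments and the `photo.jpg` filename are illustrative):

```python
# Minimal BLIP-2 captioning sketch for the fine-tuned checkpoint.
# MODEL_ID comes from the model repository link in the card; all other
# choices (dtype, max_new_tokens, filename) are illustrative assumptions.
import torch
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor

MODEL_ID = "Dataseeds/BLIP2-opt-2.7b-DSD-FineTune"

def caption_image(image: Image.Image, processor, model, max_new_tokens: int = 60) -> str:
    """Generate a photographic description for a single PIL image."""
    inputs = processor(images=image, return_tensors="pt").to(model.device)
    with torch.no_grad():
        ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(ids, skip_special_tokens=True)[0].strip()

if __name__ == "__main__":
    # Loading downloads the ~2.7B-parameter weights; float16 keeps memory manageable.
    processor = Blip2Processor.from_pretrained(MODEL_ID)
    model = Blip2ForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    print(caption_image(Image.open("photo.jpg").convert("RGB"), processor, model))
```

This is a sketch, not the authors' published inference script; for large-scale use one would typically add device placement (`device_map="auto"`) and batching.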
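The model-index above reports a BLEU-4 of 0.047. For readers unfamiliar with the metric, a simplified sentence-level BLEU-4 (smoothed n-gram precisions plus a brevity penalty) can be sketched as follows; this is an illustrative reimplementation, not the evaluation code used for the card, where libraries such as `sacrebleu` are the usual choice:

```python
# Simplified sentence-level BLEU-4: geometric mean of 1- to 4-gram
# precisions with add-one smoothing, times a brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate: str, reference: str) -> float:
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, 5):
        cand_ngrams = ngrams(cand, n)
        ref_ngrams = ngrams(ref, n)
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = sum(cand_ngrams.values())
        # add-one smoothing so a zero n-gram match does not zero the score
        log_prec += 0.25 * math.log((overlap + 1) / (total + 1))
    # brevity penalty: penalize candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec)
```

For example, `bleu4("a b c d", "a b c d")` is 1.0, while partially overlapping captions score between 0 and 1; scores around 0.047 are typical when generated captions are much longer and more descriptive than the single reference.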