Add link to paper and GitHub repo

#1 opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +15 -10
README.md CHANGED
@@ -1,6 +1,12 @@
 ---
-library_name: transformers
 base_model: Salesforce/blip2-opt-2.7b
+datasets:
+- Dataseeds/DataSeeds-Sample-Dataset-DSD
+language:
+- en
+library_name: transformers
+license: mit
+pipeline_tag: image-to-text
 tags:
 - vision-language
 - multimodal
@@ -10,12 +16,6 @@ tags:
 - photography
 - image-captioning
 - scene-analysis
-license: mit
-datasets:
-- Dataseeds/DataSeeds-Sample-Dataset-DSD
-language:
-- en
-pipeline_tag: image-to-text
 model-index:
 - name: BLIP2-OPT-2.7b-DSD-FineTune
   results:
@@ -23,8 +23,8 @@ model-index:
       type: image-captioning
       name: Image Captioning
     dataset:
-      type: Dataseeds/DataSeeds-Sample-Dataset-DSD
       name: DataSeeds.AI Sample Dataset
+      type: Dataseeds/DataSeeds-Sample-Dataset-DSD
     metrics:
     - type: bleu-4
       value: 0.047
@@ -42,8 +42,12 @@ model-index:
 
 # BLIP2-OPT-2.7B Fine-tuned on DataSeeds.AI Dataset
 
+Code: https://github.com/DataSeedAI/Sample-DSD-Finetune
+
 This model is a fine-tuned version of [Salesforce/blip2-opt-2.7b](https://huggingface.co/Salesforce/blip2-opt-2.7b) specialized for photography scene analysis and technical description generation. The model was fine-tuned on the [DataSeeds.AI Sample Dataset (DSD)](https://huggingface.co/datasets/Dataseeds/DataSeeds-Sample-Dataset-DSD) to enhance its capabilities in generating detailed photographic descriptions with focus on composition, lighting, and technical aspects.
 
+The model was presented in the paper [Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds' Annotated Imagery](https://huggingface.co/papers/2506.05673).
+
 ## Model Description
 
 - **Base Model**: [BLIP2-OPT-2.7B](https://huggingface.co/Salesforce/blip2-opt-2.7b)
@@ -176,7 +180,8 @@ The model maintains the BLIP-2 architecture with the following components:
 ### Core Architecture
 - **Vision Encoder**: EVA-CLIP ViT-g/14 (unfrozen during fine-tuning)
 - **Q-Former**: 32-layer transformer bridging vision and language modalities
-- **Language Model**: OPT-2.7B decoder-only transformer
+- **Language Model**: OPT-2.7B (2.7 billion parameters)
+- **Architecture**: BLIP-2 with Q-Former bridging vision and language
 - **Bootstrapping**: Two-stage pre-training methodology preserved
 
 ### Technical Specifications
@@ -277,4 +282,4 @@ This repository includes comprehensive training artifacts:
 
 ---
 
-*For questions, issues, or collaboration opportunities, please visit the [model repository](https://huggingface.co/Dataseeds/BLIP2-opt-2.7b-DSD-FineTune) or contact the DataSeeds.AI team.*
+*For questions, issues, or collaboration opportunities, please visit the [model repository](https://huggingface.co/Dataseeds/BLIP2-opt-2.7b-DSD-FineTune) or contact the DataSeeds.AI team.*
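The model-index in the diff above reports a BLEU-4 of 0.047 for the fine-tuned captioner. As a rough illustration of what that metric measures, here is a minimal, dependency-free corpus BLEU-4 sketch (uniform 1–4-gram precisions plus a brevity penalty). The example sentences are made up for demonstration, and the implementation is a plain textbook BLEU, not the exact evaluation pipeline used for this model; real evaluations typically use `sacrebleu` or `nltk`, which add smoothing options.

```python
# Illustrative BLEU-4: geometric mean of modified 1..4-gram precisions,
# scaled by a brevity penalty. One reference per hypothesis, pre-tokenized.
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(references, hypotheses):
    """Corpus-level BLEU-4 over parallel lists of token lists."""
    matches = [0] * 4   # clipped n-gram matches, per order
    totals = [0] * 4    # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for ref, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, 5):
            hyp_ngrams = ngrams(hyp, n)
            ref_ngrams = ngrams(ref, n)
            # clip each hypothesis n-gram count by its count in the reference
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    if any(m == 0 for m in matches):
        return 0.0  # unsmoothed BLEU is zero if any precision is zero
    log_prec = sum(0.25 * math.log(m / t) for m, t in zip(matches, totals))
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)

# Made-up caption pair for demonstration.
ref = "a close up photo of a red flower in soft morning light".split()
hyp = "a close up photo of a red flower in soft light".split()
score = bleu4([ref], [hyp])
print(round(score, 3))  # → 0.835
```

Scores like the reported 0.047 are common for free-form caption generation, where many valid descriptions share few exact n-grams with the single reference.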