---
license: apache-2.0
tags:
- generated-by-script
- peft
- image-captioning
base_model:
- microsoft/git-base
---
# Model: ashimdahal/microsoft-git-base_microsoft-git-base

This repository contains model artifacts for a run named `microsoft-git-base_microsoft-git-base`, likely a PEFT adapter.
## Training Source
This model was trained as part of the project/codebase available at: https://github.com/ashimdahal/captioning_image/blob/main
## Base Model Information (Heuristic)

- Processor/Vision Encoder (guessed): `microsoft/git-base`
- Decoder/Language Model (guessed): `microsoft/git-base`
⚠️ **Important:** The `base_model` entries in the YAML metadata block at the top of this README are heuristic guesses derived from the training directory name (`microsoft-git-base_microsoft-git-base`). Please verify them against your training configuration and correct the `base_model:` list if they are wrong.
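One way to cross-check the guess without digging through the training config: if the adapter was saved with `peft`, its `adapter_config.json` records the base model it was trained against. A minimal sketch, assuming this repository contains a standard PEFT adapter config:

```python
from peft import PeftConfig

# Reads adapter_config.json from the Hub and reports the recorded base model.
peft_config = PeftConfig.from_pretrained("ashimdahal/microsoft-git-base_microsoft-git-base")
print(peft_config.base_model_name_or_path)  # e.g., "microsoft/git-base"
print(peft_config.peft_type)                # e.g., PeftType.LORA
```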
## How to Use (Example with PEFT)
```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from peft import PeftModel
import torch
from PIL import Image

# --- Configuration ---
# 1. The EXACT base model identifiers used during training.
#    These are the heuristic guesses from above -- replace them with the
#    correct HF IDs if your training configuration differs.
base_processor_id = "microsoft/git-base"
base_model_id = "microsoft/git-base"

# 2. The PEFT adapter repository ID (this repo).
adapter_repo_id = "ashimdahal/microsoft-git-base_microsoft-git-base"

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32  # match training/inference needs

# --- Load Base Model and Processor ---
processor = AutoProcessor.from_pretrained(base_processor_id)

# Ensure the model class matches the base used for training. AutoModelForVision2Seq
# resolves GIT checkpoints automatically; for a BLIP-2 base you would use
# Blip2ForConditionalGeneration instead.
base_model = AutoModelForVision2Seq.from_pretrained(base_model_id, torch_dtype=dtype)

# --- Load PEFT Adapter ---
# Load the adapter weights from this repository on top of the base model.
model = PeftModel.from_pretrained(base_model, adapter_repo_id)
model = model.merge_and_unload()  # merge weights for inference (optional but often recommended)
model.eval()                      # set model to evaluation mode
model.to(device)

# --- Inference Example ---
image = Image.open("path/to/your/image.jpg")  # load your image (e.g., with PIL)
text = "a photo of"                           # optional prompt start

inputs = processor(images=image, text=text, return_tensors="pt").to(device)
inputs["pixel_values"] = inputs["pixel_values"].to(dtype)  # match model dtype

generated_ids = model.generate(**inputs, max_new_tokens=50)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(f"Generated Caption: {generated_text}")
```
More model-specific documentation, evaluation results, and usage examples should be added here.