---
license: apache-2.0
tags:
  - generated-by-script
  - peft
  - image-captioning
base_model:
  - microsoft/git-base
---

# Model: ashimdahal/microsoft-git-base_microsoft-git-base

This repository contains model artifacts for a training run named `microsoft-git-base_microsoft-git-base`, most likely a PEFT adapter.
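
If you want to confirm that the artifacts really are a PEFT adapter, a quick look at the repository contents is enough. The snippet below is a minimal sketch using `huggingface_hub`; it is not part of the original training code.

```python
from huggingface_hub import list_repo_files

# List the files stored in this repository; a PEFT adapter typically ships
# adapter_config.json plus adapter weights (e.g. adapter_model.safetensors).
repo_id = "ashimdahal/microsoft-git-base_microsoft-git-base"
print(list_repo_files(repo_id))
```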

## Training Source

This model was trained as part of the project/codebase available at: https://github.com/ashimdahal/captioning_image/blob/main

## Base Model Information (Heuristic)

- Processor/Vision Encoder (guessed): microsoft/git-base
- Decoder/Language Model (guessed): microsoft/git-base

⚠️ Important: The `base_model` entry in the metadata above and the models listed here are heuristic guesses based on the training directory name (`microsoft-git-base_microsoft-git-base`). Please verify them against your training configuration and update the `base_model:` list in the YAML metadata block at the top of this README with the correct Hugging Face model identifiers.
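
One way to verify the guess is to read the adapter's own configuration, which records the base model it was trained against. A minimal sketch using `peft`, assuming the repository ships a standard `adapter_config.json`:

```python
from peft import PeftConfig

# base_model_name_or_path in the adapter config records the base model
# the adapter was actually trained on.
config = PeftConfig.from_pretrained("ashimdahal/microsoft-git-base_microsoft-git-base")
print(config.base_model_name_or_path)
```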

## How to Use (Example with PEFT)

This example was generated by a script and has not been verified manually, so proceed with caution.

```python
from transformers import AutoProcessor, AutoModelForVision2Seq  # or another class matching your base model
from peft import PeftModel
import torch

# --- Configuration ---
# 1. Specify the EXACT base model identifiers used during training
base_processor_id = "microsoft/git-base"  # <-- Replace with the correct HF ID
base_model_id = "microsoft/git-base"      # <-- Replace with the correct HF ID (e.g., Salesforce/blip2-opt-2.7b)

# 2. Specify the PEFT adapter repository ID (this repo)
adapter_repo_id = "ashimdahal/microsoft-git-base_microsoft-git-base"

# --- Load Base Model and Processor ---
processor = AutoProcessor.from_pretrained(base_processor_id)

# Load the base model (ensure the class matches the type used for training)
base_model = AutoModelForVision2Seq.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,  # or torch.bfloat16 / torch.float32, to match training/inference needs
)
# For a BLIP-2 base model you would instead use, e.g.:
# from transformers import Blip2ForConditionalGeneration
# base_model = Blip2ForConditionalGeneration.from_pretrained(base_model_id, torch_dtype=torch.float16)

# --- Load PEFT Adapter ---
# Load the adapter weights on top of the base model
model = PeftModel.from_pretrained(base_model, adapter_repo_id)
model = model.merge_and_unload()  # Merge adapter weights for inference (optional but often recommended)
model.eval()  # Set model to evaluation mode

# --- Inference Example ---
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

image = ...  # Load your image (e.g., with PIL)
text = "a photo of"  # Optional prompt prefix

inputs = processor(images=image, text=text, return_tensors="pt").to(device, torch.float16)  # Match the model dtype

generated_ids = model.generate(**inputs, max_new_tokens=50)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(f"Generated Caption: {generated_text}")
```

More model-specific documentation, evaluation results, and usage examples should be added here.