---
license: apache-2.0
tags:
- generated-by-script
- peft
- image-captioning
base_model: []
---

# Model: ashimdahal/microsoft-git-base_microsoft-git-base

This repository contains model artifacts for a training run named `microsoft-git-base_microsoft-git-base`, likely a PEFT adapter.

## Training Source

This model was trained as part of the project/codebase available at:
https://github.com/ashimdahal/captioning_image/blob/main

## Base Model Information (Heuristic)

* **Processor/Vision Encoder (Guessed):** `microsoft/git-base`
* **Decoder/Language Model (Guessed):** `microsoft/git-base`

**⚠️ Important:** The `base_model` tag in the metadata above is initially empty. The models listed here are *heuristic guesses* based on the training directory name (`microsoft-git-base_microsoft-git-base`). Please verify them against your training configuration and update the `base_model:` list in the YAML metadata block at the top of this README with the correct Hugging Face model identifiers, for example by reading the adapter's own config as shown below.
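
One way to verify the base model is to read the adapter's own configuration: PEFT records the checkpoint an adapter was trained on at save time. This is a minimal sketch and assumes this repository contains a standard PEFT `adapter_config.json`:

```python
from peft import PeftConfig

# Read the adapter configuration stored in this repository
peft_config = PeftConfig.from_pretrained("ashimdahal/microsoft-git-base_microsoft-git-base")

# PEFT records the checkpoint the adapter was trained on top of; use this
# value to fill in the `base_model:` field in the YAML metadata above.
print(peft_config.base_model_name_or_path)
```
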
## How to Use (Example with PEFT)

```python
from transformers import AutoProcessor, AutoModelForCausalLM
from peft import PeftModel
from PIL import Image
import torch

# --- Configuration ---
# 1. Specify the EXACT base model identifiers used during training
base_processor_id = "microsoft/git-base"  # <-- Replace with the correct HF ID
base_model_id = "microsoft/git-base"      # <-- Replace with the correct HF ID

# 2. Specify the PEFT adapter repository ID (this repo)
adapter_repo_id = "ashimdahal/microsoft-git-base_microsoft-git-base"

# --- Load Base Model and Processor ---
processor = AutoProcessor.from_pretrained(base_processor_id)

# Load the base model. GIT checkpoints load as causal language models;
# if your base model is a different architecture (e.g. BLIP-2), use the
# matching class instead (e.g. AutoModelForVision2Seq or
# Blip2ForConditionalGeneration).
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,  # or torch.bfloat16 / torch.float32; match training/inference needs
)

# --- Load PEFT Adapter ---
# Wrap the base model with the adapter weights from this repository
model = PeftModel.from_pretrained(base_model, adapter_repo_id)
model = model.merge_and_unload()  # Merge weights for inference (optional but often recommended)
model.eval()  # Set model to evaluation mode

# --- Inference Example ---
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

image = Image.open("your_image.jpg")  # Load your image (any PIL-compatible source)
text = "a photo of"  # Optional prompt prefix

inputs = processor(images=image, text=text, return_tensors="pt").to(device)
inputs["pixel_values"] = inputs["pixel_values"].to(torch.float16)  # Match model dtype

generated_ids = model.generate(**inputs, max_new_tokens=50)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(f"Generated Caption: {generated_text}")
```
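
After `merge_and_unload()`, the merged weights can also be saved (or pushed to the Hub) so future loads skip the base-model-plus-adapter two-step. A minimal sketch; `git-base-captioning-merged` is just an example output path:

```python
# Persist the merged model and processor for standalone reloading later
output_dir = "git-base-captioning-merged"  # hypothetical local path; choose your own
model.save_pretrained(output_dir)
processor.save_pretrained(output_dir)

# Reload later without PEFT:
# model = AutoModelForCausalLM.from_pretrained(output_dir, torch_dtype=torch.float16)
# processor = AutoProcessor.from_pretrained(output_dir)
```
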
*More model-specific documentation, evaluation results, and usage examples should be added here.*