| # lora_fine_tuned_phi-4_quantized_vision | |
| This repository contains a fine-tuned version of the **Phi-4** language model specifically adapted for **image-to-text generation**. | |
| The model has been fine-tuned using **LoRA (Low-Rank Adaptation)** on the **FGVC Aircraft** dataset, which consists of images of aircraft with corresponding textual descriptions. This fine-tuning process enables the model to generate more accurate and descriptive captions for aircraft images. | |
| **Key Features:** | |
| * **4-bit Quantization:** The base model is loaded in 4-bit NF4 precision with double quantization (via `BitsAndBytesConfig`), which reduces its memory footprint and makes it cheaper to deploy. | |
| * **LoRA:** Fine-tuning is performed with LoRA, which allows for efficient adaptation of the model while keeping the number of trainable parameters low. | |
| * **Image Captioning:** The model is specifically trained to generate textual descriptions (captions) for images of aircraft. | |
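To make the two efficiency features above concrete, the following back-of-the-envelope sketch estimates weight memory at two precisions and the number of trainable parameters that LoRA adds to a single linear layer. The 14-billion parameter count is Phi-4's approximate published size; the layer width and the rank `r = 16` are illustrative assumptions, not the settings used to train this adapter. The memory figures cover the weights only and ignore quantization constants, activations, and the KV cache.

```python
def weight_memory_gb(num_params: int, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA learns the update to a d_out x d_in weight as B @ A,
    where A is rank x d_in and B is d_out x rank."""
    return rank * (d_in + d_out)


PHI4_PARAMS = 14_000_000_000  # approximate published size of Phi-4

bf16_gb = weight_memory_gb(PHI4_PARAMS, 16)
nf4_gb = weight_memory_gb(PHI4_PARAMS, 4)
print(f"bf16 weights: ~{bf16_gb:.0f} GB, 4-bit weights: ~{nf4_gb:.0f} GB")

# Hypothetical square projection layer and rank, for illustration only.
d_model, rank = 5120, 16
full = d_model * d_model
lora = lora_trainable_params(d_model, d_model, rank)
print(f"full layer: {full:,} params, LoRA update: {lora:,} params ({lora / full:.2%})")
```

Under these assumptions the LoRA update trains well under 1% of the parameters of the layer it adapts, which is why the adapter stays small even though the base model does not.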
| **Intended Use Cases:** | |
| * **Image Captioning:** Generate descriptive captions for aircraft images. | |
| * **Aircraft Recognition:** Assist in identifying different types of aircraft based on their visual features. | |
| * **Educational Purposes:** Serve as a tool for learning about different aircraft models. | |
| **How to Use:** | |
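| Load the 4-bit base model with Hugging Face Transformers, then attach the LoRA adapter with PEFT. This requires the `torch`, `transformers`, `peft`, and `bitsandbytes` packages: | |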
| ```python | |
| import torch | |
| from transformers import pipeline, AutoTokenizer, BitsAndBytesConfig, AutoModelForCausalLM | |
| from peft import PeftModel | |
| # Load the tokenizer | |
| tokenizer = AutoTokenizer.from_pretrained("frankmorales2020/lora_fine_tuned_phi-4_quantized_vision") | |
| # Load the base model with 4-bit quantization | |
| bnb_config = BitsAndBytesConfig( | |
| load_in_4bit=True, | |
| bnb_4bit_use_double_quant=True, | |
| bnb_4bit_quant_type="nf4", | |
| bnb_4bit_compute_dtype=torch.bfloat16 | |
| ) | |
| base_model = AutoModelForCausalLM.from_pretrained( | |
| "microsoft/phi-4", | |
| quantization_config=bnb_config, | |
| low_cpu_mem_usage=True | |
| ) | |
| # Attach the fine-tuned LoRA adapter, downloaded from the Hugging Face Hub | |
| model = PeftModel.from_pretrained( | |
| base_model, # Pass the base model instance | |
| "frankmorales2020/lora_fine_tuned_phi-4_quantized_vision", # Load from HF Hub | |
| device_map={"": 0}, | |
| ) | |
| # If the tokenizer defines no pad token, fall back to the EOS token, | |
| # then make generation use the same pad token id | |
| tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token | |
| model.generation_config.pad_token_id = tokenizer.pad_token_id | |
| # Create a text generation pipeline | |
| generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer) | |
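| # Generate a caption for an image (replace with your image processing logic) | |
| image_path = "path/to/your/aircraft/image.jpg" | |
| # ... (Add your image loading and preprocessing code here) ... | |
| # NOTE: the step above must define `processed_image`; this snippet does not. | |
| # It is interpolated into the prompt string, so it must be a text representation. | |
| prompt = f"Generate a caption for the following image: {processed_image}" | |
| # max_new_tokens bounds the caption length; max_length would also count the prompt tokens | |
| generated_caption = generator(prompt, max_new_tokens=64)[0]['generated_text'] | |
| print(generated_caption) | |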
| ```` | |
| **Training Data:** | |
| The model was trained on the FGVC Aircraft dataset ([https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/](https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/)). | |
| **Evaluation:** | |
| The model was evaluated using the BLEU metric on a held-out test set from the FGVC Aircraft dataset. | |
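The card does not say which BLEU implementation or settings were used, so the following is only a minimal sketch of what the metric measures: clipped n-gram precision for orders 1 to 4, combined by a geometric mean and scaled by a brevity penalty. It scores one candidate against one reference without smoothing; the function names and the example captions are made up for illustration. Reported scores are normally computed at corpus level with an established tool such as `sacrebleu`, not with code like this.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference: str, candidate: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU against a single reference, in [0, 1]."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one empty order zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical captions, not taken from the dataset.
reference = "a boeing 737 passenger jet parked on the runway"
print(sentence_bleu(reference, "a boeing 737 passenger jet on the runway"))
print(sentence_bleu(reference, reference))
```

Because the sketch is unsmoothed, any caption that shares no 4-gram with its reference scores exactly zero, which is one reason short captions are usually evaluated with a smoothed or corpus-level variant.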
| **Limitations:** | |
| * The model is specifically fine-tuned for aircraft images and may not generalize well to other types of images. | |
| * The generated captions may sometimes be overly generic or lack fine-grained details. | |
| **Future Work:** | |
| * Fine-tune the model on a larger and more diverse dataset of images. | |
| * Explore more advanced image encoding techniques to improve the model's understanding of visual features. | |
| * Experiment with different decoding strategies to generate more detailed and human-like captions. | |
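As a starting point for the decoding experiments mentioned in the last item, the sketch below collects three common strategies as keyword arguments accepted by the Transformers `generate` API. The preset names, the helper `generation_kwargs`, and every numeric value are assumptions chosen for illustration; none of them are settings used for this model.

```python
# Hypothetical settings for illustration; none are the values used for this model.
DECODING_PRESETS = {
    # Deterministic: always pick the most likely next token.
    "greedy": {"do_sample": False, "max_new_tokens": 64},
    # Beam search: keep several candidates and return the best-scoring one.
    "beam": {"do_sample": False, "num_beams": 4, "early_stopping": True, "max_new_tokens": 64},
    # Nucleus sampling: sample from the smallest token set covering top_p probability.
    "nucleus": {"do_sample": True, "top_p": 0.9, "temperature": 0.7, "max_new_tokens": 64},
}


def generation_kwargs(strategy: str) -> dict:
    """Return a copy of the keyword arguments for the named decoding strategy."""
    if strategy not in DECODING_PRESETS:
        raise ValueError(f"unknown strategy {strategy!r}; choose from {sorted(DECODING_PRESETS)}")
    return dict(DECODING_PRESETS[strategy])


print(generation_kwargs("beam"))
```

With the `generator` pipeline from the How to Use section, a preset could be applied as `generator(prompt, **generation_kwargs("nucleus"))`, and the resulting captions compared across strategies.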
| **Acknowledgements:** | |
| This work is based on the Phi-4 language model developed by Microsoft and utilizes the Hugging Face Transformers and Datasets libraries. | |
| **Remember to:** | |
| * Replace `"path/to/your/aircraft/image.jpg"` with the actual path to your image. | |
| * Add your image loading and preprocessing code in the designated section. | |
| * Consider adding a license (e.g., MIT License) to your repository. | |