frankmorales2020

	````markdown
	# lora_fine_tuned_phi-4_quantized_vision

This repository contains a LoRA adapter that fine-tunes Microsoft's Phi-4 language model (`microsoft/phi-4`) for image-to-text generation.

The adapter was trained with LoRA (Low-Rank Adaptation) on the FGVC Aircraft dataset, whose aircraft images are labeled by manufacturer, family, and variant. This fine-tuning is intended to make the model's captions for aircraft images more accurate and descriptive.

	Key Features:

* 4-bit Quantization: The base model is loaded in 4-bit NF4 precision with double quantization (via bitsandbytes), which reduces its memory footprint and makes it more efficient to deploy and use.
* LoRA: Fine-tuning is performed with LoRA, which freezes the base weights and trains small low-rank adapter matrices, keeping the number of trainable parameters low.
	* Image Captioning: The model is specifically trained to generate textual descriptions (captions) for images of aircraft.
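The arithmetic below makes the two efficiency features concrete: it estimates weight storage at 16-bit versus 4-bit precision and counts the trainable parameters LoRA adds to one weight matrix. The sizes are illustrative assumptions (about 14 billion parameters, the size Microsoft reports for Phi-4; a hypothetical 5120 x 5120 projection; rank 16), not the settings used to train this adapter, which this README does not state.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate storage for the model weights alone, in GB (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters LoRA trains for one frozen d_out x d_in weight W.

    LoRA learns a low-rank update B @ A added to W, where B is d_out x rank
    and A is rank x d_in, so it adds rank * (d_out + d_in) trainable values.
    """
    return rank * (d_out + d_in)


if __name__ == "__main__":
    # Illustrative sizes (assumptions, not this adapter's actual configuration).
    num_params = 14e9
    print(f"16-bit weights: {weight_memory_gb(num_params, 16):.1f} GB")  # 28.0 GB
    print(f"4-bit weights: {weight_memory_gb(num_params, 4):.1f} GB")    # 7.0 GB

    d = 5120  # hypothetical projection size
    full = d * d
    lora = lora_trainable_params(d, d, rank=16)
    # 163,840 of 26,214,400 weights (0.625%)
    print(f"LoRA trains {lora:,} of {full:,} weights ({lora / full:.3%})")
```

Real memory use is higher than this estimate: bitsandbytes stores quantization constants (double quantization shrinks them), the embeddings and output head are typically left unquantized, and generation needs memory for activations and the KV cache.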

	Intended Use Cases:

	* Image Captioning: Generate descriptive captions for aircraft images.
	* Aircraft Recognition: Assist in identifying different types of aircraft based on their visual features.
* Educational Purposes: Serve as a tool for learning about different aircraft models.

	How to Use:

You can load the base model and the adapter with Hugging Face Transformers and PEFT (4-bit loading also requires the `bitsandbytes` package):

	```python
import torch
from transformers import pipeline, AutoTokenizer, BitsAndBytesConfig, AutoModelForCausalLM
from peft import PeftModel

# Load the tokenizer from the adapter repository
tokenizer = AutoTokenizer.from_pretrained("frankmorales2020/lora_fine_tuned_phi-4_quantized_vision")

# 4-bit NF4 quantization with double quantization; computation runs in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model onto GPU 0
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",
    quantization_config=bnb_config,
    device_map={"": 0},
    low_cpu_mem_usage=True,
)

# Attach the fine-tuned LoRA adapter from the Hugging Face Hub
model = PeftModel.from_pretrained(
    base_model,
    "frankmorales2020/lora_fine_tuned_phi-4_quantized_vision",
)

# Use the EOS token for padding during generation
tokenizer.pad_token = tokenizer.eos_token
model.generation_config.pad_token_id = tokenizer.eos_token_id

# Create a text generation pipeline
generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)

# Generate a caption for an image (replace with your image processing logic)
image_path = "path/to/your/aircraft/image.jpg"
# ... (Add your image loading and preprocessing code here) ...
# Note: microsoft/phi-4 is a text-only model and cannot read pixel data, so
# `processed_image` must be a string representing the image. The format used
# during fine-tuning is not documented in this README.

prompt = f"Generate a caption for the following image: {processed_image}"
# max_new_tokens limits only the generated tokens (max_length would also count the prompt);
# return_full_text=False returns the caption without the prompt
generated_caption = generator(prompt, max_new_tokens=64, return_full_text=False)[0]["generated_text"]
print(generated_caption)
	````

	Training Data:

The model was trained on the FGVC Aircraft dataset ([https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/](https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/)).

	Evaluation:

	The model was evaluated using the BLEU metric on a held-out test set from the FGVC Aircraft dataset.
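The README does not say which BLEU implementation was used, and it reports no scores. As an illustrative sketch only, unsmoothed sentence-level BLEU takes the geometric mean of the clipped n-gram precisions p_1 through p_4 and multiplies it by a brevity penalty BP = exp(1 - r/c) whenever the candidate length c does not exceed the closest reference length r. Whitespace tokenization and the function names here are assumptions; standard tools such as sacreBLEU compute corpus-level scores with their own tokenization and smoothing, so their numbers will differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU for whitespace-tokenized strings.

    candidate: generated caption; references: list of reference captions.
    Returns a score in [0, 1].
    """
    cand = candidate.split()
    refs = [r.split() for r in references]
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref_counts = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty against the reference length closest to the candidate length
    # (ties go to the shorter reference).
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)
```

For example, `sentence_bleu("a boeing 737 on the runway", ["a boeing 737 on the runway"])` returns 1.0, and a caption that shares no words with its reference scores 0.0.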

	Limitations:

	* The model is specifically fine-tuned for aircraft images and may not generalize well to other types of images.
	* The generated captions may sometimes be overly generic or lack fine-grained details.

	Future Work:

	* Fine-tune the model on a larger and more diverse dataset of images.
	* Explore more advanced image encoding techniques to improve the model's understanding of visual features.
	* Experiment with different decoding strategies to generate more detailed and human-like captions.
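Decoding strategies can be compared without retraining, because the text-generation pipeline forwards extra keyword arguments to `model.generate()`. The presets below are a hypothetical starting point: the preset names, the `decoding_kwargs` helper, and the values are assumptions, while `do_sample`, `num_beams`, `early_stopping`, `top_p`, `temperature`, and `max_new_tokens` are standard Transformers generation parameters.

```python
# Hypothetical decoding presets; each dict is passed as keyword arguments
# to the text-generation pipeline, which forwards them to model.generate().
DECODING_PRESETS = {
    "greedy": {"do_sample": False},
    "beam_search": {"do_sample": False, "num_beams": 4, "early_stopping": True},
    "nucleus_sampling": {"do_sample": True, "top_p": 0.9, "temperature": 0.7},
}


def decoding_kwargs(preset, max_new_tokens=64):
    """Merge a named preset with a shared limit on generated tokens."""
    return {"max_new_tokens": max_new_tokens, **DECODING_PRESETS[preset]}


# Usage with `generator` and `prompt` from the How to Use example:
# for name in DECODING_PRESETS:
#     out = generator(prompt, return_full_text=False, **decoding_kwargs(name))
#     print(name, out[0]["generated_text"])
```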

	Acknowledgements:

This work is based on the Phi-4 language model developed by Microsoft and uses the Hugging Face Transformers, PEFT, and Datasets libraries, along with bitsandbytes for 4-bit quantization.
