# Image-to-Poem Generator

This project uses a pre-trained model to generate poems based on input images. It leverages the Hugging Face Transformers library and a custom-trained model to create poetic descriptions of visual content.

## Table of Contents

1. [Installation](#installation)
2. [Usage](#usage)
3. [Model Information](#model-information)
4. [Function Description](#function-description)
5. [Example](#example)
6. [Requirements](#requirements)
7. [License](#license)

## Installation

To use this image-to-poem generator, you need to install the required libraries. You can do this using pip:

```bash
pip install transformers torch Pillow
```
## Usage

1. First, import the necessary modules and load the pre-trained model:

   ```python
   from transformers import AutoProcessor, AutoModelForCausalLM
   from PIL import Image

   processor = AutoProcessor.from_pretrained("Sourabh2/git-base-poem")
   model = AutoModelForCausalLM.from_pretrained("Sourabh2/git-base-poem")
   ```
2. Define the `generate_caption` function:

   ```python
   def generate_caption(image_path):
       # Load the image and ensure it has three RGB channels
       # (grayscale or RGBA files would otherwise fail preprocessing).
       image = Image.open(image_path).convert("RGB")

       # Preprocess the image into a pixel tensor for the model.
       inputs = processor(images=image, return_tensors="pt")
       pixel_values = inputs.pixel_values

       # Generate token IDs and decode them into the poem text.
       generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
       generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
       return generated_caption
   ```
3. Use the function to generate a poem from an image:

   ```python
   image_path = "/path/to/your/image.jpg"
   output = generate_caption(image_path)
   print(output)
   ```
## Model Information

This project uses the "Sourabh2/git-base-poem" model, which is a fine-tuned version of the GIT (Generative Image-to-text Transformer) model. It has been specifically trained to generate poetic descriptions of images.
## Function Description

The `generate_caption` function takes an image file path as input and returns a generated poem. Here's what it does:

1. Opens the image file using PIL (Python Imaging Library).
2. Processes the image using the pre-trained processor.
3. Generates a poetic caption using the pre-trained model.
4. Decodes the generated output and returns it as a string.
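Step 1 implicitly assumes a 3-channel RGB image, while real image files may be grayscale or RGBA. A minimal, PIL-only sketch of the conversion that guards against this (the in-memory test image here is purely illustrative):

```python
from PIL import Image

# An in-memory RGBA image stands in for a real file here;
# Image.open() on a PNG with transparency yields the same kind of object.
rgba = Image.new("RGBA", (32, 32), (255, 0, 0, 128))

# Convert to 3-channel RGB before handing the image to the processor,
# so grayscale or RGBA files don't cause channel-count errors.
rgb = rgba.convert("RGB")

print(rgb.mode, rgb.size)  # RGB (32, 32)
```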
## Example

```python
image_path = "/content/12330616_72ed8075fa.jpg"
output = generate_caption(image_path)
print(output)
```

This will print the generated poem based on the content of the image at the specified path.
## Requirements

- Python 3.6+
- `transformers` library
- PyTorch (`torch`) library
- Pillow (`PIL`) library