# Model Card for vit-gpt2-image-captioning
## Model Details
This model is a VisionEncoderDecoderModel that pairs a ViT encoder with a GPT-2 decoder to generate captions for images. It was fine-tuned with added context information to help it produce more meaningful captions.
- **Base Model**: nlpconnect/vit-gpt2-image-captioning
- **Processor**: ViTImageProcessor
- **Tokenizer**: GPT-2 Tokenizer
- **Generated Caption Example**: "{generated_text}"
## Intended Use
This model is intended for generating captions for stock-related images, with an initial context provided for more accurate descriptions.
## Limitations
- The model might generate incorrect or biased descriptions depending on the input image or context.
- It requires specific context inputs for the best performance.
## How to Use
```python
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

# Replace "your_username/your_model_name" with this model's repository ID.
model = VisionEncoderDecoderModel.from_pretrained("your_username/your_model_name")
processor = ViTImageProcessor.from_pretrained("your_username/your_model_name")
tokenizer = AutoTokenizer.from_pretrained("your_username/your_model_name")
```
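The loading snippet above can be extended into an end-to-end captioning sketch. The example below uses the base checkpoint `nlpconnect/vit-gpt2-image-captioning` and a blank placeholder image so it is self-contained; in practice you would open a real photo with `Image.open(...)` and point at your fine-tuned repository. The card does not specify how the "initial context" is supplied at inference time, so the context-prefix step shown here (feeding a prompt as `decoder_input_ids`) is one plausible approach, not the documented one.

```python
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

# Base checkpoint; swap in your fine-tuned repository ID in practice.
model_id = "nlpconnect/vit-gpt2-image-captioning"
model = VisionEncoderDecoderModel.from_pretrained(model_id)
processor = ViTImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder image; use Image.open("photo.jpg") for real inputs.
image = Image.new("RGB", (224, 224), color="gray")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Plain caption generation.
output_ids = model.generate(pixel_values, max_length=16, num_beams=4)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)

# Hypothetical context conditioning: seed the GPT-2 decoder with a text
# prefix so generation continues from it. This is an assumption about how
# the fine-tuning context might be used, not the card's documented method.
context_ids = tokenizer("A stock photo of", return_tensors="pt").input_ids
contextual_ids = model.generate(
    pixel_values, decoder_input_ids=context_ids, max_length=16, num_beams=4
)
contextual_caption = tokenizer.decode(contextual_ids[0], skip_special_tokens=True)
print(contextual_caption)
```

Beam search (`num_beams=4`) matches the decoding setup commonly used with this base checkpoint; greedy decoding also works if latency matters more than caption quality.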
## License
This model is licensed under the same terms as the original nlpconnect/vit-gpt2-image-captioning.