# Model Card for vit-gpt2-image-captioning
## Model Details
This model is a VisionEncoderDecoderModel that pairs a ViT encoder with a GPT-2 decoder to generate captions for images. It was fine-tuned with added context information to help it produce more meaningful captions.
- **Base Model**: nlpconnect/vit-gpt2-image-captioning
- **Processor**: ViTImageProcessor
- **Tokenizer**: GPT-2 Tokenizer
- **Generated Caption Example**: "{generated_text}"
## Intended Use
This model is intended for generating captions for stock-related images, with an initial context provided for more accurate descriptions.
## Limitations
- The model might generate incorrect or biased descriptions depending on the input image or context.
- It requires specific context inputs for the best performance.
## How to Use
```python
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

# Replace "your_username/your_model_name" with this model's repository ID.
model = VisionEncoderDecoderModel.from_pretrained("your_username/your_model_name")
processor = ViTImageProcessor.from_pretrained("your_username/your_model_name")
tokenizer = AutoTokenizer.from_pretrained("your_username/your_model_name")
```
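The loading snippet above can be extended into an end-to-end captioning sketch. The example below uses the base checkpoint `nlpconnect/vit-gpt2-image-captioning` and a blank placeholder image so it is self-contained; in practice you would open a real photo with `Image.open(...)` and point at your fine-tuned repository. The card does not specify how the "initial context" is supplied at inference time, so the context-prefix step shown here (feeding a prompt as `decoder_input_ids`) is one plausible approach, not the documented one.

```python
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

# Base checkpoint; swap in your fine-tuned repository ID in practice.
model_id = "nlpconnect/vit-gpt2-image-captioning"
model = VisionEncoderDecoderModel.from_pretrained(model_id)
processor = ViTImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder image; use Image.open("photo.jpg") for real inputs.
image = Image.new("RGB", (224, 224), color="gray")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Plain caption generation.
output_ids = model.generate(pixel_values, max_length=16, num_beams=4)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)

# Hypothetical context conditioning: seed the GPT-2 decoder with a text
# prefix so generation continues from it. This is an assumption about how
# the fine-tuning context might be used, not the card's documented method.
context_ids = tokenizer("A stock photo of", return_tensors="pt").input_ids
contextual_ids = model.generate(
    pixel_values, decoder_input_ids=context_ids, max_length=16, num_beams=4
)
contextual_caption = tokenizer.decode(contextual_ids[0], skip_special_tokens=True)
print(contextual_caption)
```

Beam search (`num_beams=4`) matches the decoding setup commonly used with this base checkpoint; greedy decoding also works if latency matters more than caption quality.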
## License
This model is licensed under the same terms as the original nlpconnect/vit-gpt2-image-captioning.