# Model Card for vit-gpt2-image-captioning

## Model Details
This model is a VisionEncoderDecoderModel that pairs a ViT image encoder with a GPT-2 text decoder to generate image captions. It was fine-tuned with additional context information supplied alongside each image to help it produce more meaningful captions.
- **Base Model**: nlpconnect/vit-gpt2-image-captioning
- **Processor**: ViTImageProcessor
- **Tokenizer**: GPT-2 Tokenizer
## Intended Use

This model is intended for generating captions for stock-related images. An initial textual context is supplied at generation time to steer the model toward more accurate descriptions.
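
The card does not spell out how the context is supplied. A minimal sketch of one plausible scheme follows, assuming the context is tokenized and passed as `decoder_input_ids` so the GPT-2 decoder continues the caption from it; the repository id, image path, and context string are all placeholders:

```python
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

repo = "your_username/your_model_name"  # placeholder repository id
model = VisionEncoderDecoderModel.from_pretrained(repo)
processor = ViTImageProcessor.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Assumption: the context is prepended as a decoder prompt, so generation
# continues from it; the actual fine-tuning scheme may differ.
context_ids = tokenizer("A stock photo of", return_tensors="pt").input_ids

output_ids = model.generate(pixel_values, decoder_input_ids=context_ids, max_length=32)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```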
## Limitations

- The model might generate incorrect or biased descriptions depending on the input image or context.
- It requires specific context inputs for the best performance.
## How to Use
```python
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

# "your_username/your_model_name" is a placeholder; substitute the actual Hub repo id.
model = VisionEncoderDecoderModel.from_pretrained("your_username/your_model_name")
processor = ViTImageProcessor.from_pretrained("your_username/your_model_name")
tokenizer = AutoTokenizer.from_pretrained("your_username/your_model_name")
```
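
Once the model, processor, and tokenizer are loaded, plain captioning (without a context prompt) might look like the following; the image path and generation settings are illustrative:

```python
from PIL import Image

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Illustrative generation settings; tune max_length / num_beams as needed.
output_ids = model.generate(pixel_values, max_length=32, num_beams=4)
caption = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
print(caption)
```

To supply a context, pass the tokenized context text as `decoder_input_ids`, as sketched in the Intended Use section above.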
## License

This model is licensed under the same terms as the original nlpconnect/vit-gpt2-image-captioning.