Image-to-Text
Transformers
PyTorch
Chinese
vision-encoder-decoder
image-text-to-text
image-captioning
How to use Maciel/Muge-Image-Caption with Transformers:

    # Use a pipeline as a high-level helper.
    # Warning: pipeline type "image-to-text" is no longer supported in transformers v5.
    # Load the model directly (see below) or downgrade to v4.x with:
    #   pip install "transformers<5.0.0"
    from transformers import pipeline

    pipe = pipeline("image-to-text", model="Maciel/Muge-Image-Caption")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForMultimodalLM

    tokenizer = AutoTokenizer.from_pretrained("Maciel/Muge-Image-Caption")
    model = AutoModelForMultimodalLM.from_pretrained("Maciel/Muge-Image-Caption")
Can this be trained for handwritten Chinese character recognition?
#2
by li1993 - opened
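Can this be trained for handwritten Chinese character recognition? I want to recognize Chinese characters from images of handwriting. Does this model support that kind of training?
In theory, yes, as long as you have a dataset that maps images to text. But I don't recommend continuing to fine-tune from my model, because all of my training text describes image content. You can follow the same recipe instead: start from the original ViT and BERT, then align and fine-tune them.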
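As a rough illustration of that recipe, here is a minimal sketch of pairing a ViT encoder with a BERT decoder through `VisionEncoderDecoderModel` and running one fine-tuning step. This is not the author's training code. The checkpoint names in the comments, the helper names `build_model` and `train_step`, the tiny model sizes, and the random tensors standing in for handwriting images and character labels are all assumptions made for illustration.

```python
import torch
from transformers import (
    BertConfig,
    ViTConfig,
    VisionEncoderDecoderConfig,
    VisionEncoderDecoderModel,
)


def build_model(encoder_ckpt=None, decoder_ckpt=None, start_token_id=1, pad_token_id=0):
    """Pair a ViT image encoder with a BERT text decoder (hypothetical helper)."""
    if encoder_ckpt and decoder_ckpt:
        # Real use: load pretrained weights. Example checkpoints (assumptions,
        # not named in the thread): "google/vit-base-patch16-224-in21k" and
        # "bert-base-chinese". The decoder's cross-attention layers start
        # untrained and are learned during fine-tuning.
        model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
            encoder_ckpt, decoder_ckpt
        )
    else:
        # Offline stand-in: tiny randomly initialized models with the same wiring.
        enc = ViTConfig(hidden_size=32, num_hidden_layers=1, num_attention_heads=2,
                        intermediate_size=64, image_size=32, patch_size=16)
        dec = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                         num_attention_heads=2, intermediate_size=64)
        config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(enc, dec)
        model = VisionEncoderDecoderModel(config=config)
    # Needed so the model can build decoder inputs by shifting the labels.
    # With a real BERT tokenizer these would be cls_token_id and pad_token_id.
    model.config.decoder_start_token_id = start_token_id
    model.config.pad_token_id = pad_token_id
    return model


def train_step(model, optimizer, pixel_values, labels):
    """Run one optimization step on a batch of (image, token id) pairs."""
    model.train()
    outputs = model(pixel_values=pixel_values, labels=labels)
    optimizer.zero_grad()
    outputs.loss.backward()
    optimizer.step()
    return outputs.loss.item()


torch.manual_seed(0)
model = build_model()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
pixel_values = torch.randn(2, 3, 32, 32)   # stand-in for handwriting images
labels = torch.randint(2, 100, (2, 5))     # stand-in for character token ids
loss = train_step(model, optimizer, pixel_values, labels)
```

For an actual handwriting dataset, the random tensors would be replaced by images run through an image processor and label strings run through the decoder's tokenizer.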
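Could I learn more about this from you?
Take a look at this git project first. I also aligned a pretrained image model with a pretrained text model, so all you need to provide is a handwritten Chinese character dataset.
Could you leave some contact details? My WeChat ID is 18820977710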