This model is ViT-L/14 with a text length fine-tuned to 512
#1, opened by Felldude
This model is ViT-L/14 with a text length fine-tuned to 512.
I am not aware of any other CLIP-L models that have been trained to this text length.
If you use this model in a teacher/student setup, further fine-tuning on captions of 400+ words should likely be done.
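For anyone preparing that extra fine-tuning pass, here is a minimal sketch of how long captions could be picked out of a dataset. The helper name `select_long_captions`, the `min_words` parameter, and the whitespace-based word count are all assumptions for illustration, not part of this model's training code. Word counts and token counts differ, so a caption that passes this filter may still be truncated at the 512 text length.

```python
def select_long_captions(captions, min_words=400):
    """Return the captions that contain at least `min_words` words.

    Words are counted by splitting on whitespace (an assumption; a
    tokenizer would usually produce a different, larger count).
    """
    return [c for c in captions if len(c.split()) >= min_words]


# Toy data: one short caption and two synthetic long ones.
captions = [
    "a short caption",
    " ".join(["word"] * 400),
    " ".join(["word"] * 450),
]

long_captions = select_long_captions(captions)
print(len(long_captions))
```

Only the filtered captions would then go into the additional fine-tuning run; the threshold can be changed through `min_words`.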