This model is ViT-L/14 with a fine-tuned 512 text length

#1
by Felldude - opened

This model is ViT-L/14 with the text length fine-tuned to 512.
I am not aware of any other CLIP-L models that have been trained to this text length.
If you use this model in a teacher/student setup, further fine-tuning on captions of 400+ words should likely be done.
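For anyone preparing data for that extra fine-tuning pass, here is a minimal sketch of picking out the long captions. It is only an illustration: `word_count` and `select_long_captions` are hypothetical helper names, words are counted by splitting on whitespace (not by the CLIP tokenizer), and the 400-word threshold is taken from the note above.

```python
from typing import Iterable, List


def word_count(caption: str) -> int:
    """Count whitespace-separated words (an approximation, not a token count)."""
    return len(caption.split())


def select_long_captions(captions: Iterable[str], min_words: int = 400) -> List[str]:
    """Keep only captions with at least `min_words` words.

    `min_words=400` mirrors the "400+ words" suggestion; adjust as needed.
    """
    return [c for c in captions if word_count(c) >= min_words]


if __name__ == "__main__":
    # Hypothetical sample data: one short caption, one 400-word caption.
    samples = ["a photo of a cat", " ".join(["word"] * 400)]
    long_ones = select_long_captions(samples)
    print(len(long_ones), "caption(s) meet the threshold")
```

Word count and token count are not the same thing, so a caption that passes this filter may still be truncated at 512 tokens; checking with the model's own tokenizer would be the more reliable test.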
