This is a ViT-based style classifier used for Pony V7 captioning.
See the captioning Colab for usage details.
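
As a rough illustration of how the output of a classifier like this is often consumed in a captioning pipeline, the sketch below turns raw logits into ranked style labels. The label names, the logit values, and the `softmax` and `top_styles` helpers are hypothetical and are not taken from this model. The captioning Colab remains the reference for actual usage.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_styles(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical labels and logits, for illustration only.
labels = ["style_a", "style_b", "style_c", "style_d"]
logits = [0.2, 2.5, -1.0, 1.1]

best = top_styles(logits, labels, k=2)
for name, prob in best:
    print(f"{name}: {prob:.3f}")
```

The top-ranked labels could then be inserted into a caption as style tags. How many to keep, and whether to apply a probability threshold, is a choice this sketch leaves open.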