metadata
license: mit
frozen SigLIP and ResNet50 backbone with a trained attention-fusion head, outputting a 512 dimensional feature vector, trained via InfoNCE contrastive loss on 1 million comic panels.
license: mit
frozen SigLIP and ResNet50 backbone with a trained attention-fusion head, outputting a 512 dimensional feature vector, trained via InfoNCE contrastive loss on 1 million comic panels.