RichardScottOZ's picture
Update README.md
d5502dd verified
|
raw
history blame
208 Bytes
metadata
license: mit

frozen SigLIP and ResNet50 backbone with a trained attention-fusion head, outputting a 512 dimensional feature vector, trained via InfoNCE contrastive loss on 1 million comic panels.