Upload V2: stackformer GPT-2 + sparse cross-attention vision model (128 visual tokens)
Commit 4666d47 (verified), committed by gurumurthy3 about 12 hours ago