Generate answers to Polish questions
Classify images with ternary Vision Transformers and show their attention
Submit audio to get text output