---
license: apache-2.0
language:
- en
- multilingual
library_name: transformers
pipeline_tag: image-text-to-text
---

# TayaVision — Tiny Aya Vision (Instruct)

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

repo = "TrishanuDas/tayavision-alignment"

# Load the model and processor; the repo ships custom code, hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, trust_remote_code=True)
model = model.to("cuda").eval()
processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True)

image = Image.open("your_image.jpg").convert("RGB")

# A single user turn with an image placeholder followed by the text prompt.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in detail."},
    ]},
]

# Build the model inputs from the chat template and move them to the GPU.
inputs = processor.apply_chat_template(
    messages,
    images=image,
    add_generation_prompt=True,
    return_tensors="pt",
)
inputs = {k: v.to("cuda") for k, v in inputs.items()}

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt portion.
response = processor.tokenizer.decode(
    output_ids[0, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(response)
```
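
The decode step above slices `output_ids` at the prompt length because, for decoder-only models, `generate` typically returns the prompt tokens followed by the newly generated ones. A minimal sketch of that slicing on plain Python lists, assuming made-up token IDs and a hypothetical helper `strip_prompt` that is not part of this repo or of `transformers`:

```python
def strip_prompt(sequence, prompt_len):
    """Return only the tokens that come after the first prompt_len tokens."""
    if prompt_len > len(sequence):
        raise ValueError("prompt_len exceeds sequence length")
    return sequence[prompt_len:]


# Hypothetical token IDs: 4 prompt tokens followed by 3 generated tokens.
prompt_ids = [101, 7, 8, 9]
full_output = prompt_ids + [42, 43, 2]

new_tokens = strip_prompt(full_output, len(prompt_ids))
print(new_tokens)
```

Without the slice, the decoded string would begin with the chat-template prompt text rather than with the model's answer.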