---
base_model: llava-hf/llava-1.5-7b-hf
library_name: peft
---

## Personalized Sticker Retrieval with Vision-Language Model (PerSRV)

PerSRV generates search keywords for a sticker, given the sticker image and a text prompt. For more information, see the paper cited at the end.

## Usage

```
import torch
from PIL import Image
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration

PROCESSOR_ID = "llava-hf/llava-1.5-7b-hf"
MODEL_ID = "metchee/persrv"
MAX_LENGTH = 128  # assumed value; the original snippet does not define MAX_LENGTH

processor = AutoProcessor.from_pretrained(PROCESSOR_ID)
processor.tokenizer.padding_side = "left"

# Assumed 4-bit bitsandbytes setup; the original snippet does not define quantization_config.
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tuned_model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
)

image_path = ""  # path to the sticker image
# Prompt in English: "You are a sticker expert. Carefully observe and understand the feeling
# the image is trying to express, and turn that feeling into keywords."
# The <image> placeholder follows the LLaVA-1.5 prompt format.
prompt = "USER: <image>\n你是个表情包专家,仔细观察、理解图片中的想表达的感觉,把这个感觉转换成关键词。\nASSISTANT:"

image = Image.open(image_path).convert("RGB")
inputs = processor(text=prompt, images=[image], return_tensors="pt").to("cuda")
generated_ids = tuned_model.generate(**inputs, max_new_tokens=MAX_LENGTH)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
```

## Citation

If you find PerSRV helpful to your research, please cite the following paper :)

```
@misc{chee2024persrvpersonalizedstickerretrieval,
  title={PerSRV: Personalized Sticker Retrieval with Vision-Language Model},
  author={Heng Er Metilda Chee and Jiayin Wang and Zhiqiang Guo and Weizhi Ma and Min Zhang},
  year={2024},
  eprint={2410.21801},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2410.21801},
}
```

### Framework versions

- PEFT 0.12.0
- Transformers 4.41.2
- bitsandbytes 0.43.3
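
## Post-processing the output

The strings returned by `processor.batch_decode` in the Usage section contain the prompt as well as the model's answer. The following is a minimal sketch of how the keywords might be pulled out of one decoded string. The helper name `extract_keywords`, the `ASSISTANT:` marker handling, and the set of separators are assumptions for illustration; the model's actual output format is not documented here, so adjust the separators to what you observe.

```python
import re


def extract_keywords(generated_text: str, marker: str = "ASSISTANT:") -> list[str]:
    """Hypothetical helper: return the keywords that follow the last marker."""
    # Keep only the text after the last occurrence of the marker.
    # If the marker is absent, rpartition leaves the whole string as the answer.
    _, _, answer = generated_text.rpartition(marker)
    # Split on common ASCII and Chinese separators (an assumed output format).
    parts = re.split(r"[,，、;；\n]+", answer)
    # Drop surrounding whitespace and empty fragments.
    return [part.strip() for part in parts if part.strip()]
```

With the Usage snippet, this could be applied as `keywords = [extract_keywords(text) for text in generated_texts]`.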