---
base_model: llava-hf/llava-1.5-7b-hf
library_name: peft
---
## Personalized Sticker Retrieval with Vision-Language Model (PerSRV)

PerSRV generates search keywords for a sticker, given the sticker image and a text prompt. The model is fine-tuned from `llava-hf/llava-1.5-7b-hf` with PEFT. For details, see the paper cited at the end of this page.

## Usage

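```
import torch
from PIL import Image
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration

# The processor comes from the base model; left padding suits batched generation.
PROCESSOR_ID = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(PROCESSOR_ID)
processor.tokenizer.padding_side = "left"

# Assumption: 4-bit bitsandbytes quantization. The original snippet referenced
# quantization_config without defining it, so adjust this to your setup.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
MAX_LENGTH = 128  # Assumption: cap on newly generated tokens; the original left this undefined.

MODEL_ID = "metchee/persrv"
tuned_model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
)

image_path = ""  # Set this to the path of your sticker image.
# The prompt is kept in Chinese, as in the original. In English: "You are a sticker
# expert. Carefully observe and understand the feeling the image is trying to
# express, and turn that feeling into keywords."
prompt = "USER: <image>\n你是个表情包专家,仔细观察、理解图片中的想表达的感觉,把这个感觉转换成关键词。\nASSISTANT:"
image = Image.open(image_path).convert("RGB")
inputs = processor(text=prompt, images=[image], return_tensors="pt").to("cuda")
generated_ids = tuned_model.generate(**inputs, max_new_tokens=MAX_LENGTH)
# Each decoded string contains the prompt followed by the generated keywords.
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
```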
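The decoded strings from `batch_decode` include the prompt as well as the model's answer, so a small post-processing step is usually needed to get the keywords alone. The sketch below is one possible way to do that. The helper names `extract_answer` and `split_keywords` are hypothetical, and the separator set is an assumption, since the exact output format of the model is not documented here.

```python
import re
from typing import List

ASSISTANT_MARKER = "ASSISTANT:"


def extract_answer(generated_text: str) -> str:
    """Return the text after the last 'ASSISTANT:' marker.

    If the marker is absent, the whole (stripped) string is returned.
    """
    _, _, answer = generated_text.rpartition(ASSISTANT_MARKER)
    return answer.strip()


def split_keywords(answer: str) -> List[str]:
    """Split an answer into keywords.

    Assumption: keywords are separated by ASCII or full-width commas,
    the enumeration comma, semicolons, or newlines.
    """
    parts = re.split(r"[,,、;;\n]+", answer)
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    # Hypothetical decoded string for illustration only, not real model output.
    decoded = "USER:  \n你是个表情包专家\nASSISTANT: 开心, 庆祝、加油"
    print(split_keywords(extract_answer(decoded)))
```

With the usage snippet above, this could be applied as `[split_keywords(extract_answer(t)) for t in generated_texts]`.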

## Citation

If you find PerSRV helpful in your research, please cite the following paper:

```
@misc{chee2024persrvpersonalizedstickerretrieval,
      title={PerSRV: Personalized Sticker Retrieval with Vision-Language Model},
      author={Heng Er Metilda Chee and Jiayin Wang and Zhiqiang Guo and Weizhi Ma and Min Zhang},
      year={2024},
      eprint={2410.21801},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2410.21801},
}
```

### Framework versions

- PEFT 0.12.0
- Transformers 4.41.2
- bitsandbytes 0.43.3
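
To compare a local environment against the versions listed above, something like the following sketch may help. It uses only the standard library; the helper names `installed_version` and `version_report` are hypothetical, and the PyPI distribution names `peft`, `transformers`, and `bitsandbytes` are assumed to match the packages listed. A mismatch does not necessarily mean the model will fail to load.

```python
from importlib import metadata
from typing import Dict, Optional

# Versions taken from the list above.
EXPECTED = {
    "peft": "0.12.0",
    "transformers": "4.41.2",
    "bitsandbytes": "0.43.3",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_report(expected: Dict[str, str] = EXPECTED) -> Dict[str, dict]:
    """Compare installed versions with the expected ones."""
    report = {}
    for name, want in expected.items():
        have = installed_version(name)
        report[name] = {"expected": want, "installed": have, "match": have == want}
    return report


if __name__ == "__main__":
    for name, row in version_report().items():
        status = "ok" if row["match"] else "differs"
        print(f"{name}: expected {row['expected']}, installed {row['installed']} ({status})")
```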