---
base_model: llava-hf/llava-1.5-7b-hf
library_name: peft
---
## Personalized Sticker Retrieval with Vision-Language Model (PerSRV)
Given a sticker image and a prompt, PerSRV generates search keywords for that sticker. For details, see the paper cited at the end.

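## Usage

```
import torch
from PIL import Image
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration

# The processor comes from the base model; left padding suits batched generation.
PROCESSOR_ID = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(PROCESSOR_ID)
processor.tokenizer.padding_side = "left"

# Assumption: the original snippet does not define `quantization_config`.
# 4-bit loading via bitsandbytes is one plausible choice, not a confirmed setting.
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

MODEL_ID = "metchee/persrv"
tuned_model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
)

# Assumption: the original snippet does not define `MAX_LENGTH`; 64 is a placeholder.
MAX_LENGTH = 64

image_path = ""  # set this to the path of your sticker image
# English gloss of the Chinese prompt: "You are a sticker expert. Carefully observe and understand
# the feeling the image is trying to express, and convert that feeling into keywords."
prompt = "USER: <image>\n你是个表情包专家，仔细观察、理解图片中的想表达的感觉，把这个感觉转换成关键词。\nASSISTANT:"
image = Image.open(image_path).convert("RGB")
inputs = processor(text=prompt, images=[image], return_tensors="pt").to("cuda")
generated_ids = tuned_model.generate(**inputs, max_new_tokens=MAX_LENGTH)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
```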
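The decoded strings in `generated_texts` contain the prompt followed by the model's answer, so the keywords still have to be separated out. The helper below is a minimal sketch of that step, not part of the released code: `extract_keywords` is a hypothetical name, and the separator set (ASCII and full-width commas, enumeration commas, newlines) is an assumption about the output format that may need adjusting.

```python
import re


def extract_keywords(decoded_text, marker="ASSISTANT:"):
    """Return the keywords that follow the last `marker` in a decoded generation."""
    # Keep only the text after the last marker; everything before it is the echoed prompt.
    # If the marker is absent, rpartition leaves the whole string in the last slot.
    answer = decoded_text.rpartition(marker)[2]
    # Assumption: keywords are separated by commas (ASCII or full-width),
    # enumeration commas, or newlines.
    parts = re.split(r"[,，、\n]+", answer)
    return [part.strip() for part in parts if part.strip()]
```

With the usage snippet above, `keywords_per_image = [extract_keywords(text) for text in generated_texts]` would give one keyword list per input image.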

## Citation
If you find PerSRV helpful to your research, please cite the following paper :)
```
@misc{chee2024persrvpersonalizedstickerretrieval,
      title={PerSRV: Personalized Sticker Retrieval with Vision-Language Model},
      author={Heng Er Metilda Chee and Jiayin Wang and Zhiqiang Guo and Weizhi Ma and Min Zhang},
      year={2024},
      eprint={2410.21801},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2410.21801},
}
```

### Framework versions

- PEFT 0.12.0
- Transformers 4.41.2
- bitsandbytes 0.43.3
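
To compare a local environment against the versions listed above, a small check like the following may help. It is only a sketch: `EXPECTED_VERSIONS` and `find_version_mismatches` are hypothetical names, and the strict equality test is an illustrative choice, since other versions may well work too.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by distribution name.
EXPECTED_VERSIONS = {
    "peft": "0.12.0",
    "transformers": "4.41.2",
    "bitsandbytes": "0.43.3",
}


def find_version_mismatches(expected=EXPECTED_VERSIONS, lookup=version):
    """Map each package that differs from `expected` to (expected, installed or None)."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Calling `find_version_mismatches()` returns an empty dictionary when all three packages match the listed versions.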