---
license: mit
language:
- en
base_model:
- microsoft/git-base-coco
---

Given a photo of a face, this model generates a text description of it. Be careful, as the descriptions can be unflattering.

Based on the GIT-Base-COCO image-to-text model and fine-tuned on [Face2Text](https://zenodo.org/records/10973388).

How to use:

```
from transformers import AutoProcessor, AutoModelForCausalLM, AutoTokenizer
import cv2

DEVICE = 'cpu'  # 'cpu' or 'cuda'
IMG_PATH = 'face.png'

# The processor and tokeniser come from the base model; the weights are the fine-tuned ones.
processor = AutoProcessor.from_pretrained('microsoft/git-base-coco')
model = AutoModelForCausalLM.from_pretrained('mtanti/face-describer')
tokeniser = AutoTokenizer.from_pretrained('microsoft/git-base-coco')
model.eval()
model.to(DEVICE)

# cv2.imread returns None instead of raising when the file cannot be read.
img = cv2.imread(IMG_PATH)
if img is None:
    raise FileNotFoundError(IMG_PATH)

# OpenCV loads images as BGR; reverse the channel axis to get RGB.
tensor_img = processor(
    images=[img[:, :, ::-1]],
    return_tensors='pt',
)['pixel_values'].to(DEVICE)

# do_sample=True means the description can differ between runs.
desc = tokeniser.decode(
    model.generate(pixel_values=tensor_img, max_length=100, repetition_penalty=1.05, do_sample=True)[0, :],
    skip_special_tokens=True,
)
print(desc)
```
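
The usage snippet passes `img[:, :, ::-1]` to the processor because OpenCV loads images in BGR channel order, and reversing the last axis gives RGB. The sketch below isolates that step on a single pixel so the effect is easy to check. The helper name `bgr_to_rgb` is hypothetical and is not part of this model or of any library.

```python
import numpy as np


def bgr_to_rgb(img: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an H x W x 3 image (BGR -> RGB).

    Hypothetical helper for illustration only.
    """
    if img.ndim != 3 or img.shape[2] != 3:
        raise ValueError('expected an H x W x 3 image')
    # The reversed slice is a view with a negative stride;
    # ascontiguousarray copies it into a plain C-ordered array.
    return np.ascontiguousarray(img[:, :, ::-1])


# A 1 x 1 "image" holding a pure blue pixel in OpenCV's BGR order.
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
rgb = bgr_to_rgb(bgr)
```

Here `rgb` holds the same blue pixel in RGB order, so the 255 moves from the first channel to the last.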