---
license: mit
language:
- en
base_model:
- microsoft/git-base-coco
---

Given a photo of a face, this model generates a text description of it.

Be careful: the descriptions can be unflattering.

The model is based on the GIT-Base-COCO image-to-text model and fine-tuned on [Face2Text](https://zenodo.org/records/10973388).

How to use:

```python
from transformers import AutoProcessor, AutoModelForCausalLM, AutoTokenizer
import cv2

DEVICE = 'cpu'  # 'cpu' or 'cuda'
IMG_PATH = 'face.png'

# The processor and tokeniser come from the base model; the weights are the fine-tuned ones.
processor = AutoProcessor.from_pretrained('microsoft/git-base-coco')
model = AutoModelForCausalLM.from_pretrained('mtanti/face-describer')
tokeniser = AutoTokenizer.from_pretrained('microsoft/git-base-coco')
model.eval()
model.to(DEVICE)

# OpenCV loads images in BGR order, so reverse the channel axis to get RGB.
img = cv2.imread(IMG_PATH)
tensor_img = processor(
    images=[img[:, :, ::-1]],
    return_tensors='pt',
)['pixel_values'].to(DEVICE)

# do_sample=True means the description can differ from run to run.
desc = tokeniser.decode(
    model.generate(pixel_values=tensor_img, max_length=100, repetition_penalty=1.05, do_sample=True)[0, :],
    skip_special_tokens=True,
)
print(desc)
```
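
The `img[:, :, ::-1]` slice in the snippet above reverses the channel axis: `cv2.imread` returns pixels in BGR order, while image processors in `transformers` generally assume RGB. Below is a minimal sketch of that step on a synthetic one-pixel image. The `bgr_to_rgb` helper name is hypothetical and not part of any library, and the contiguous copy is an optional precaution rather than something the original snippet does.

```python
import numpy as np


def bgr_to_rgb(img):
    """Reverse the channel axis of an H x W x 3 array (hypothetical helper)."""
    # The slice alone is a view with negative strides;
    # ascontiguousarray turns it into an ordinary contiguous copy.
    return np.ascontiguousarray(img[:, :, ::-1])


# Synthetic 1x1 "image" whose single pixel is pure blue in BGR order.
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
rgb = bgr_to_rgb(bgr)
print(rgb[0, 0].tolist())  # [0, 0, 255]: blue is now the last channel
```

Skipping this step would not raise an error, but the model would see swapped red and blue channels, which is likely to affect the generated description.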