---
language: en
library_name: transformers
tags:
- vision
- image-segmentation
- nvidia/mit-b5
- transformers.js
- onnx
datasets:
- celebamaskhq
---

# Face Parsing

[Semantic segmentation](https://huggingface.co/docs/transformers/tasks/semantic_segmentation) model fine-tuned from [nvidia/mit-b5](https://huggingface.co/nvidia/mit-b5) with [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) for face parsing. For additional options, see the Transformers [Segformer docs](https://huggingface.co/docs/transformers/model_doc/segformer).

> ONNX model for web inference contributed by [Xenova](https://huggingface.co/Xenova).

## Usage in Python

The exhaustive list of labels can be extracted from [config.json](https://huggingface.co/jonathandinu/face-parsing/blob/65972ac96180b397f86fda0980bbe68e6ee01b8f/config.json#L30).

| id  | label      | note              |
| :-: | :--------- | :---------------- |
|  0  | background |                   |
|  1  | skin       |                   |
|  2  | nose       |                   |
|  3  | eye_g      | eyeglasses        |
|  4  | l_eye      | left eye          |
|  5  | r_eye      | right eye         |
|  6  | l_brow     | left eyebrow      |
|  7  | r_brow     | right eyebrow     |
|  8  | l_ear      | left ear          |
|  9  | r_ear      | right ear         |
| 10  | mouth      | area between lips |
| 11  | u_lip      | upper lip         |
| 12  | l_lip      | lower lip         |
| 13  | hair       |                   |
| 14  | hat        |                   |
| 15  | ear_r      | earring           |
| 16  | neck_l     | necklace          |
| 17  | neck       |                   |
| 18  | cloth      | clothing          |

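Rather than copying the table above by hand, the mapping can be read programmatically. The sketch below assumes the standard Transformers config layout, in which `config.json` stores an `id2label` object; `CONFIG_JSON` is a hypothetical four-entry excerpt (not the real file) and `labels_from_config` is an illustrative helper, not part of any library.

```python
import json

# Hypothetical four-entry excerpt of config.json (the real file has all 19
# labels plus many other keys). JSON object keys are always strings.
CONFIG_JSON = """
{
  "id2label": {"0": "background", "1": "skin", "2": "nose", "10": "mouth"}
}
"""


def labels_from_config(config: dict) -> list[str]:
    """Return label names ordered by integer class id.

    Accepts string keys (raw JSON) or integer keys (e.g. model.config.id2label).
    """
    id2label = {int(k): v for k, v in config["id2label"].items()}
    # sort numerically so that 10 comes after 2 rather than after 1
    return [id2label[i] for i in sorted(id2label)]


label_names = labels_from_config(json.loads(CONFIG_JSON))
print(label_names)  # ['background', 'skin', 'nose', 'mouth']
```

Converting keys with `int()` before sorting avoids the string ordering `"1" < "10" < "2"`. With the model loaded as in the Python example, `labels_from_config({"id2label": model.config.id2label})` should give all 19 names; because those ids are contiguous (0 to 18), list position then equals class id.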
```python
import torch
from torch import nn
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

from PIL import Image
import matplotlib.pyplot as plt
import requests

# convenience expression for automatically determining device
device = (
    "cuda"
    # Device for NVIDIA or AMD GPUs
    if torch.cuda.is_available()
    else "mps"
    # Device for Apple Silicon (Metal Performance Shaders)
    if torch.backends.mps.is_available()
    else "cpu"
)

# load models
image_processor = SegformerImageProcessor.from_pretrained("jonathandinu/face-parsing")
model = SegformerForSemanticSegmentation.from_pretrained("jonathandinu/face-parsing")
model.to(device)

# expects a PIL.Image or torch.Tensor; convert to RGB in case the image has
# an alpha channel or is grayscale
url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# run inference on image (no gradients needed, which saves memory)
inputs = image_processor(images=image, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, ~height/4, ~width/4)

# resize output to match input image dimensions
upsampled_logits = nn.functional.interpolate(
    logits,
    size=image.size[::-1],  # PIL size is (W, H); interpolate expects (H, W)
    mode="bilinear",
    align_corners=False,
)

# get label masks: one class id per pixel
labels = upsampled_logits.argmax(dim=1)[0]

# move to CPU to visualize in matplotlib
labels_viz = labels.cpu().numpy()
plt.imshow(labels_viz)
plt.show()
```
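`plt.imshow(labels_viz)` shows the raw class ids through matplotlib's default colormap. To see the parsing on top of the photo instead, one option is to give each class a fixed color and alpha-blend it over the image. This is a minimal NumPy sketch: the random palette, the black background color, and the 0.5 blend factor are arbitrary illustrative choices rather than anything the model defines, and `colorize` / `overlay` are hypothetical helper names.

```python
import numpy as np

NUM_LABELS = 19  # class ids 0-18 from the label table


def colorize(label_map: np.ndarray, seed: int = 0) -> np.ndarray:
    """Map an (H, W) array of class ids to an (H, W, 3) uint8 RGB image."""
    rng = np.random.default_rng(seed)
    palette = rng.integers(0, 256, size=(NUM_LABELS, 3)).astype(np.uint8)
    palette[0] = 0  # draw background (id 0) as black
    # fancy indexing looks up one RGB triple per pixel
    return palette[label_map]


def overlay(image_rgb: np.ndarray, label_map: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Alpha-blend the colorized label map over an (H, W, 3) uint8 image."""
    color = colorize(label_map).astype(np.float32)
    blended = (1 - alpha) * image_rgb.astype(np.float32) + alpha * color
    return blended.round().astype(np.uint8)
```

With the variables from the example above, `plt.imshow(overlay(np.array(image.convert("RGB")), labels_viz))` followed by `plt.show()` should display the blend; the spatial shapes agree because the logits were upsampled to `image.size[::-1]`. A fixed seed keeps the colors stable across runs.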

## Usage in the browser (Transformers.js)

```js
import {
  pipeline,
  env,
} from "https://cdn.jsdelivr.net/npm/@xenova/transformers@2.14.0";

// important to prevent errors since the model files are likely remote on HF hub
env.allowLocalModels = false;

// instantiate image segmentation pipeline with pretrained face parsing model
const model = await pipeline("image-segmentation", "jonathandinu/face-parsing");

// image to segment (same photo as in the Python example)
const url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6";

// async inference since it could take a few seconds
const output = await model(url);

// each label is a separate mask object
// [
//   { score: null, label: 'background', mask: transformers.js RawImage { ... } },
//   { score: null, label: 'hair', mask: transformers.js RawImage { ... } },
//   ...
// ]
for (const m of output) {
  console.log(`Found ${m.label}`);
  await m.mask.save(`${m.label}.png`);
}
```
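The Transformers.js pipeline returns one mask object per label. If a similar per-label structure is needed in Python, the `labels_viz` array from the Python example can be split with NumPy. This is a sketch, not the pipeline's actual post-processing: it hard-codes the label names from the table, emits entries only for class ids that occur in the map, and adds a `fraction` field (share of pixels covered) that the JS output does not have. `split_masks` and `ID2LABEL` are hypothetical names.

```python
import numpy as np

# Label names in class-id order, copied from the label table (ids 0-18).
ID2LABEL = [
    "background", "skin", "nose", "eye_g", "l_eye", "r_eye", "l_brow",
    "r_brow", "l_ear", "r_ear", "mouth", "u_lip", "l_lip", "hair", "hat",
    "ear_r", "neck_l", "neck", "cloth",
]


def split_masks(label_map: np.ndarray) -> list[dict]:
    """Turn an (H, W) map of class ids into one binary mask per label present.

    Each entry is {"label": name, "mask": uint8 array (255 inside, 0 outside),
    "fraction": share of all pixels covered by that label}.
    """
    results = []
    total = label_map.size
    for class_id in np.unique(label_map):  # sorted ids that actually occur
        region = label_map == class_id
        results.append({
            "label": ID2LABEL[int(class_id)],
            "mask": region.astype(np.uint8) * 255,
            "fraction": float(region.sum()) / total,
        })
    return results
```

Each mask can then be written to disk with Pillow, e.g. `Image.fromarray(m["mask"]).save(f"{m['label']}.png")` for each `m` in `split_masks(labels_viz)`, which roughly mirrors the `m.mask.save(...)` loop above.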

### p5.js

Since [p5.js](https://p5js.org/) uses an animation loop abstraction, we need to take care when loading the model and making predictions.

```js
// ...

// asynchronously load transformers.js and instantiate model
async function preload() {
  // load transformers.js library with a dynamic import
  const { pipeline, env } = await import(
    "https://cdn.jsdelivr.net/npm/@xenova/transformers@2.14.0"
  );

  // important to prevent errors since the model files are remote on HF hub
  env.allowLocalModels = false;

  // instantiate image segmentation pipeline with pretrained face parsing model
  model = await pipeline("image-segmentation", "jonathandinu/face-parsing");

  print("face-parsing model loaded");
}

// ...
```

[full p5.js example](https://editor.p5js.org/jonathan.ai/sketches/wZn15Dvgh)

## Model Description

- **Developed by:** [Jonathan Dinu](https://twitter.com/jonathandinu)
- **Model type:** Transformer-based semantic segmentation image model
- **License:** non-commercial research and educational purposes
- **Resources for more information:** Transformers docs on [Segformer](https://huggingface.co/docs/transformers/model_doc/segformer) and/or the [original research paper](https://arxiv.org/abs/2105.15203).

## Limitations and Bias

### Bias

While the capabilities of computer vision models are impressive, they can also reinforce or exacerbate social biases. The [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) dataset used for fine-tuning is large but not necessarily diverse or representative. It also consists only of celebrity photos, which may not reflect the faces, demographics, or photographic conditions the model encounters in practice.