---
tags:
- element_type:detect
- model:yolov12l
- object:person
manako:
  description: A YOLO12-large (YOLO12l) instance segmentation model trained specifically for detecting and segmenting people with high precision. This model is a fine-tuned YOLO12-seg model optimized exclusively for person segmentation. It uses the large (L) scale configuration of YOLO12, featuring 28.76M parameters and 510 layers with a depth and width of 1.0.
  source: https://huggingface.co/RyanJames/yolo12l-person-seg/tree/main
  input_payload:
  - name: frame
    type: image
    description: RGB frame
  output_payload:
  - name: detections
    type: detections
    description: List of detections
  evaluation_score: 0.9064628125362165
---

# YOLO12-seg Person Segmentation Model

A YOLO12-large (YOLO12l) instance segmentation model trained specifically for detecting and
segmenting people with high precision.

## Model Description

This model is a fine-tuned YOLO12-seg model optimized exclusively for person segmentation. It uses
the large (L) scale configuration of YOLO12, featuring 28.76M parameters and 510 layers, with depth
and width multipliers of 1.0.

### Key Features

- **Single-Class Focus**: Specialized in detecting only people
- **Detailed Segmentation**: Provides pixel-level segmentation masks
- **High Throughput**: Optimized for processing hundreds of images per minute
- **Quality-Optimized**: Trained specifically for accurate boundary delineation
- **GPU-Optimized**: The Large (L) model is designed for GPU deployment, not edge devices or
  mobile phones

### Available Models

This repository contains two model versions:

- `yolo12l-person-seg.pt`: The original model trained for 100 epochs.
- `yolo12l-person-seg-extended.pt`: The improved model, with training extended to a total of
  300 epochs (recommended).

## Training

The model was trained on a filtered version of the COCO dataset containing only images with people:

- **Training Images**: 64,114 images containing people
- **Validation Images**: 2,693 images containing people
- **Training Details**:
  - Initially trained for 100 epochs, then training was extended to a total of 300 epochs.
  - Input resolution: 640×640
  - Class-focused optimization with `single_cls=True` and `classes=0`
  - Optimized for segmentation with `overlap_mask=True` and `mask_ratio=4`
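
As a rough illustration, the settings listed above could be passed to the Ultralytics training API along the following lines. This is a minimal sketch, not the author's actual training script: the dataset file `coco-person-seg.yaml` and the starting weights `yolo12l-seg.yaml` are hypothetical placeholders, and the helper names `build_train_args` and `train_person_seg` are invented for this example.

```python
def build_train_args(data_yaml: str, epochs: int = 300) -> dict:
    """Collect the training settings from this section as train() keyword arguments."""
    return {
        "data": data_yaml,     # dataset YAML for the person-only COCO subset (hypothetical file)
        "epochs": epochs,      # 300 total epochs for the extended model
        "imgsz": 640,          # 640x640 input resolution
        "single_cls": True,    # treat the dataset as a single class
        "classes": [0],        # keep only COCO class 0 (person)
        "overlap_mask": True,  # segmentation setting named in this section
        "mask_ratio": 4,       # mask downsample ratio named in this section
    }


def train_person_seg(weights: str, data_yaml: str):
    """Start training; needs the ultralytics package and a prepared dataset."""
    from ultralytics import YOLO  # imported lazily so build_train_args works without it

    model = YOLO(weights)
    return model.train(**build_train_args(data_yaml))


# Hypothetical invocation (placeholder file names, not taken from the repository):
# train_person_seg("yolo12l-seg.yaml", "coco-person-seg.yaml")
```

Keeping the arguments in a plain dictionary makes it easy to log or compare the configuration of the 100-epoch and 300-epoch runs.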

## Performance

The model achieves the following metrics on the COCO person validation set:

| Metric              | Value |
| ------------------- | ----- |
| Box mAP50-95 (COCO) | 0.642 |
| Box mAP50 (COCO)    | 0.851 |
| Mask mAP50-95       | 0.537 |
| Mask mAP50          | 0.837 |
| Box Precision       | 0.840 |
| Box Recall          | 0.759 |
| Mask Precision      | 0.843 |
| Mask Recall         | 0.748 |

Note: These metrics reflect the performance of the extended 300-epoch model
(`yolo12l-person-seg-extended.pt`).

These metrics were computed on the standard COCO `val2017` validation set.
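
The precision and recall values can be combined into a single F1 score (their harmonic mean) if one summary number is convenient. The F1 values below are derived here from the reported precision and recall; they are not figures published with the model.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


box_f1 = f1_score(0.840, 0.759)   # box precision / recall from the table
mask_f1 = f1_score(0.843, 0.748)  # mask precision / recall from the table

print(f"Box F1: {box_f1:.3f}")    # prints "Box F1: 0.797"
print(f"Mask F1: {mask_f1:.3f}")  # prints "Mask F1: 0.793"
```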

## Example Results

<table>
  <tr>
    <td><img src="examples/example2.png" alt="Person segmentation example 2" /></td>
    <td><img src="examples/example4.png" alt="Person segmentation example 4" /></td>
  </tr>
  <tr>
    <td><img src="examples/example1.png" alt="Person segmentation example 1" /></td>
    <td><img src="examples/example3.png" alt="Person segmentation example 3" /></td>
  </tr>
</table>

<div align="center">
  <img src="examples/example5.png" alt="Person segmentation example 5" style="max-width:90%;" />
</div>

The model effectively segments people in various poses, lighting conditions, and contexts, providing
accurate masks even with complex backgrounds. As shown in these examples, the segmentation masks
(highlighted in color) precisely outline the human subjects, making this model ideal for
applications requiring detailed person isolation.

## Use Cases

This model is ideal for applications requiring precise person segmentation:

- Human-centric image editing
- Background removal focused on people
- Virtual try-on applications
- People counting and crowd analysis
- Smart surveillance systems
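
For the people-counting use case, a count can be read from the detections of a single image. The sketch below is one possible approach under stated assumptions: the helper names `count_people` and `count_people_in_image` and the 0.5 confidence threshold are illustrative choices, not part of the model or of Ultralytics.

```python
def count_people(confidences, min_conf: float = 0.5) -> int:
    """Count detections whose confidence is at least min_conf."""
    return sum(1 for c in confidences if c >= min_conf)


def count_people_in_image(model, source, min_conf: float = 0.5) -> int:
    """Run an already loaded Ultralytics YOLO model on one image and count people.

    The model is single-class, so every detection is a person.
    """
    result = model(source)[0]
    return count_people(result.boxes.conf.tolist(), min_conf)


# Counting logic on its own, with made-up confidence values:
print(count_people([0.91, 0.42, 0.55]))  # prints 2
```

Separating the threshold logic from inference keeps the counting rule easy to adjust, for example raising `min_conf` in crowded scenes where false positives are more costly.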

## Usage

The model can be used directly with Ultralytics YOLO:

```python
from ultralytics import YOLO

# Load the model
model = YOLO('path/to/yolo12l-person-seg-extended.pt')  # Or yolo12l-person-seg.pt for the original

# Perform inference
results = model('image.jpg')

# Process results (segmentation masks and bounding boxes)
for result in results:
    boxes = result.boxes  # Bounding boxes with coordinates, confidences, and class IDs
    masks = result.masks  # Segmentation masks, or None if no person was detected

    if masks is not None:
        # Process masks
        for mask in masks:
            # Use the mask for your application
            pass
```

For segmentation visualization:

```python
import cv2
import numpy as np
from ultralytics import YOLO

# Load the model and image
model = YOLO('path/to/yolo12l-person-seg-extended.pt')  # Or yolo12l-person-seg.pt for the original
image = cv2.imread('image.jpg')
if image is None:
    raise FileNotFoundError('Could not read image.jpg')

# Perform inference; retina_masks=True returns masks at the original image
# resolution, so each mask can index the image directly
results = model(image, retina_masks=True)

# Process and visualize the first result
result = results[0]
if result.masks is not None:
    masks = result.masks.data.cpu().numpy()
    for mask in masks:
        # Create a colored overlay for each mask
        color = np.random.randint(0, 256, size=3)
        mask_image = np.zeros_like(image, dtype=np.uint8)
        mask_image[mask.astype(bool)] = color
        image = cv2.addWeighted(image, 1.0, mask_image, 0.5, 0)

# Save the image
cv2.imwrite('segmented_image.jpg', image)
```
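
The same mask array can also drive background removal. The following is a minimal sketch rather than a method shipped with the model: the helper `cutout_people` is a name invented for this example, and it assumes the masks have the same height and width as the image (for instance, masks taken from `result.masks.data.cpu().numpy()` after running inference with `retina_masks=True`).

```python
import numpy as np


def cutout_people(image: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Return a 4-channel copy of image whose alpha channel keeps only people.

    image: H x W x 3 uint8 array (BGR, as returned by cv2.imread)
    masks: N x H x W array of per-person masks at the image's resolution
    """
    if masks.shape[1:] != image.shape[:2]:
        raise ValueError("masks must match the image height and width")
    person = (masks > 0.5).any(axis=0)                 # union of all person masks
    alpha = np.where(person, 255, 0).astype(np.uint8)  # opaque on people, transparent elsewhere
    return np.dstack([image, alpha])
```

Saving the result with `cv2.imwrite('cutout.png', cutout)` keeps the transparency, since PNG stores the alpha channel while JPEG does not.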

## Limitations

- This model is optimized for person segmentation only and won't detect other classes
- Performance may be reduced in extreme lighting conditions
- Occluded persons may have incomplete segmentation masks
- Small or distant people might not be detected as reliably as those in the foreground
- **GPU Recommended**: As a Large (L) model, real-time inference performance benefits from a
  dedicated GPU
- **Edge Device Limitations**: Not optimized for mobile or edge deployment (consider YOLO12n or
  YOLO12s for those use cases)

## License

This model is available under the GNU Affero General Public License v3.0 (AGPL-3.0).

### License Note

This model was trained using the Ultralytics YOLO framework, which is licensed under the GNU Affero
General Public License v3.0 (AGPL-3.0). As per the terms of the AGPL-3.0 license, any derivative
works (including trained models) must also be distributed under the same license.