---
license: apache-2.0
base_model:
- timm/tf_efficientnetv2_s.in21k_ft_in1k
- Ultralytics/YOLO11
tags:
- comfyui
- object-detection
- face-detection
- face-segmentation
- pytorch
- image-segmentation
---

Custom-trained models for face detection and segmentation across realistic, anime, and NSFW content.
Made for the **Forbidden Vision** ComfyUI custom nodes.

[GitHub Repository](https://github.com/luxdelux7/ComfyUI-Forbidden-Vision)

---
## Why These Models Exist

Traditional face models fail where it matters most for AI art workflows:

| **Problem** | **Why It Matters** |
|-------------|-------------------|
| **Domain-locked** | Existing models excel at *either* anime *or* realistic content, never both |
| **NSFW blindness** | Most models are trained only on SFW data and break on adult content |
| **Detail blindness** | Most models miss fine details such as anime eyebrows and real eyelashes |
| **Generation artifacts** | Standard datasets don't include diffusion model quirks and failures |

**These models solve all four.**

The segmentation model predicts face masks that include stylized eyebrows, eyelashes, and similar details.

---
## Training Foundation
### The Dataset Difference
Built from **14,000+ manually annotated images** across the domains that actually matter for AI generation:

**Multi-Domain Coverage**
- SDXL, SD1.5, Pony, Illustrious outputs
- Curated Danbooru (anime styles)
- Real photography
- Full NSFW inclusion (no filtering)

**Edge Case Priority**
- Extreme angles & occlusions
- Failed/broken generations
- Low-quality artifacts
- Unusual expressions & poses
- Everything other models ignore

### What This Means For You
```
Traditional models: Trained on clean celebrity faces
        ↓
Fail on real workflows

These models: Trained on what you actually generate
        ↓
Work when you need them
```
**One model family. Every domain. Zero compromises.**
## Model Details
### Face Detection (YOLOv11-Small)

**Purpose:** Primary face detection with high recall

**Training Approach:**
- After every training run, I ran the model on a new mixed dataset, hard-mined the failures, and improved the dataset, repeating until performance was acceptable
- Trained at 640px resolution (inference should use the same resolution)

**Why YOLOv11-Small instead of nano?**
More reliable detection across mixed realistic/anime domains, at an acceptable speed tradeoff.

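
For anyone running the detector outside the Fixer node, matching the 640px training resolution usually means resizing the image so its longer side is 640px and padding the remainder. Below is a minimal sketch of that geometry, assuming standard letterboxing. The function names are hypothetical and the nodes' actual preprocessing may differ. With the Ultralytics package, the inference size is normally set through the `imgsz` argument of the predict call, so this arithmetic is mainly useful when mapping boxes back to the source image by hand.

```python
def letterbox_geometry(width, height, target=640):
    """Return the scale, resized size, and padding needed to fit an image
    into a target x target square while keeping its aspect ratio."""
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Padding is split evenly between the two sides of each axis.
    pad_x, pad_y = (target - new_w) / 2, (target - new_h) / 2
    return scale, (new_w, new_h), (pad_x, pad_y)


def box_to_original(box, scale, pad):
    """Map an (x1, y1, x2, y2) box from letterboxed coordinates
    back to the coordinates of the source image."""
    pad_x, pad_y = pad
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)


# Example: a 1280x720 frame is scaled by 0.5 and padded vertically.
scale, resized, pad = letterbox_geometry(1280, 720)
print(scale, resized, pad)
print(box_to_original((100, 240, 200, 340), scale, pad))
```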
---
### Segmentation (EfficientNet-v2)

**Purpose:** Precise face mask generation

**Training Approach:**
- Dataset prepared using the Forbidden Vision YOLO model at 512px resolution
- Iterative hard-mining training in multiple phases:
  - Train on the initial 700 samples
  - Evaluate on the remaining images to find failure cases
  - Correct the failed masks and add them to the dataset
  - Retrain with the expanded dataset
  - Repeat until failure cases drop to near zero (final dataset: 4k+ images)
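
The hard-mining phases above can be sketched as a simple loop. This is an illustration only, not the actual training code: `train`, `is_failure`, and `correct` are hypothetical callbacks standing in for a training run, the review that flags a bad mask, and the manual mask correction.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Sample = TypeVar("Sample")
Model = TypeVar("Model")


def hard_mine(
    initial: Sequence[Sample],
    pool: Sequence[Sample],
    train: Callable[[List[Sample]], Model],
    is_failure: Callable[[Model, Sample], bool],
    correct: Callable[[Sample], Sample],
    max_failures: int = 0,
    max_rounds: int = 10,
) -> Tuple[Model, List[Sample]]:
    """Grow a dataset by repeatedly adding corrected failure cases."""
    dataset = list(initial)
    remaining = list(pool)
    model = train(dataset)  # train on the initial samples
    for _ in range(max_rounds):
        # Evaluate on the remaining images and split them by outcome.
        failures, passed = [], []
        for sample in remaining:
            (failures if is_failure(model, sample) else passed).append(sample)
        if len(failures) <= max_failures:
            break  # failure cases have dropped low enough
        # Correct the failed cases, add them, and retrain.
        dataset.extend(correct(sample) for sample in failures)
        remaining = passed
        model = train(dataset)
    return model, dataset


# Toy run: the "model" is just the dataset size, and a sample "fails"
# when its difficulty score exceeds that size.
model, dataset = hard_mine(
    initial=[1, 2],
    pool=[1, 3, 4, 8],
    train=len,
    is_failure=lambda m, s: s > m,
    correct=lambda s: s,
)
print(model, sorted(dataset))
```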

**Features:**
- Detects and includes facial features other models ignore, such as protruding anime eyebrows and realistic eyelashes that extend past the face outline
- Glasses and similar accessories are treated as part of the face, even when they extend outside the face shape
- NSFW-friendly across anime, realistic, and 3D domains

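
Because masks can extend past the face outline (eyebrows, eyelashes, glasses), a crop taken from a detection box generally needs some margin before it is resized for segmentation. The sketch below shows one way to compute such a crop. It is an assumption for illustration, not the nodes' actual logic: the function name and the `margin` default are hypothetical. The resulting region would then be resized to 512x512 with any image library.

```python
def padded_square_crop(box, image_size, margin=0.25):
    """Return an integer (left, top, right, bottom) square crop around a box.

    box:        (x1, y1, x2, y2) detection in pixel coordinates
    image_size: (width, height) of the source image
    margin:     fraction of the longer box side added on each side
    """
    x1, y1, x2, y2 = box
    width, height = image_size
    # Square side: longer box edge plus margin, capped by the image size.
    side = min(max(x2 - x1, y2 - y1) * (1 + 2 * margin), width, height)
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # Center the square on the box, then shift it back inside the image.
    left = min(max(cx - side / 2, 0), width - side)
    top = min(max(cy - side / 2, 0), height - side)
    # Flooring keeps the crop within the image bounds.
    left, top, side = int(left), int(top), int(side)
    return left, top, left + side, top + side


print(padded_square_crop((100, 100, 200, 220), (1000, 800)))
print(padded_square_crop((0, 0, 100, 100), (640, 480)))
```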
---
## Usage
These models are automatically downloaded and used by the **Fixer** node in ComfyUI Forbidden Vision.
## License
Apache 2.0
---
## Contact
- GitHub: [ComfyUI-Forbidden-Vision](https://github.com/luxdelux7/ComfyUI-Forbidden-Vision)
- Issues: [GitHub Issues](https://github.com/luxdelux7/ComfyUI-Forbidden-Vision/issues)
- Support: [Ko-fi](https://ko-fi.com/luxdelux)