---
license: apache-2.0
base_model:
- timm/tf_efficientnetv2_s.in21k_ft_in1k
- Ultralytics/YOLO11
tags:
- comfyui
- object-detection
- face-detection
- face-segmentation
- pytorch
- image-segmentation
---

Custom-trained models for face detection and segmentation across realistic, anime, and NSFW content. Made for the **Forbidden Vision** ComfyUI custom nodes ([GitHub repository](https://github.com/luxdelux7/ComfyUI-Forbidden-Vision)).

---

## 🎯 Why These Models Exist

Traditional face models fail where it matters most for AI art workflows:

| **Problem** | **Why It Matters** |
|-------------|-------------------|
| 🎨 **Domain-locked** | Existing models excel at *either* anime *or* realistic content, never both |
| 🔞 **NSFW blindness** | Most models are trained only on SFW data and break on adult content |
| 👁️‍🗨️ **Detail blindness** | Most models miss anime eyebrows, real eyelashes, and similar fine details |
| 🎲 **Generation artifacts** | Standard datasets don't include diffusion model quirks and failures |

**These models address all four.**

The segmentation model predicts face masks that include stylized eyebrows, eyelashes, and similar protruding details.

---

## 📊 Training Foundation

### The Dataset Difference

Built from **14,000+ manually annotated images** across the domains that actually matter for AI generation:

**🎨 Multi-Domain Coverage**
- SDXL, SD1.5, Pony, Illustrious outputs
- Curated Danbooru (anime styles)
- Real photography
- Full NSFW inclusion (no filtering)

**💎 Edge Case Priority**
- ✓ Extreme angles & occlusions
- ✓ Failed/broken generations
- ✓ Low-quality artifacts
- ✓ Unusual expressions & poses
- ✓ Everything other models ignore

### What This Means For You

```
Traditional models:  Trained on clean celebrity faces
                     ↓
                     Fail on real workflows

These models:        Trained on what you actually generate
                     ↓
                     Work when you need them
```

**One model family. Every domain. Zero compromises.**

## Model Details

### Face Detection (YOLOv11-Small)

**Purpose:** Primary face detection with high recall

**Training Approach:**
- After every training run, I ran the model on a new mixed dataset, mined its failures as hard examples, and improved the dataset until performance was acceptable
- Trained at 640px resolution (inference should use the same resolution)

**Why YOLOv11-Small instead of nano?** More reliable detection across mixed realistic/anime domains, with an acceptable speed tradeoff.

---

### Segmentation (EfficientNet-v2)

**Purpose:** Precise face mask generation

**Training Approach:**
- Dataset prepared using the Forbidden Vision YOLO model at 512px resolution
- Iterative hard-mining training in multiple phases:
  - Train on the initial 700 samples
  - Evaluate on the remaining images to find failure cases
  - Correct the failed masks and add them to the dataset
  - Retrain with the expanded dataset
  - Repeat until failure cases drop to near-zero (final dataset: 4k+ images)

**Features:**
- Detects and includes facial features other models ignore, such as protruding anime eyebrows and realistic eyelashes that extend beyond the face
- Glasses and similar accessories are treated as part of the face, even where they extend beyond the face shape
- NSFW-friendly across anime, realistic, and 3D domains

---

## Usage

These models are automatically downloaded and used by the **Fixer** node in ComfyUI Forbidden Vision.

## License

Apache 2.0

---

## Contact

- GitHub: [ComfyUI-Forbidden-Vision](https://github.com/luxdelux7/ComfyUI-Forbidden-Vision)
- Issues: [GitHub Issues](https://github.com/luxdelux7/ComfyUI-Forbidden-Vision/issues)
- Support: [Ko-fi](https://ko-fi.com/luxdelux)
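
---

## Hard-Mining Loop (Illustrative Sketch)

The iterative hard-mining procedure described under Segmentation can be sketched roughly as below. This is not the actual training code: `hard_mine`, `train`, `tolerance`, and the toy data are hypothetical names made up for illustration, and the manual mask correction step is reduced to simply moving failed samples into the dataset.

```python
from typing import Callable, List, Sequence, Tuple


def hard_mine(
    pool: Sequence[str],
    train: Callable[[List[str]], Callable[[str], bool]],
    initial_size: int,
    tolerance: int = 0,
    max_rounds: int = 10,
) -> Tuple[List[str], int]:
    """Grow a dataset by repeatedly adding the samples the model fails on.

    `train(dataset)` is a hypothetical callable that returns a predicate
    `is_correct(sample)`. In a real workflow, checking the prediction and
    correcting the failed mask would be manual steps.
    """
    dataset = list(pool[:initial_size])      # initial training samples
    remaining = list(pool[initial_size:])    # images not yet in the dataset

    for round_idx in range(1, max_rounds + 1):
        is_correct = train(dataset)          # (re)train on the current dataset
        failures = [s for s in remaining if not is_correct(s)]
        if len(failures) <= tolerance:       # failure count is near zero: stop
            return dataset, round_idx
        dataset.extend(failures)             # corrected failures join the dataset
        failed = set(failures)
        remaining = [s for s in remaining if s not in failed]

    return dataset, max_rounds


# Toy stand-in for training: the "model" handles a domain once it has seen it.
def toy_train(dataset: List[str]) -> Callable[[str], bool]:
    seen = {s.split("_")[0] for s in dataset}
    return lambda sample: sample.split("_")[0] in seen


pool = ["real_0", "real_1", "anime_0", "anime_1", "3d_0", "real_2"]
dataset, rounds = hard_mine(pool, toy_train, initial_size=2)
```

In the toy run, the first round fails on every anime and 3D sample, those samples are added, and the second round finds no remaining failures. Samples the model already handles stay out of the dataset, which is why the final dataset can be much smaller than the full image pool.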