YuNet Face Detection (GGUF)

GGUF conversion of YuNet for use with CrispEmbed.

YuNet is a lightweight face detector based on ShuffleNetV2, originally shipped with OpenCV. This GGUF file was converted from the face_detection_yunet_2023mar.onnx checkpoint using CrispEmbed's convert-face-to-gguf.py converter.

Model Details

Property      Value
Architecture  ShuffleNetV2 backbone + FPN + multi-scale detection heads
Input         640x640 BGR, raw uint8 range [0, 255]
Strides       8, 16, 32
Outputs       cls (confidence), obj (IoU), bbox (4), kps (5 landmarks x 2) per stride
Parameters    ~75K
GGUF size     222 KB
ONNX source   face_detection_yunet_2023mar.onnx (228 KB)
License       Apache 2.0
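
As a rough illustration of what the strides in the table imply, the sketch below counts detection-grid cells for a square input. It assumes each stride s produces a (size // s) x (size // s) feature map, which is the usual convention but is not stated explicitly above; grid_cells is a hypothetical helper, not part of CrispEmbed.

```python
# Sketch only: assumes a stride-s head sees a (size // s) x (size // s) grid.
STRIDES = (8, 16, 32)  # from the Model Details table


def grid_cells(input_size: int = 640, strides=STRIDES) -> dict:
    """Return {stride: number_of_cells} for a square input of input_size pixels."""
    return {s: (input_size // s) ** 2 for s in strides}


cells = grid_cells(640)
total = sum(cells.values())
print(cells)  # {8: 6400, 16: 1600, 32: 400}
print(total)  # 8400
```

Under that assumption, a 640x640 input yields 8400 candidate locations before confidence filtering, each carrying one cls, obj, bbox and kps prediction.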

Usage with CrispEmbed

CLI

# Auto-download and detect
crispembed -m yunet --detect photo.jpg

# JSON output
crispembed -m yunet --detect photo.jpg --json

# Lower confidence threshold
crispembed -m yunet --detect photo.jpg --conf 0.3

Output format

Each detection contains:

  • x, y, w, h: bounding box (top-left corner + size) in original image coordinates
  • conf: detection confidence (0..1)
  • landmarks[10]: 5 facial landmarks stored as flat (x, y) pairs at these indices:
    • [0,1] right eye
    • [2,3] left eye
    • [4,5] nose tip
    • [6,7] right mouth corner
    • [8,9] left mouth corner

Note: landmark order follows OpenCV's convention (right_eye, left_eye, nose, right_mouth, left_mouth), which differs from InsightFace/SCRFD (left_eye, right_eye, nose, left_mouth, right_mouth).
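
The flat landmarks[10] layout can be awkward to index by hand. The sketch below maps it to named points using the index table above; unpack_landmarks and LANDMARK_NAMES are hypothetical names chosen for illustration and are not part of the CrispEmbed API.

```python
# Names follow the index table above (OpenCV order).
LANDMARK_NAMES = ("right_eye", "left_eye", "nose_tip", "right_mouth", "left_mouth")


def unpack_landmarks(landmarks):
    """Map a flat [x0, y0, ..., x4, y4] sequence to {name: (x, y)}."""
    if len(landmarks) != 2 * len(LANDMARK_NAMES):
        raise ValueError(f"expected 10 values, got {len(landmarks)}")
    xs = landmarks[0::2]  # even indices hold x coordinates
    ys = landmarks[1::2]  # odd indices hold y coordinates
    return dict(zip(LANDMARK_NAMES, zip(xs, ys)))


# Made-up coordinates, purely to show the mapping.
flat = [10.0, 20.0, 30.0, 20.0, 20.0, 30.0, 12.0, 40.0, 28.0, 40.0]
points = unpack_landmarks(flat)
print(points["nose_tip"])  # (20.0, 30.0)
```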

C API

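#include <stdio.h>
#include "crispembed.h"

int main(void) {
    crispembed_ctx * ctx = crispembed_init("yunet.gguf", 4);
    if (!ctx) {
        fprintf(stderr, "failed to load yunet.gguf\n");
        return 1;
    }

    /* Room for up to 32 faces; 0.5f is the confidence threshold and
       640 the detection size (conf and det_size in the Python API). */
    crispembed_face faces[32];
    int n = crispembed_detect(ctx, "photo.jpg", faces, 32, 0.5f, 640);
    for (int i = 0; i < n; i++) {
        printf("face %d: (%.0f,%.0f,%.0f,%.0f) conf=%.2f\n",
               i, faces[i].x, faces[i].y, faces[i].w, faces[i].h, faces[i].conf);
    }
    crispembed_free(ctx);
    return 0;
}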

Python

from crispembed import CrispFace

det = CrispFace("yunet.gguf")
faces = det.detect("photo.jpg", conf=0.5, det_size=640)
for f in faces:
    print(f"bbox=({f['x']:.0f},{f['y']:.0f},{f['w']:.0f},{f['h']:.0f}) conf={f['confidence']:.2f}")

YuNet vs SCRFD

Property                   YuNet                  SCRFD-10G
Size                       222 KB                 ~16 MB
Speed (CPU)                ~5 ms                  ~50 ms
Accuracy (WiderFace easy)  88.3%                  95.2%
Anchors per cell           1                      2
Bbox decode                center + scale (exp)   distance-based
Input normalization        None (raw 0-255)       (v - 127.5) / 128

YuNet is best for latency-critical or resource-constrained scenarios. SCRFD is better when detection accuracy matters more than speed or model size.
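
To make the "Bbox decode" row more concrete, here is a minimal sketch of the two decoding styles. The exact formulas are assumptions based on how OpenCV's FaceDetectorYN and typical SCRFD post-processing are commonly described; they are not taken from CrispEmbed's source, and both function names are hypothetical.

```python
import math


def decode_center_scale(col, row, stride, dx, dy, dw, dh):
    """YuNet-style decode (assumed): offsets shift the cell position,
    exp() turns the raw size outputs into a positive width and height."""
    cx = (col + dx) * stride
    cy = (row + dy) * stride
    w = math.exp(dw) * stride
    h = math.exp(dh) * stride
    return (cx - w / 2, cy - h / 2, w, h)  # x, y, w, h


def decode_distance(col, row, stride, left, top, right, bottom):
    """SCRFD-style decode (assumed): outputs are distances, in stride
    units, from the anchor point to each of the four box edges."""
    cx, cy = col * stride, row * stride
    x1, y1 = cx - left * stride, cy - top * stride
    x2, y2 = cx + right * stride, cy + bottom * stride
    return (x1, y1, x2 - x1, y2 - y1)  # x, y, w, h


print(decode_center_scale(10, 5, 8, 0.5, 0.5, 0.0, 0.0))  # (80.0, 40.0, 8.0, 8.0)
print(decode_distance(10, 5, 8, 1.0, 1.0, 1.0, 1.0))      # (72.0, 32.0, 16.0, 16.0)
```

Both helpers return boxes in the same x, y, w, h form used in the output format above, so results from the two decode styles can be compared directly.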

Conversion

python models/convert-face-to-gguf.py \
    --onnx face_detection_yunet_2023mar.onnx \
    --output yunet.gguf \
    --model-type detection \
    --model-name yunet

Parity

Tested against OpenCV's cv2.FaceDetectorYN on the same ONNX model:

  • Bounding box IoU: >0.99
  • Score difference: <0.01
  • Landmark difference: <2px
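
For readers who want to run a similar comparison, bounding box IoU for boxes in the x, y, w, h format above can be computed as in the sketch below. This is a generic IoU helper under the name iou_xywh (hypothetical), not the script used to produce the parity figures.

```python
def iou_xywh(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Made-up boxes, purely to exercise the function.
same = iou_xywh((0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0))
half = iou_xywh((0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 10.0, 10.0))
print(same)            # 1.0
print(round(half, 4))  # 0.3333
```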

Source
