YuNet Face Detection (GGUF)

GGUF conversion of YuNet for use with CrispEmbed.

YuNet is a lightweight face detector based on ShuffleNetV2, originally shipped with OpenCV. This GGUF file was converted from the `face_detection_yunet_2023mar.onnx` checkpoint using CrispEmbed's `convert-face-to-gguf.py` converter.

Model Details

| Property | Value |
|---|---|
| Architecture | ShuffleNetV2 backbone + FPN + multi-scale detection heads |
| Input | 640x640 BGR, raw uint8 range [0, 255] |
| Strides | 8, 16, 32 |
| Outputs | cls (confidence), obj (IoU), bbox (4), kps (5 landmarks x 2) per stride |
| Parameters | ~75K |
| GGUF size | 222 KB |
| ONNX source | `face_detection_yunet_2023mar.onnx` (228 KB) |
| License | Apache 2.0 |
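
As a quick sanity check on the table, the three strides on a 640x640 input give 80x80, 40x40 and 20x20 grids, which is 8,400 candidate cells in total. Each cell carries the cls, obj, bbox and kps outputs listed above. The NumPy sketch below computes those shapes and packs a BGR uint8 image into a 1x3xHxW float32 tensor with no mean/std normalization. `head_shapes` and `to_nchw_raw` are hypothetical helpers and not part of CrispEmbed, which handles preprocessing internally. The float32 NCHW layout is an assumption about what an ONNX-style backend expects.

```python
import numpy as np

INPUT_SIZE = 640        # 640x640 input, per the table above
STRIDES = (8, 16, 32)   # one detection head per stride


def head_shapes(input_size=INPUT_SIZE, strides=STRIDES):
    """Return {stride: (grid_h, grid_w, num_cells)} for a square input."""
    shapes = {}
    for s in strides:
        g = input_size // s
        shapes[s] = (g, g, g * g)
    return shapes


def to_nchw_raw(img_bgr_u8):
    """HWC BGR uint8 -> 1x3xHxW float32; values stay in [0, 255] (no mean/std)."""
    if img_bgr_u8.dtype != np.uint8 or img_bgr_u8.ndim != 3 or img_bgr_u8.shape[2] != 3:
        raise ValueError("expected an HxWx3 uint8 BGR image")
    # Channel order is kept as-is (BGR); only the memory layout changes.
    return img_bgr_u8.astype(np.float32).transpose(2, 0, 1)[np.newaxis]


shapes = head_shapes()
for s, (gh, gw, n) in shapes.items():
    # Each cell has cls (1), obj (1), bbox (4) and kps (10) values.
    print(f"stride {s}: {gh}x{gw} grid, {n} cells")
print("total cells:", sum(n for _, _, n in shapes.values()))
```

The total printed is 8400, matching one prediction per grid cell ("Anchors per cell: 1" in the comparison table).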

Usage with CrispEmbed

CLI

```bash
# Auto-download and detect
crispembed -m yunet --detect photo.jpg

# JSON output
crispembed -m yunet --detect photo.jpg --json

# Lower confidence threshold
crispembed -m yunet --detect photo.jpg --conf 0.3
```

Output format

Each detection contains:

- `x`, `y`, `w`, `h`: bounding box (top-left corner plus size) in original image coordinates
- `conf`: detection confidence (0..1)
- `landmarks[10]`: five facial landmarks as (x, y) pairs:
  - `[0,1]` right eye
  - `[2,3]` left eye
  - `[4,5]` nose tip
  - `[6,7]` right mouth corner
  - `[8,9]` left mouth corner

Note: landmark order follows OpenCV's convention (right_eye, left_eye, nose, right_mouth, left_mouth), which differs from InsightFace/SCRFD (left_eye, right_eye, nose, left_mouth, right_mouth).
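
Because the landmarks arrive as a flat array, it can help to name the points once and convert the box to corner form. The sketch below assumes the detection fields are available as plain Python numbers. `parse_landmarks` and `bbox_corners` are hypothetical helpers, not CrispEmbed API, and the names follow the OpenCV order listed above.

```python
from typing import Dict, List, Tuple

# Landmark names in the OpenCV order listed above.
YUNET_LANDMARK_NAMES = (
    "right_eye",
    "left_eye",
    "nose_tip",
    "right_mouth_corner",
    "left_mouth_corner",
)


def parse_landmarks(landmarks: List[float]) -> Dict[str, Tuple[float, float]]:
    """Map the flat landmarks[10] array to named (x, y) points."""
    if len(landmarks) != 10:
        raise ValueError(f"expected 10 values, got {len(landmarks)}")
    return {
        name: (landmarks[2 * i], landmarks[2 * i + 1])
        for i, name in enumerate(YUNET_LANDMARK_NAMES)
    }


def bbox_corners(x: float, y: float, w: float, h: float) -> Tuple[float, float, float, float]:
    """Convert top-left corner + size into (x1, y1, x2, y2)."""
    return (x, y, x + w, y + h)


pts = parse_landmarks([10, 20, 30, 20, 20, 30, 12, 40, 28, 40])
print(pts["nose_tip"])              # (20, 30)
print(bbox_corners(5, 10, 40, 50))  # (5, 10, 45, 60)
```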

C API

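```c
#include <stdio.h>
#include "crispembed.h"

int main(void) {
    crispembed_ctx * ctx = crispembed_init("yunet.gguf", 4);
    if (!ctx) {
        fprintf(stderr, "failed to load yunet.gguf\n");
        return 1;
    }
    crispembed_face faces[32];
    /* max faces (capacity of `faces`), confidence threshold, detection size */
    int n = crispembed_detect(ctx, "photo.jpg", faces, 32, 0.5f, 640);
    for (int i = 0; i < n; i++) {
        printf("face %d: (%.0f,%.0f,%.0f,%.0f) conf=%.2f\n",
               i, faces[i].x, faces[i].y, faces[i].w, faces[i].h, faces[i].conf);
    }
    crispembed_free(ctx);
    return 0;
}
```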

Python

```python
from crispembed import CrispFace

det = CrispFace("yunet.gguf")
faces = det.detect("photo.jpg", conf=0.5, det_size=640)
for f in faces:
    print(f"bbox=({f['x']:.0f},{f['y']:.0f},{f['w']:.0f},{f['h']:.0f}) conf={f['confidence']:.2f}")
```
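
Cropping detected faces is a common next step, for example before passing them to a recognition model. The sketch below assumes the image is already loaded as an HxWxC NumPy array and that each detection is a dict with the `'x'`, `'y'`, `'w'`, `'h'` keys used above. `crop_face` and its `margin` parameter are hypothetical and not part of CrispEmbed. Clamping matters because boxes near the border can extend past the image.

```python
import numpy as np


def crop_face(img: np.ndarray, face: dict, margin: float = 0.0) -> np.ndarray:
    """Crop one detection from an HxW(xC) image, clamped to the image bounds.

    `face` uses the 'x', 'y', 'w', 'h' keys from the example above.
    `margin` pads each side by that fraction of the box width/height.
    Returns a view into `img`; use .copy() if you need to modify it.
    """
    img_h, img_w = img.shape[:2]
    pad_x = face["w"] * margin
    pad_y = face["h"] * margin
    # Round outward so the crop never cuts into the box, then clamp.
    x1 = max(0, int(np.floor(face["x"] - pad_x)))
    y1 = max(0, int(np.floor(face["y"] - pad_y)))
    x2 = min(img_w, int(np.ceil(face["x"] + face["w"] + pad_x)))
    y2 = min(img_h, int(np.ceil(face["y"] + face["h"] + pad_y)))
    if x2 <= x1 or y2 <= y1:
        # Box lies entirely outside the image: return an empty crop.
        return img[0:0, 0:0]
    return img[y1:y2, x1:x2]


img = np.zeros((100, 200, 3), dtype=np.uint8)
face = {"x": 10.0, "y": 20.0, "w": 30.0, "h": 40.0}
print(crop_face(img, face).shape)              # (40, 30, 3)
print(crop_face(img, face, margin=0.5).shape)  # (80, 55, 3)
```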

YuNet vs SCRFD

| | YuNet | SCRFD-10G |
|---|---|---|
| Size | 222 KB | ~16 MB |
| Speed (CPU) | ~5 ms | ~50 ms |
| Accuracy (WIDER FACE Easy) | 88.3% | 95.2% |
| Anchors per cell | 1 | 2 |
| Bbox decode | center+scale (exp) | distance-based |
| Input normalization | None (raw 0-255) | (v-127.5)/128 |
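
To make the "center+scale (exp)" row concrete, here is a NumPy sketch that decodes a single stride's outputs. As far as we know, it mirrors the post-processing OpenCV's `FaceDetectorYN` applies to the 2023mar model. The score is the geometric mean of the clamped cls and obj outputs. The box center is a per-cell offset scaled by the stride, width and height are `exp(...) * stride`, and landmarks are per-cell offsets scaled the same way. CrispEmbed's own decoder is not shown in this card and may differ in details. `decode_stride` is a hypothetical name, and the sketch assumes each head's outputs are flattened row-major over the grid.

```python
import numpy as np


def decode_stride(cls, obj, bbox, kps, stride, input_size=640, conf_thresh=0.5):
    """Decode one YuNet head into (x, y, w, h) boxes, scores and landmarks.

    cls, obj: (N, 1); bbox: (N, 4); kps: (N, 10); N = (input_size // stride) ** 2.
    Coordinates are in input-image pixels (before rescaling to the original image).
    """
    g = input_size // stride
    if cls.shape[0] != g * g:
        raise ValueError(f"expected {g * g} cells for stride {stride}")
    rows, cols = np.divmod(np.arange(g * g), g)  # row-major cell index -> (r, c)
    # Score: geometric mean of clamped classification and objectness scores.
    score = np.sqrt(np.clip(cls[:, 0], 0, 1) * np.clip(obj[:, 0], 0, 1))
    # Center = (cell + predicted offset) * stride; size = exp(pred) * stride.
    cx = (cols + bbox[:, 0]) * stride
    cy = (rows + bbox[:, 1]) * stride
    w = np.exp(bbox[:, 2]) * stride
    h = np.exp(bbox[:, 3]) * stride
    boxes = np.stack([cx - w / 2, cy - h / 2, w, h], axis=1)
    # Landmarks: (dx, dy) offsets from the cell, scaled by the stride.
    lm = kps.reshape(-1, 5, 2).astype(np.float64)
    lm[:, :, 0] = (lm[:, :, 0] + cols[:, None]) * stride
    lm[:, :, 1] = (lm[:, :, 1] + rows[:, None]) * stride
    keep = score >= conf_thresh
    return boxes[keep], score[keep], lm[keep].reshape(-1, 10)
```

The surviving boxes from all three strides would then go through NMS and be rescaled to the original image size. SCRFD instead predicts distances from each anchor point to the four box edges, which is the "distance-based" entry in the table.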

YuNet is best for latency-critical or resource-constrained scenarios. SCRFD is better when detection accuracy matters more than speed or model size.

Conversion

```bash
python models/convert-face-to-gguf.py \
    --onnx face_detection_yunet_2023mar.onnx \
    --output yunet.gguf \
    --model-type detection \
    --model-name yunet
```