AI Face Detector: Complete Documentation


Overview

This project builds a real-time face detection system using deep learning. The detector works on images, video files, and webcam streams, and it can be deployed as an API or a web app.

Core features:

  • Detect faces in images, video, and live camera
  • Draw bounding boxes with confidence scores
  • Detect multiple faces simultaneously
  • GPU acceleration support
  • REST API ready
  • Easy deployment (local / cloud)

Tech Stack

  • Python 3.10+
  • OpenCV
  • PyTorch
  • TorchVision
  • NumPy
  • FastAPI (optional for API)
  • Streamlit (optional UI)

Model used:

  • YOLOv8-Face (recommended, and used by all code in this document); RetinaFace is an alternative

Project Structure

ai-face-detector/
│
├── models/
│   └── yolov8n-face.pt
│
├── src/
│   ├── detector.py
│   ├── webcam.py
│   ├── image_infer.py
│   ├── video_infer.py
│   └── api.py
│
├── requirements.txt
└── README.md

Installation

Create a virtual environment:

python -m venv venv
venv\Scripts\activate        # Windows
source venv/bin/activate     # Linux/macOS

Install dependencies (python-multipart is required by FastAPI to parse file uploads):

pip install ultralytics opencv-python numpy fastapi uvicorn pillow python-multipart

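Optionally download the base YOLOv8 nano weights. This is the general-purpose detector, not a face model; the command fetches yolov8n.pt on first use:

yolo task=detect mode=predict model=yolov8n.pt

Then download the face-trained weights from:

https://github.com/akanametov/yolo-face/releases

Place the weights file in the models/ directory as models/yolov8n-face.pt, the default path used by FaceDetector.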


Core Face Detection Engine

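Create src/detector.py:

from ultralytics import YOLO
import cv2


class FaceDetector:
    """Thin wrapper around a YOLOv8-Face model."""

    def __init__(self, model_path="models/yolov8n-face.pt"):
        # The path is resolved relative to the current working directory.
        self.model = YOLO(model_path)

    def detect(self, frame):
        """Return a list of (x1, y1, x2, y2, conf) tuples for one BGR frame."""
        # One frame goes in, so the first (and only) result is taken.
        results = self.model(frame, conf=0.4)[0]
        faces = []

        for box in results.boxes:
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            conf = float(box.conf[0])
            faces.append((x1, y1, x2, y2, conf))

        return faces

    def draw_faces(self, frame, faces):
        """Draw boxes and confidence labels on the frame in place and return it."""
        for (x1, y1, x2, y2, conf) in faces:
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            # Keep the label inside the image when the box touches the top edge.
            cv2.putText(frame, f"{conf:.2f}", (x1, max(y1 - 5, 15)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        return frame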
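
The tuples returned by detect() can be post-processed without touching the model. The sketch below is one possible helper, not part of the original project: the name clean_faces and the min_conf default are assumptions made for illustration. It drops low-confidence detections and clamps boxes to the frame so that later cropping or drawing never indexes outside the image.

```python
def clean_faces(faces, frame_width, frame_height, min_conf=0.5):
    """Filter and clamp (x1, y1, x2, y2, conf) tuples.

    Hypothetical helper: the name and the min_conf default are
    illustrative assumptions, not part of the detector itself.
    """
    cleaned = []
    for x1, y1, x2, y2, conf in faces:
        if conf < min_conf:
            continue  # below the caller's confidence threshold
        # Clamp every corner into the range [0, width] x [0, height].
        x1 = max(0, min(x1, frame_width))
        x2 = max(0, min(x2, frame_width))
        y1 = max(0, min(y1, frame_height))
        y2 = max(0, min(y2, frame_height))
        # Skip boxes that collapse to zero area after clamping.
        if x2 > x1 and y2 > y1:
            cleaned.append((x1, y1, x2, y2, conf))
    return cleaned
```

A typical call would pass frame.shape[1] and frame.shape[0] as the width and height of the frame that was given to detect().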

Image Detection Script

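src/image_infer.py:

import cv2
from detector import FaceDetector

detector = FaceDetector()

img = cv2.imread("test.jpg")
if img is None:  # cv2.imread returns None instead of raising when the read fails
    raise SystemExit("Could not read test.jpg")

faces = detector.detect(img)
output = detector.draw_faces(img, faces)

cv2.imshow("Faces", output)
cv2.waitKey(0)  # wait for any key press
cv2.destroyAllWindows()

Run from the project root, because the model path and test.jpg are resolved relative to the working directory:

python src/image_infer.py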

Webcam Real-Time Detection

webcam.py

import cv2
from detector import FaceDetector

detector = FaceDetector()
cap = cv2.VideoCapture(0)  # 0 selects the default camera
if not cap.isOpened():
    raise SystemExit("Could not open camera 0")

while True:
    ret, frame = cap.read()
    if not ret:
        break

    faces = detector.detect(frame)
    frame = detector.draw_faces(frame, faces)

    cv2.imshow("AI Face Detector", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # 27 is the Esc key
        break

cap.release()
cv2.destroyAllWindows()

Run:

python src/webcam.py

Video File Detection

video_infer.py

import cv2
from detector import FaceDetector

detector = FaceDetector()
cap = cv2.VideoCapture("video.mp4")

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    faces = detector.detect(frame)
    frame = detector.draw_faces(frame, faces)

    cv2.imshow("Video Face Detection", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # 27 is the Esc key
        break

cap.release()
cv2.destroyAllWindows()

Run:

python src/video_infer.py

Build REST API

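src/api.py:

from fastapi import FastAPI, HTTPException, UploadFile
import cv2
import numpy as np
from detector import FaceDetector

app = FastAPI()
detector = FaceDetector()  # load the model once, at startup

@app.post("/detect")
async def detect_faces(file: UploadFile):
    image_bytes = await file.read()
    np_arr = np.frombuffer(image_bytes, np.uint8)
    img = cv2.imdecode(np_arr, cv2.IMREAD_COLOR)
    if img is None:  # the upload could not be decoded as an image
        raise HTTPException(status_code=400, detail="Could not decode image")

    # Each face is serialized as [x1, y1, x2, y2, conf] in the JSON response.
    faces = detector.detect(img)
    return {"faces": faces}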

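Run the server from the project root. The --app-dir option puts src/ on the import path so that "from detector import FaceDetector" resolves:

uvicorn api:app --app-dir src --reload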

Test the endpoint by sending an image as multipart form data in a field named file:

POST http://127.0.0.1:8000/detect
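
One way to exercise the endpoint from Python without extra dependencies is sketched below. It assumes the server is running locally on port 8000 and that the upload field is named file, matching the parameter of detect_faces. The function names build_multipart and detect_remote are hypothetical and exist only for this example.

```python
import json
import urllib.request
import uuid


def build_multipart(field_name, filename, payload, content_type="image/jpeg"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header). Hypothetical helper.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def detect_remote(image_path, url="http://127.0.0.1:8000/detect"):
    """POST an image file to the /detect endpoint and return the face list."""
    with open(image_path, "rb") as fh:
        body, header = build_multipart("file", "upload.jpg", fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["faces"]


# Example (requires the server to be running):
#   print(detect_remote("test.jpg"))
```

Each entry in the returned list is a five-element list of the form [x1, y1, x2, y2, conf].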

Performance Optimization

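GPU acceleration (install a CUDA build of PyTorch; the cu121 index below targets CUDA 12.1, so pick the index that matches your driver):

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121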

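Inference settings:

  • Resize images to 640x640 (Ultralytics resizes input internally; 640 is the default imgsz)
  • Use half precision on a CUDA GPU by passing half=True in the model call inside detect():
results = self.model(frame, conf=0.4, device=0, half=True)[0]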

Expected FPS (rough figures; actual throughput depends on hardware, input resolution, and model size):

  • CPU: 10–20 FPS
  • GPU: 60–120 FPS
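
Since these figures vary by machine, it can help to measure throughput directly. The sketch below is a small rolling FPS counter; the class name FPSMeter and the 30-frame window are assumptions chosen for illustration, not part of the project.

```python
import time
from collections import deque


class FPSMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30, clock=time.perf_counter):
        # Timestamps of the most recent frames; old ones fall off the left.
        self._stamps = deque(maxlen=window)
        self._clock = clock

    def tick(self):
        """Record one processed frame and return the current FPS estimate."""
        self._stamps.append(self._clock())
        if len(self._stamps) < 2:
            return 0.0  # not enough samples yet
        elapsed = self._stamps[-1] - self._stamps[0]
        # N timestamps span N - 1 frame intervals.
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

In the webcam or video loop, calling meter.tick() once per frame after detect() gives a value that can be drawn with cv2.putText or printed.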

Possible Extensions

  • Face recognition (identity matching)
  • Emotion detection
  • Age & gender prediction
  • Face tracking with DeepSORT
  • Attendance system
  • Security surveillance system


Troubleshooting

Camera not opening (on Windows, try the DirectShow backend):

cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)

Low FPS:

  • Reduce resolution (for example, pass a smaller imgsz in the model call)
  • Enable GPU
  • Use a smaller model (nano version)
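
A further option, not listed above and offered only as a sketch, is to run the detector on every Nth frame and reuse the previous boxes in between. The class name FrameSkipper and the stride default are hypothetical. The trade-off is that boxes lag behind fast-moving faces on the skipped frames.

```python
class FrameSkipper:
    """Call `detect_fn` on every `stride`-th frame; reuse the last result otherwise.

    Hypothetical wrapper: `detect_fn` is any callable that takes a frame and
    returns a list of detections, such as FaceDetector.detect.
    """

    def __init__(self, detect_fn, stride=3):
        if stride < 1:
            raise ValueError("stride must be >= 1")
        self._detect_fn = detect_fn
        self._stride = stride
        self._count = 0
        self._last = []

    def __call__(self, frame):
        # Frames 0, stride, 2 * stride, ... trigger a real detection.
        if self._count % self._stride == 0:
            self._last = self._detect_fn(frame)
        self._count += 1
        return self._last
```

In the loops above this would replace the direct call: create skipper = FrameSkipper(detector.detect) once before the loop, then use faces = skipper(frame) inside it.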

License

Open-source for research and educational use.
