# Refactoring Summary

## What Was Done
### 1. Model Directory Usage Analysis

The backend uses the following files from the `/Model/` directory:

- `embeddings_cache.pkl` - Face recognition embeddings cache
- `yolov8n-face.pt` - YOLO face detection model
- `my_scan.mp4` - Reference 360-degree scan video
- `Adi.jpg` - Reference image

Both `single_tracker.py` and `multi_tracker.py` access the Model directory.
### 2. Created New Services

#### services/face_recognition.py

- Extracted face recognition logic from `Model/face_model.py`
- Class: `FaceRecognitionService`
- Methods:
  - `extract_embeddings_from_video()` - Process a 360° video with quality filtering
  - `extract_embeddings_from_image()` - Process a single reference image
  - `save_embeddings_cache()` - Save processed embeddings
  - `load_embeddings_cache()` - Load cached embeddings
  - `calculate_blur_score()` - Image sharpness detection (sketched below)
  - `calculate_frontal_score()` - Face frontality score
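For context, variance of the Laplacian is a common OpenCV sharpness metric; a minimal sketch of what `calculate_blur_score()` could look like (illustrative only, the service's actual implementation may differ):

```python
import cv2
import numpy as np

def calculate_blur_score(image: np.ndarray) -> float:
    """Sharpness as variance of the Laplacian: higher = sharper, lower = blurrier."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())
```

During quality filtering, frames whose score falls below a threshold would be discarded.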
#### services/audio_processing.py

- New service for audio streaming with angle data
- Class: `AudioProcessor`
- Methods (sketched below):
  - `create_audio_stream()` - Start a new recording session
  - `write_audio_chunk()` - Write audio with optional angle metadata
  - `close_audio_stream()` - Finalize the recording
  - `get_audio_files()` - List all recordings
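A condensed sketch of how such a service could be built on the stdlib `wave` module (class and method names match the summary; the internals are assumptions, not the actual code):

```python
import os
import time
import uuid
import wave

class AudioProcessor:
    def __init__(self, out_dir: str = "Model/audio_recordings"):
        self.out_dir = out_dir
        self.streams = {}  # session_id -> (wav writer, metadata file, start time)
        os.makedirs(out_dir, exist_ok=True)

    def create_audio_stream(self, sample_rate: int = 16000, channels: int = 1) -> str:
        session_id = str(uuid.uuid4())
        base = os.path.join(self.out_dir, f"audio_{session_id}_{int(time.time())}")
        wav = wave.open(base + ".wav", "wb")
        wav.setnchannels(channels)   # 1 = mono, 2 = stereo
        wav.setsampwidth(2)          # 16-bit PCM
        wav.setframerate(sample_rate)
        meta = open(base + "_metadata.txt", "w")
        meta.write("timestamp,angle\n")
        self.streams[session_id] = (wav, meta, time.time())
        return session_id

    def write_audio_chunk(self, session_id: str, chunk: bytes, angle: float | None = None):
        wav, meta, start = self.streams[session_id]
        wav.writeframes(chunk)
        if angle is not None:
            meta.write(f"{time.time() - start:.3f},{angle:.2f}\n")

    def close_audio_stream(self, session_id: str):
        wav, meta, _ = self.streams.pop(session_id)
        wav.close()
        meta.close()

    def get_audio_files(self) -> list[str]:
        return sorted(f for f in os.listdir(self.out_dir) if f.endswith(".wav"))
```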
### 3. Added API Endpoints to server.py

Face Recognition APIs:

- `POST /api/face/upload-video` - Upload a 360° reference video
- `POST /api/face/upload-image` - Upload a reference image
- `GET /api/face/cache-status` - Check embeddings cache status
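As a rough illustration of the upload endpoints (the JWT dependency and validation in the real server.py are omitted here):

```python
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/api/face/upload-image")
async def upload_face_image(file: UploadFile = File(...)):
    # Save the reference image into /Model/ with the ref_ prefix
    dest = f"Model/ref_{file.filename}"
    with open(dest, "wb") as out:
        out.write(await file.read())
    return {"saved": dest}
```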
Audio Streaming APIs:

- `POST /api/audio/start-stream` - Start an audio recording session
- `WebSocket /ws/audio/{session_id}` - Stream audio with angle data
- `POST /api/audio/stop-stream/{session_id}` - Stop recording
- `GET /api/audio/recordings` - List all recordings
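A sketch of how the WebSocket endpoint could dispatch binary vs. JSON frames (`processor` is assumed to be the `AudioProcessor` instance from the earlier sketch; the real handler may differ):

```python
import base64
import json

from fastapi import WebSocket

@app.websocket("/ws/audio/{session_id}")
async def audio_stream(websocket: WebSocket, session_id: str):
    await websocket.accept()
    try:
        while True:
            message = await websocket.receive()  # raw ASGI message
            if message["type"] == "websocket.disconnect":
                break
            if message.get("bytes") is not None:
                # Binary frame: raw 16-bit PCM audio
                processor.write_audio_chunk(session_id, message["bytes"])
            elif message.get("text") is not None:
                payload = json.loads(message["text"])
                if payload.get("command") == "stop":
                    break
                # JSON frame: base64-encoded audio plus optional angle
                processor.write_audio_chunk(
                    session_id,
                    base64.b64decode(payload["audio_data"]),
                    angle=payload.get("angle"),
                )
    finally:
        processor.close_audio_stream(session_id)
```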
### 4. File Storage Structure

```
/Model/
├── my_scan.mp4              # Reference video (uploaded via API)
├── ref_*.jpg                # Reference images (uploaded via API)
├── embeddings_cache.pkl     # Processed face embeddings
├── yolov8n-face.pt          # YOLO model (static)
└── audio_recordings/
    ├── audio_{uuid}_{timestamp}.wav           # Audio recording
    └── audio_{uuid}_{timestamp}_metadata.txt  # Angle metadata (CSV)
```
### 5. Audio Metadata Format

The metadata file stores timestamp and angle pairs in CSV format:

```
timestamp,angle
0.000,45.50
0.064,46.20
0.128,47.00
```
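Since this is plain CSV, a recording's angles can be read back with the stdlib `csv` module; an illustrative helper (not part of the services):

```python
import csv

def load_angle_metadata(path: str) -> list[tuple[float, float]]:
    """Parse a metadata file into (timestamp, angle) pairs."""
    with open(path, newline="") as f:
        return [(float(r["timestamp"]), float(r["angle"])) for r in csv.DictReader(f)]
```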
## How to Use

Upload a 360-degree video:

```bash
curl -X POST "http://localhost:8000/api/face/upload-video" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@my_360_scan.mp4"
```

Upload a reference image:

```bash
curl -X POST "http://localhost:8000/api/face/upload-image" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@reference.jpg"
```
Start an audio stream:

```bash
# 1. Start a stream (returns a session_id)
curl -X POST "http://localhost:8000/api/audio/start-stream" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "sample_rate=16000" \
  -F "channels=1"

# 2. Connect via WebSocket and stream:
#      ws://localhost:8000/ws/audio/{session_id}
# 3. Send audio chunks (binary or JSON with angle)
#      Binary: raw 16-bit PCM audio bytes
#      JSON:   {"audio_data": "base64...", "angle": 45.5}
# 4. Stop: {"command": "stop"}
```
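For reference, steps 2-4 in Python might look like the following (this assumes the third-party `websockets` client package, which is not among the server's dependencies, and a session_id obtained in step 1):

```python
import asyncio
import base64
import json

import websockets  # third-party client library (assumption, not in requirements.txt)

async def stream_audio(session_id: str, chunks):
    uri = f"ws://localhost:8000/ws/audio/{session_id}"
    async with websockets.connect(uri) as ws:
        for pcm_bytes, angle in chunks:  # pcm_bytes: raw 16-bit PCM audio
            await ws.send(json.dumps({
                "audio_data": base64.b64encode(pcm_bytes).decode(),
                "angle": angle,
            }))
        await ws.send(json.dumps({"command": "stop"}))

# Example: one second of 16 kHz mono silence, tagged with a 45.5 degree angle
asyncio.run(stream_audio("SESSION_ID", [(b"\x00\x00" * 16000, 45.5)]))
```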
## Key Features

- Quality Filtering: Video processing uses blur detection and frontal face scoring to select the best frames
- Temporal Spacing: Frames are selected evenly across the video for comprehensive coverage
- Angle Tracking: Audio streams can include direction/angle metadata for spatial audio analysis
- Mono/Stereo Support: Configurable audio channels (1 or 2)
- Authentication: All endpoints are protected with JWT tokens
- Async Processing: CPU-intensive tasks run in a thread pool executor (see the sketch after this list)
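A minimal sketch of that pattern, assuming `face_service` is a `FaceRecognitionService` instance:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)  # pool size is an illustrative choice

async def extract_embeddings_async(video_path: str):
    # Run the CPU-bound extraction off the event loop so requests stay responsive
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        executor, face_service.extract_embeddings_from_video, video_path
    )
```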
## Original face_model.py

The original file at `/Model/face_model.py` remains unchanged and can still be run standalone for testing or manual processing. The new API provides the same functionality, but in a service-oriented architecture accessible via HTTP and WebSocket.
## Dependencies

All required packages are already in requirements.txt:

- FastAPI, Uvicorn
- OpenCV (cv2)
- DeepFace
- Ultralytics (YOLO)
- NumPy
- wave (Python stdlib, no install required)

No additional dependencies needed!