# Refactoring Summary

## What Was Done
### 1. Model Directory Usage Analysis

The backend uses the following files from the `/Model/` directory:

- `embeddings_cache.pkl` - Face recognition embeddings cache
- `yolov8n-face.pt` - YOLO face detection model
- `my_scan.mp4` - Reference 360-degree scan video
- `Adi.jpg` - Reference image

Both `single_tracker.py` and `multi_tracker.py` access the Model directory.
### 2. Created New Services

#### services/face_recognition.py

- Extracted face recognition logic from `Model/face_model.py`
- Class: `FaceRecognitionService`
- Methods:
  - `extract_embeddings_from_video()` - Process a 360° video with quality filtering
  - `extract_embeddings_from_image()` - Process a single reference image
  - `save_embeddings_cache()` - Save processed embeddings
  - `load_embeddings_cache()` - Load cached embeddings
  - `calculate_blur_score()` - Image sharpness detection (sketched below)
  - `calculate_frontal_score()` - Face frontality score
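For context, variance of the Laplacian is a common OpenCV sharpness metric; a minimal sketch of what `calculate_blur_score()` could look like (illustrative only, the service's actual implementation may differ):

```python
import cv2
import numpy as np

def calculate_blur_score(image: np.ndarray) -> float:
    """Sharpness as variance of the Laplacian: higher = sharper, lower = blurrier."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())
```

During quality filtering, frames whose score falls below a threshold would be discarded.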
#### services/audio_processing.py

- New service for audio streaming with angle data
- Class: `AudioProcessor`
- Methods (sketched below):
  - `create_audio_stream()` - Start a new recording session
  - `write_audio_chunk()` - Write audio with optional angle metadata
  - `close_audio_stream()` - Finalize the recording
  - `get_audio_files()` - List all recordings
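A condensed sketch of how such a service could be built on the stdlib `wave` module (class and method names match the summary; the internals are assumptions, not the actual code):

```python
import os
import time
import uuid
import wave

class AudioProcessor:
    def __init__(self, out_dir: str = "Model/audio_recordings"):
        self.out_dir = out_dir
        self.streams = {}  # session_id -> (wav writer, metadata file, start time)
        os.makedirs(out_dir, exist_ok=True)

    def create_audio_stream(self, sample_rate: int = 16000, channels: int = 1) -> str:
        session_id = str(uuid.uuid4())
        base = os.path.join(self.out_dir, f"audio_{session_id}_{int(time.time())}")
        wav = wave.open(base + ".wav", "wb")
        wav.setnchannels(channels)   # 1 = mono, 2 = stereo
        wav.setsampwidth(2)          # 16-bit PCM
        wav.setframerate(sample_rate)
        meta = open(base + "_metadata.txt", "w")
        meta.write("timestamp,angle\n")
        self.streams[session_id] = (wav, meta, time.time())
        return session_id

    def write_audio_chunk(self, session_id: str, chunk: bytes, angle: float | None = None):
        wav, meta, start = self.streams[session_id]
        wav.writeframes(chunk)
        if angle is not None:
            meta.write(f"{time.time() - start:.3f},{angle:.2f}\n")

    def close_audio_stream(self, session_id: str):
        wav, meta, _ = self.streams.pop(session_id)
        wav.close()
        meta.close()

    def get_audio_files(self) -> list[str]:
        return sorted(f for f in os.listdir(self.out_dir) if f.endswith(".wav"))
```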
### 3. Added API Endpoints to server.py

Face Recognition APIs:

- `POST /api/face/upload-video` - Upload a 360° reference video
- `POST /api/face/upload-image` - Upload a reference image
- `GET /api/face/cache-status` - Check embeddings cache status
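As a rough illustration of the upload endpoints (the JWT dependency and validation in the real server.py are omitted here):

```python
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/api/face/upload-image")
async def upload_face_image(file: UploadFile = File(...)):
    # Save the reference image into /Model/ with the ref_ prefix
    dest = f"Model/ref_{file.filename}"
    with open(dest, "wb") as out:
        out.write(await file.read())
    return {"saved": dest}
```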
Audio Streaming APIs:

- `POST /api/audio/start-stream` - Start an audio recording session
- `WebSocket /ws/audio/{session_id}` - Stream audio with angle data
- `POST /api/audio/stop-stream/{session_id}` - Stop recording
- `GET /api/audio/recordings` - List all recordings
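A sketch of how the WebSocket endpoint could dispatch binary vs. JSON frames (`processor` is assumed to be the `AudioProcessor` instance from the earlier sketch; the real handler may differ):

```python
import base64
import json

from fastapi import WebSocket

@app.websocket("/ws/audio/{session_id}")
async def audio_stream(websocket: WebSocket, session_id: str):
    await websocket.accept()
    try:
        while True:
            message = await websocket.receive()  # raw ASGI message
            if message["type"] == "websocket.disconnect":
                break
            if message.get("bytes") is not None:
                # Binary frame: raw 16-bit PCM audio
                processor.write_audio_chunk(session_id, message["bytes"])
            elif message.get("text") is not None:
                payload = json.loads(message["text"])
                if payload.get("command") == "stop":
                    break
                # JSON frame: base64-encoded audio plus optional angle
                processor.write_audio_chunk(
                    session_id,
                    base64.b64decode(payload["audio_data"]),
                    angle=payload.get("angle"),
                )
    finally:
        processor.close_audio_stream(session_id)
```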
### 4. File Storage Structure

```
/Model/
├── my_scan.mp4              # Reference video (uploaded via API)
├── ref_*.jpg                # Reference images (uploaded via API)
├── embeddings_cache.pkl     # Processed face embeddings
├── yolov8n-face.pt          # YOLO model (static)
└── audio_recordings/
    ├── audio_{uuid}_{timestamp}.wav           # Audio recording
    └── audio_{uuid}_{timestamp}_metadata.txt  # Angle metadata (CSV)
```
### 5. Audio Metadata Format

The metadata file stores timestamp and angle pairs in CSV format:

```
timestamp,angle
0.000,45.50
0.064,46.20
0.128,47.00
```
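Since this is plain CSV, a recording's angles can be read back with the stdlib `csv` module; an illustrative helper (not part of the services):

```python
import csv

def load_angle_metadata(path: str) -> list[tuple[float, float]]:
    """Parse a metadata file into (timestamp, angle) pairs."""
    with open(path, newline="") as f:
        return [(float(r["timestamp"]), float(r["angle"])) for r in csv.DictReader(f)]
```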
## How to Use

Upload a 360-degree video:

```bash
curl -X POST "http://localhost:8000/api/face/upload-video" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@my_360_scan.mp4"
```

Upload a reference image:

```bash
curl -X POST "http://localhost:8000/api/face/upload-image" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@reference.jpg"
```
Start an audio stream:

```bash
# 1. Start a stream (returns a session_id)
curl -X POST "http://localhost:8000/api/audio/start-stream" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "sample_rate=16000" \
  -F "channels=1"

# 2. Connect via WebSocket and stream:
#      ws://localhost:8000/ws/audio/{session_id}
# 3. Send audio chunks (binary or JSON with angle)
#      Binary: raw 16-bit PCM audio bytes
#      JSON:   {"audio_data": "base64...", "angle": 45.5}
# 4. Stop: {"command": "stop"}
```
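For reference, steps 2-4 in Python might look like the following (this assumes the third-party `websockets` client package, which is not among the server's dependencies, and a session_id obtained in step 1):

```python
import asyncio
import base64
import json

import websockets  # third-party client library (assumption, not in requirements.txt)

async def stream_audio(session_id: str, chunks):
    uri = f"ws://localhost:8000/ws/audio/{session_id}"
    async with websockets.connect(uri) as ws:
        for pcm_bytes, angle in chunks:  # pcm_bytes: raw 16-bit PCM audio
            await ws.send(json.dumps({
                "audio_data": base64.b64encode(pcm_bytes).decode(),
                "angle": angle,
            }))
        await ws.send(json.dumps({"command": "stop"}))

# Example: one second of 16 kHz mono silence, tagged with a 45.5 degree angle
asyncio.run(stream_audio("SESSION_ID", [(b"\x00\x00" * 16000, 45.5)]))
```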
## Key Features

- Quality Filtering: Video processing uses blur detection and frontal face scoring to select the best frames
- Temporal Spacing: Frames are selected evenly across the video for comprehensive coverage
- Angle Tracking: Audio streams can include direction/angle metadata for spatial audio analysis
- Mono/Stereo Support: Configurable audio channels (1 or 2)
- Authentication: All endpoints are protected with JWT tokens
- Async Processing: CPU-intensive tasks run in a thread pool executor (see the sketch after this list)
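A minimal sketch of that pattern, assuming `face_service` is a `FaceRecognitionService` instance:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)  # pool size is an illustrative choice

async def extract_embeddings_async(video_path: str):
    # Run the CPU-bound extraction off the event loop so requests stay responsive
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        executor, face_service.extract_embeddings_from_video, video_path
    )
```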
## Original face_model.py

The original file at `/Model/face_model.py` remains unchanged and can still be run standalone for testing or manual processing. The new API provides the same functionality, but in a service-oriented architecture accessible via HTTP and WebSocket.
## Dependencies

All required packages are already in requirements.txt:

- FastAPI, Uvicorn
- OpenCV (cv2)
- DeepFace
- Ultralytics (YOLO)
- NumPy
- wave (Python stdlib, no install required)

No additional dependencies needed!