# Refactoring Summary

## What Was Done

### 1. **Model Directory Usage Analysis**
The backend uses the following files from the `/Model/` directory:
- `embeddings_cache.pkl` - Face recognition embeddings cache
- `yolov8n-face.pt` - YOLO face detection model
- `my_scan.mp4` - Reference 360-degree scan video
- `Adi.jpg` - Reference image

Both `single_tracker.py` and `multi_tracker.py` access the Model directory.

### 2. **Created New Services**

#### `services/face_recognition.py`
- Extracted face recognition logic from `Model/face_model.py`
- Class: `FaceRecognitionService`
- Methods:
  - `extract_embeddings_from_video()` - Process 360° video with quality filtering
  - `extract_embeddings_from_image()` - Process single reference image
  - `save_embeddings_cache()` - Save processed embeddings
  - `load_embeddings_cache()` - Load cached embeddings
  - `calculate_blur_score()` - Image sharpness detection
  - `calculate_frontal_score()` - Face frontality score
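
The two quality metrics are easy to sketch. Below, `calculate_blur_score` is written as variance of the Laplacian (a standard sharpness heuristic), and `calculate_frontal_score` as a *hypothetical* bounding-box aspect-ratio heuristic; the actual implementations in `services/face_recognition.py` may use different heuristics.

```python
import cv2
import numpy as np

def calculate_blur_score(image: np.ndarray) -> float:
    """Variance of the Laplacian: higher means sharper."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())

def calculate_frontal_score(box_w: float, box_h: float) -> float:
    """Hypothetical heuristic: frontal faces tend toward a ~0.75
    width/height box ratio; score decays as the box deviates from it."""
    ratio = box_w / box_h
    return max(0.0, 1.0 - abs(ratio - 0.75) / 0.75)
```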

#### `services/audio_processing.py`
- New service for audio streaming with angle data
- Class: `AudioProcessor`
- Methods:
  - `create_audio_stream()` - Start new recording session
  - `write_audio_chunk()` - Write audio with optional angle metadata
  - `close_audio_stream()` - Finalize recording
  - `get_audio_files()` - List all recordings
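
A minimal sketch of how such a service can wrap the stdlib `wave` module. The session bookkeeping, file naming, and metadata writer shown here are assumptions based on the storage layout below; the real class may differ.

```python
import time
import uuid
import wave
from pathlib import Path

class AudioProcessor:
    def __init__(self, base_dir: str = "Model/audio_recordings"):
        self.base_dir = Path(base_dir)
        self.base_dir.mkdir(parents=True, exist_ok=True)
        self.sessions: dict[str, dict] = {}

    def create_audio_stream(self, sample_rate: int = 16000, channels: int = 1) -> str:
        session_id = str(uuid.uuid4())
        wav_path = self.base_dir / f"audio_{session_id}_{int(time.time())}.wav"
        wav = wave.open(str(wav_path), "wb")
        wav.setnchannels(channels)
        wav.setsampwidth(2)            # 16-bit PCM
        wav.setframerate(sample_rate)
        meta = open(wav_path.with_suffix("").as_posix() + "_metadata.txt", "w")
        meta.write("timestamp,angle\n")
        self.sessions[session_id] = {"wav": wav, "meta": meta, "start": time.time()}
        return session_id

    def write_audio_chunk(self, session_id: str, chunk: bytes, angle=None) -> None:
        s = self.sessions[session_id]
        s["wav"].writeframes(chunk)
        if angle is not None:
            s["meta"].write(f"{time.time() - s['start']:.3f},{angle:.2f}\n")

    def close_audio_stream(self, session_id: str) -> None:
        s = self.sessions.pop(session_id)
        s["wav"].close()
        s["meta"].close()
```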

### 3. **Added API Endpoints to `server.py`**

#### Face Recognition APIs:
- `POST /api/face/upload-video` - Upload 360° reference video
- `POST /api/face/upload-image` - Upload reference image
- `GET /api/face/cache-status` - Check embeddings cache status
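
As an illustration, the image upload endpoint could be shaped roughly like this in FastAPI. The auth dependency is stubbed out here; the real server validates a JWT (see Key Features below).

```python
from pathlib import Path

from fastapi import APIRouter, Depends, UploadFile

router = APIRouter()
MODEL_DIR = Path("Model")

def get_current_user():
    """Stub; the real dependency decodes and verifies the JWT."""
    return "demo-user"

@router.post("/api/face/upload-image")
async def upload_image(file: UploadFile, user=Depends(get_current_user)):
    # Store the reference image where the trackers expect it (ref_*.jpg).
    dest = MODEL_DIR / f"ref_{file.filename}"
    dest.write_bytes(await file.read())
    return {"status": "saved", "path": str(dest)}
```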

#### Audio Streaming APIs:
- `POST /api/audio/start-stream` - Start audio recording session
- `WebSocket /ws/audio/{session_id}` - Stream audio with angle data
- `POST /api/audio/stop-stream/{session_id}` - Stop recording
- `GET /api/audio/recordings` - List all recordings
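
A sketch of the WebSocket handler tying these together, assuming the `AudioProcessor` above and the framing described under "How to Use": binary frames carry raw PCM, text frames carry JSON with an optional angle.

```python
import base64
import json

from fastapi import FastAPI, WebSocket

from services.audio_processing import AudioProcessor

app = FastAPI()
processor = AudioProcessor()

@app.websocket("/ws/audio/{session_id}")
async def audio_ws(websocket: WebSocket, session_id: str):
    await websocket.accept()
    while True:
        message = await websocket.receive()
        if message["type"] == "websocket.disconnect":
            break
        if message.get("bytes") is not None:
            # Binary frame: raw 16-bit PCM, no angle metadata.
            processor.write_audio_chunk(session_id, message["bytes"])
        elif message.get("text") is not None:
            payload = json.loads(message["text"])
            if payload.get("command") == "stop":
                break
            chunk = base64.b64decode(payload["audio_data"])
            processor.write_audio_chunk(session_id, chunk, payload.get("angle"))
    processor.close_audio_stream(session_id)
```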

### 4. **File Storage Structure**
```
/Model/
├── my_scan.mp4                    # Reference video (uploaded via API)
├── ref_*.jpg                      # Reference images (uploaded via API)
├── embeddings_cache.pkl           # Processed face embeddings
├── yolov8n-face.pt                # YOLO model (static)
└── audio_recordings/
    ├── audio_{uuid}_{timestamp}.wav           # Audio recording
    └── audio_{uuid}_{timestamp}_metadata.txt  # Angle metadata (CSV)
```

### 5. **Audio Metadata Format**
The metadata file stores timestamp and angle in CSV format:
```csv
timestamp,angle
0.000,45.50
0.064,46.20
0.128,47.00
```
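
Reading the track back for analysis is straightforward with the stdlib `csv` module; a minimal sketch (the function name is illustrative):

```python
import csv

def load_angle_track(path: str) -> list[tuple[float, float]]:
    """Return (timestamp_seconds, angle_degrees) pairs from a metadata file."""
    with open(path, newline="") as f:
        return [(float(row["timestamp"]), float(row["angle"]))
                for row in csv.DictReader(f)]
```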

## How to Use

### Upload 360-Degree Video:
```bash
curl -X POST "http://localhost:8000/api/face/upload-video" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@my_360_scan.mp4"
```

### Upload Reference Image:
```bash
curl -X POST "http://localhost:8000/api/face/upload-image" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@reference.jpg"
```

### Start Audio Stream:
```bash
# 1. Start stream (get session_id)
curl -X POST "http://localhost:8000/api/audio/start-stream" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "sample_rate=16000" \
  -F "channels=1"

# 2. Connect via WebSocket and stream
# ws://localhost:8000/ws/audio/{session_id}

# 3. Send audio chunks (binary or JSON with angle)
# Binary: raw 16-bit PCM audio bytes
# JSON: {"audio_data": "base64...", "angle": 45.5}

# 4. Stop: {"command": "stop"}
```
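
Putting the steps together, a minimal client sketch using the third-party `websockets` package; the session ID and source WAV file are placeholders:

```python
import asyncio
import base64
import json
import wave

import websockets  # pip install websockets

async def stream_audio(session_id: str, wav_path: str) -> None:
    uri = f"ws://localhost:8000/ws/audio/{session_id}"
    async with websockets.connect(uri) as ws:
        with wave.open(wav_path, "rb") as wav:
            while frames := wav.readframes(1024):
                # JSON frame with an example angle; raw PCM bytes also work.
                await ws.send(json.dumps({
                    "audio_data": base64.b64encode(frames).decode(),
                    "angle": 45.5,
                }))
        await ws.send(json.dumps({"command": "stop"}))

asyncio.run(stream_audio("SESSION_ID", "sample.wav"))
```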

## Key Features

1. **Quality Filtering**: Video processing uses blur detection and frontal face scoring to select best frames
2. **Temporal Spacing**: Selects frames evenly distributed across the video for comprehensive coverage
3. **Angle Tracking**: Audio streams can include direction/angle metadata for spatial audio analysis
4. **Mono/Stereo Support**: Configurable audio channels (1 or 2)
5. **Authentication**: All endpoints protected with JWT tokens
6. **Async Processing**: CPU-intensive tasks run in a thread pool executor (sketched below)
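
For feature 6, the usual asyncio pattern looks like the following; a sketch assuming embedding extraction is the blocking call being offloaded and that `FaceRecognitionService` takes no constructor arguments:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

from services.face_recognition import FaceRecognitionService

executor = ThreadPoolExecutor(max_workers=2)
face_service = FaceRecognitionService()  # constructor args assumed

async def process_video_async(video_path: str):
    loop = asyncio.get_running_loop()
    # Run the CPU-bound extraction off the event loop so requests stay responsive.
    return await loop.run_in_executor(
        executor, face_service.extract_embeddings_from_video, video_path
    )
```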

## Original face_model.py

The original file at `/Model/face_model.py` remains unchanged and can still be run standalone for testing or manual processing. The new API provides the same functionality but in a service-oriented architecture accessible via HTTP/WebSocket.

## Dependencies

All required packages are already in `requirements.txt`:
- FastAPI, Uvicorn
- OpenCV (cv2)
- DeepFace
- Ultralytics (YOLO)
- NumPy
- `wave` (Python standard library, no install required)

No additional dependencies needed!