API Documentation
Overview
This document describes the new API endpoints added to the AFS backend for face recognition and audio streaming.
Face Recognition APIs
1. Upload 360-Degree Reference Video
Endpoint: POST /api/face/upload-video
Description: Upload a 360-degree reference video for face recognition training. The video will be processed to extract high-quality face embeddings.
Authentication: Required (JWT token)
Request:
- Content-Type: multipart/form-data
- Body: file (video file: .mp4, .avi, .mov, .mkv)
Response:
{
"ok": true,
"message": "Video processed successfully",
"frames_used": 10,
"embeddings_count": 1
}
2. Upload Reference Image
Endpoint: POST /api/face/upload-image
Description: Upload a single reference image for face recognition.
Authentication: Required (JWT token)
Request:
- Content-Type: multipart/form-data
- Body: file (image file: .jpg, .jpeg, .png)
Response:
{
"ok": true,
"message": "Image processed successfully",
"embeddings_count": 1,
"saved_path": "/path/to/Model/ref_image.jpg"
}
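Both upload endpoints list the file extensions they accept, so a client can reject an unsupported file before sending it. The following is a minimal sketch, not part of the AFS backend: `upload_endpoint_for` is a hypothetical helper, and the case-insensitive extension match is an assumption about how the server behaves.

```python
from pathlib import PurePath
from typing import Optional

# Extensions listed in this document for each upload endpoint.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def upload_endpoint_for(filename: str) -> Optional[str]:
    """Return the upload path for a reference file, or None if unsupported."""
    # Lower-casing the suffix is an assumption; the server may be stricter.
    suffix = PurePath(filename).suffix.lower()
    if suffix in VIDEO_EXTENSIONS:
        return "/api/face/upload-video"
    if suffix in IMAGE_EXTENSIONS:
        return "/api/face/upload-image"
    return None
```

A caller could use the returned path to choose the request URL and skip the upload entirely when the result is `None`.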
3. Get Cache Status
Endpoint: GET /api/face/cache-status
Description: Check if face recognition embeddings are cached and ready to use.
Authentication: Required (JWT token)
Response (Cached):
{
"ok": true,
"cached": true,
"video_path": "my_scan.mp4",
"model_name": "ArcFace",
"num_frames_used": 10,
"version": 2
}
Response (Not Cached):
{
"ok": true,
"cached": false,
"message": "No cache found. Please upload a reference video or image."
}
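A client typically branches on the `cached` flag before starting recognition. The sketch below shows one way to interpret the two response shapes above; `summarize_cache_status` is a hypothetical name, and the use of `.get()` with fallbacks assumes that some fields may be absent (for example after an image-only upload), which this document does not confirm.

```python
def summarize_cache_status(payload: dict) -> str:
    """Turn a cache-status response body into a one-line summary."""
    if not payload.get("ok"):
        return "cache-status request failed"
    if payload.get("cached"):
        # Field names follow the cached response shown above; the
        # fallbacks are an assumption for responses that omit them.
        model = payload.get("model_name", "unknown model")
        frames = payload.get("num_frames_used", "?")
        return f"cache ready ({model}, {frames} frames)"
    return payload.get("message", "no cache found")
```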
Audio Streaming APIs
1. Start Audio Stream
Endpoint: POST /api/audio/start-stream
Description: Start a new audio recording session. Returns a session ID for streaming.
Authentication: Required (JWT token)
Request:
- Content-Type: multipart/form-data
- Body:
  - sample_rate (optional, default: 16000)
  - channels (optional, default: 1; use 1 for mono, 2 for stereo)
Response:
{
"ok": true,
"session_id": "uuid-here",
"filename": "/path/to/Model/audio_recordings/audio_uuid_timestamp.wav",
"sample_rate": 16000,
"channels": 1
}
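The `sample_rate` and `channels` values determine how much data a stream produces. As a worked example, assuming 2 bytes per sample (the WebSocket endpoint in this document expects 16-bit PCM), the data rate and the duration of a chunk can be computed as follows. The function names are illustrative only.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM, as stated for the WebSocket endpoint


def bytes_per_second(sample_rate: int = 16000, channels: int = 1) -> int:
    """Raw PCM data rate for the given stream settings."""
    return sample_rate * channels * BYTES_PER_SAMPLE


def chunk_duration(num_bytes: int, sample_rate: int = 16000, channels: int = 1) -> float:
    """Seconds of audio contained in a chunk of num_bytes raw PCM bytes."""
    return num_bytes / bytes_per_second(sample_rate, channels)
```

With the default settings the stream carries 32,000 bytes per second, so a 1024-byte chunk covers 32 ms of audio.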
2. Audio WebSocket Stream
Endpoint: WebSocket /ws/audio/{session_id}
Description: WebSocket endpoint for streaming audio data with optional angle information.
Authentication: Not required at WebSocket level (use session_id from start-stream)
Send (Binary Audio Data):
WebSocket Binary Message: raw audio bytes (16-bit PCM)
Send (JSON with Angle):
{
"audio_data": "base64-encoded-audio-bytes",
"angle": 45.5
}
Send (Stop Command):
{
"command": "stop"
}
Receive:
{
"status": "received",
"bytes": 1024
}
or
{
"status": "received",
"angle": 45.5
}
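The JSON message shapes above can be produced and checked with the standard library alone. This is a minimal sketch under the assumption that JSON messages are sent as WebSocket text frames; the helper names (`build_chunk_message`, `build_stop_message`, `parse_ack`) are hypothetical and not part of the backend.

```python
import base64
import json


def build_chunk_message(audio_bytes: bytes, angle: float) -> str:
    """Encode one audio chunk and its angle as the JSON message shown above."""
    return json.dumps({
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
        "angle": angle,
    })


def build_stop_message() -> str:
    """Build the stop command message."""
    return json.dumps({"command": "stop"})


def parse_ack(text: str) -> dict:
    """Decode a server reply and check the documented status field."""
    reply = json.loads(text)
    if reply.get("status") != "received":
        raise ValueError(f"unexpected reply: {reply!r}")
    return reply
```

Base64 encoding is needed because JSON cannot carry raw bytes; when no angle has to be attached, sending the raw bytes as a binary message avoids that overhead.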
3. Stop Audio Stream
Endpoint: POST /api/audio/stop-stream/{session_id}
Description: Stop an active audio recording stream.
Authentication: Required (JWT token)
Response:
{
"ok": true,
"message": "Audio stream stopped successfully"
}
4. List Audio Recordings
Endpoint: GET /api/audio/recordings
Description: Get a list of all audio recordings.
Authentication: Required (JWT token)
Response:
{
"ok": true,
"recordings": [
"/path/to/Model/audio_recordings/audio_uuid1_timestamp1.wav",
"/path/to/Model/audio_recordings/audio_uuid2_timestamp2.wav"
],
"count": 2
}
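Because the list contains every recording, a client that only cares about one session has to filter it. The sketch below relies on the `audio_{session_id}_{timestamp}.wav` naming pattern described under File Storage; `recordings_for_session` is a hypothetical helper, and the prefix match assumes the server never changes that pattern.

```python
from pathlib import PurePosixPath
from typing import List


def recordings_for_session(recordings: List[str], session_id: str) -> List[str]:
    """Keep only the recordings whose filename belongs to session_id."""
    prefix = f"audio_{session_id}_"
    return [p for p in recordings if PurePosixPath(p).name.startswith(prefix)]
```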
5. Get Angle Metadata for Session
Endpoint: GET /api/audio/angles/{session_id}
Description: Retrieve angle data collected during an audio streaming session.
Authentication: Required (JWT token)
Parameters:
- session_id (path parameter): The UUID of the audio session
Response:
{
"ok": true,
"session_id": "uuid-here",
"angles": [
{"timestamp": 0.000, "angle": 45.50},
{"timestamp": 0.064, "angle": 46.20},
{"timestamp": 0.128, "angle": 47.00}
],
"count": 3
}
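A common use of this data is looking up the angle that applied at a given moment in the recording. The following sketch assumes that `timestamp` is in seconds from the start of the stream and that entries arrive sorted in ascending order, neither of which this document states explicitly; `angle_at` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import List, Optional


def angle_at(angles: List[dict], t: float) -> Optional[float]:
    """Return the most recent angle recorded at or before time t."""
    # Assumes the entries are already sorted by timestamp.
    timestamps = [entry["timestamp"] for entry in angles]
    index = bisect_right(timestamps, t) - 1
    if index < 0:
        return None  # t is earlier than the first measurement
    return angles[index]["angle"]
```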
6. Download Audio File
Endpoint: GET /api/audio/download/{session_id}
Description: Download the recorded audio file (.wav) for a specific session.
Authentication: Required (JWT token)
Parameters:
- session_id (path parameter): The UUID of the audio session
Response:
- Binary WAV file with Content-Type: audio/wav
- File download with appropriate filename header
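After downloading, it can be useful to confirm that the file matches the settings requested in start-stream. The sketch below reads the WAV header from the response bytes with Python's standard `wave` module; `wav_info` is a hypothetical helper, and it assumes the server returns an uncompressed PCM WAV file.

```python
import io
import wave


def wav_info(data: bytes) -> dict:
    """Read basic properties from the bytes of a PCM WAV file."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate,
        }
```

For a response obtained with `requests`, the call would be `wav_info(response.content)`.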
7. Set Desired Angle
Endpoint: POST /api/audio/set-angle/{session_id}
Description: Send a desired/target angle to the audio processing backend for a session.
Authentication: Required (JWT token)
Parameters:
- session_id (path parameter): The UUID of the audio session
- angle (form parameter, required): Desired angle in degrees (0-360)
Request:
POST /api/audio/set-angle/session-uuid-here
Content-Type: multipart/form-data
angle=45.5
Response:
{
"ok": true,
"message": "Desired angle set to 45.5°",
"session_id": "uuid-here",
"angle": 45.5
}
Error Response (Invalid Angle):
{
"detail": "Angle must be between 0 and 360 degrees"
}
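Validating the angle on the client avoids a round trip that would end in the error above. This is a minimal sketch; `validate_angle` is a hypothetical helper, and treating both 0 and 360 as valid is an assumption, since the document gives the range only as 0-360.

```python
def validate_angle(angle: float) -> float:
    """Return the angle as a float, or raise ValueError if it is out of range."""
    value = float(angle)
    # Inclusive bounds are an assumption; NaN also fails this comparison.
    if not 0.0 <= value <= 360.0:
        raise ValueError("Angle must be between 0 and 360 degrees")
    return value
```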
File Storage
All uploaded files and processed data are stored in the /Model/ directory:
- Reference Videos: /Model/my_scan.mp4 (overwritten on each upload)
- Reference Images: /Model/ref_{filename}
- Embeddings Cache: /Model/embeddings_cache.pkl
- Audio Recordings: /Model/audio_recordings/audio_{session_id}_{timestamp}.wav
- Audio Metadata: /Model/audio_recordings/audio_{session_id}_{timestamp}_metadata.txt
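The recording and metadata names differ only in their ending, so one can be derived from the other. The sketch below follows the two patterns listed above; `metadata_path_for` is a hypothetical helper, and it makes no assumption about the format of the timestamp part.

```python
from pathlib import PurePosixPath


def metadata_path_for(recording_path: str) -> str:
    """Map audio_..._{timestamp}.wav to audio_..._{timestamp}_metadata.txt."""
    path = PurePosixPath(recording_path)
    # stem is the filename without its final ".wav" extension
    return str(path.with_name(path.stem + "_metadata.txt"))
```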
Metadata Format
Audio metadata files contain timestamp and angle data in CSV format:
timestamp,angle
0.000,45.50
0.064,46.20
0.128,47.00
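A file in this format can be read with Python's standard `csv` module. The following is a minimal sketch that assumes the header row is always present, as in the sample above; `parse_angle_metadata` is a hypothetical helper name.

```python
import csv
import io
from typing import List, Tuple


def parse_angle_metadata(text: str) -> List[Tuple[float, float]]:
    """Parse metadata CSV text into (timestamp, angle) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row["timestamp"]), float(row["angle"])) for row in reader]
```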
Usage Example (Python)
import asyncio
import base64
import json

import requests
import websockets

# Placeholders: substitute a real JWT and real 16-bit PCM audio
token = "YOUR_JWT_TOKEN"
audio_bytes = b"\x00\x00" * 1024  # 1024 samples of silence

# 1. Upload reference video
with open("my_360_scan.mp4", "rb") as f:
    response = requests.post(
        "http://localhost:8000/api/face/upload-video",
        files={"file": f},
        headers={"Authorization": f"Bearer {token}"}
    )
print(response.json())

# 2. Start audio stream
response = requests.post(
    "http://localhost:8000/api/audio/start-stream",
    data={"sample_rate": 16000, "channels": 1},
    headers={"Authorization": f"Bearer {token}"}
)
session_id = response.json()["session_id"]

# 3. Stream audio via WebSocket
async def stream_audio():
    uri = f"ws://localhost:8000/ws/audio/{session_id}"
    async with websockets.connect(uri) as websocket:
        # Send audio chunk with angle
        await websocket.send(json.dumps({
            "audio_data": base64.b64encode(audio_bytes).decode(),
            "angle": 45.5
        }))
        # Or send raw binary
        await websocket.send(audio_bytes)
        # Stop when done
        await websocket.send(json.dumps({"command": "stop"}))

asyncio.run(stream_audio())

# 4. Get angle data for a session
response = requests.get(
    f"http://localhost:8000/api/audio/angles/{session_id}",
    headers={"Authorization": f"Bearer {token}"}
)
angles = response.json()["angles"]
print(f"Recorded {len(angles)} angle measurements")

# 5. Download recorded audio
response = requests.get(
    f"http://localhost:8000/api/audio/download/{session_id}",
    headers={"Authorization": f"Bearer {token}"}
)
with open("downloaded_audio.wav", "wb") as f:
    f.write(response.content)

# 6. Send desired angle to backend
response = requests.post(
    f"http://localhost:8000/api/audio/set-angle/{session_id}",
    data={"angle": 90.0},
    headers={"Authorization": f"Bearer {token}"}
)
print(response.json())