# API Documentation

## Overview

This document describes the new API endpoints added to the AFS backend for face recognition and audio streaming.

## Face Recognition APIs

### 1. Upload 360-Degree Reference Video

**Endpoint:** `POST /api/face/upload-video`

**Description:** Upload a 360-degree reference video for face recognition training. The video is processed to extract high-quality face embeddings.

**Authentication:** Required (JWT token)

**Request:**

- Content-Type: `multipart/form-data`
- Body: `file` (video file: `.mp4`, `.avi`, `.mov`, `.mkv`)

**Response:**

```json
{
  "ok": true,
  "message": "Video processed successfully",
  "frames_used": 10,
  "embeddings_count": 1
}
```

### 2. Upload Reference Image

**Endpoint:** `POST /api/face/upload-image`

**Description:** Upload a single reference image for face recognition.

**Authentication:** Required (JWT token)

**Request:**

- Content-Type: `multipart/form-data`
- Body: `file` (image file: `.jpg`, `.jpeg`, `.png`)

**Response:**

```json
{
  "ok": true,
  "message": "Image processed successfully",
  "embeddings_count": 1,
  "saved_path": "/path/to/Model/ref_image.jpg"
}
```

### 3. Get Cache Status

**Endpoint:** `GET /api/face/cache-status`

**Description:** Check whether face recognition embeddings are cached and ready to use.

**Authentication:** Required (JWT token)

**Response (Cached):**

```json
{
  "ok": true,
  "cached": true,
  "video_path": "my_scan.mp4",
  "model_name": "ArcFace",
  "num_frames_used": 10,
  "version": 2
}
```

**Response (Not Cached):**

```json
{
  "ok": true,
  "cached": false,
  "message": "No cache found. Please upload a reference video or image."
}
```

## Audio Streaming APIs

### 1. Start Audio Stream

**Endpoint:** `POST /api/audio/start-stream`

**Description:** Start a new audio recording session. Returns a session ID for streaming.
**Authentication:** Required (JWT token)

**Request:**

- Content-Type: `multipart/form-data`
- Body:
  - `sample_rate` (optional, default: `16000`)
  - `channels` (optional, default: `1`; use `1` for mono, `2` for stereo)

**Response:**

```json
{
  "ok": true,
  "session_id": "uuid-here",
  "filename": "/path/to/Model/audio_recordings/audio_uuid_timestamp.wav",
  "sample_rate": 16000,
  "channels": 1
}
```

### 2. Audio WebSocket Stream

**Endpoint:** `WebSocket /ws/audio/{session_id}`

**Description:** WebSocket endpoint for streaming audio data with optional angle information.

**Authentication:** Not required at the WebSocket level (use the `session_id` returned by `start-stream`)

**Send (Binary Audio Data):**

```
WebSocket binary message: raw audio bytes (16-bit PCM)
```

**Send (JSON with Angle):**

```json
{
  "audio_data": "base64-encoded-audio-bytes",
  "angle": 45.5
}
```

**Send (Stop Command):**

```json
{
  "command": "stop"
}
```

**Receive:**

```json
{
  "status": "received",
  "bytes": 1024
}
```

or

```json
{
  "status": "received",
  "angle": 45.5
}
```

### 3. Stop Audio Stream

**Endpoint:** `POST /api/audio/stop-stream/{session_id}`

**Description:** Stop an active audio recording stream.

**Authentication:** Required (JWT token)

**Response:**

```json
{
  "ok": true,
  "message": "Audio stream stopped successfully"
}
```

### 4. List Audio Recordings

**Endpoint:** `GET /api/audio/recordings`

**Description:** Get a list of all audio recordings.

**Authentication:** Required (JWT token)

**Response:**

```json
{
  "ok": true,
  "recordings": [
    "/path/to/Model/audio_recordings/audio_uuid1_timestamp1.wav",
    "/path/to/Model/audio_recordings/audio_uuid2_timestamp2.wav"
  ],
  "count": 2
}
```

### 5. Get Angle Metadata for Session

**Endpoint:** `GET /api/audio/angles/{session_id}`

**Description:** Retrieve angle data collected during an audio streaming session.
**Authentication:** Required (JWT token)

**Parameters:**

- `session_id` (path parameter): The UUID of the audio session

**Response:**

```json
{
  "ok": true,
  "session_id": "uuid-here",
  "angles": [
    {"timestamp": 0.000, "angle": 45.50},
    {"timestamp": 0.064, "angle": 46.20},
    {"timestamp": 0.128, "angle": 47.00}
  ],
  "count": 3
}
```

### 6. Download Audio File

**Endpoint:** `GET /api/audio/download/{session_id}`

**Description:** Download the recorded audio file (`.wav`) for a specific session.

**Authentication:** Required (JWT token)

**Parameters:**

- `session_id` (path parameter): The UUID of the audio session

**Response:**

- Binary WAV file with `Content-Type: audio/wav`
- File download with an appropriate filename header

### 7. Set Desired Angle

**Endpoint:** `POST /api/audio/set-angle/{session_id}`

**Description:** Send a desired/target angle to the audio processing backend for a session.

**Authentication:** Required (JWT token)

**Parameters:**

- `session_id` (path parameter): The UUID of the audio session
- `angle` (form parameter, required): Desired angle in degrees (0-360)

**Request:**

```
POST /api/audio/set-angle/session-uuid-here
Content-Type: multipart/form-data

angle=45.5
```

**Response:**

```json
{
  "ok": true,
  "message": "Desired angle set to 45.5°",
  "session_id": "uuid-here",
  "angle": 45.5
}
```

**Error Response (Invalid Angle):**

```json
{
  "detail": "Angle must be between 0 and 360 degrees"
}
```

## File Storage

All uploaded files and processed data are stored in the `/Model/` directory:

- **Reference Videos:** `/Model/my_scan.mp4` (overwritten on each upload)
- **Reference Images:** `/Model/ref_image.jpg`
- **Embeddings Cache:** `/Model/embeddings_cache.pkl`
- **Audio Recordings:** `/Model/audio_recordings/audio_{session_id}_{timestamp}.wav`
- **Audio Metadata:** `/Model/audio_recordings/audio_{session_id}_{timestamp}_metadata.txt`

## Metadata Format

Audio metadata files contain timestamp and angle data in CSV format:

```
timestamp,angle
0.000,45.50
0.064,46.20
0.128,47.00
```

## Usage Example (Python)

```python
import asyncio
import base64
import json

import requests
import websockets

token = "your-jwt-token"  # placeholder: obtain via your auth flow

# 1. Upload reference video
with open("my_360_scan.mp4", "rb") as f:
    response = requests.post(
        "http://localhost:8000/api/face/upload-video",
        files={"file": f},
        headers={"Authorization": f"Bearer {token}"},
    )
print(response.json())

# 2. Start audio stream
response = requests.post(
    "http://localhost:8000/api/audio/start-stream",
    data={"sample_rate": 16000, "channels": 1},
    headers={"Authorization": f"Bearer {token}"},
)
session_id = response.json()["session_id"]

# 3. Stream audio via WebSocket
async def stream_audio(audio_bytes: bytes):
    uri = f"ws://localhost:8000/ws/audio/{session_id}"
    async with websockets.connect(uri) as websocket:
        # Send an audio chunk with angle information
        await websocket.send(json.dumps({
            "audio_data": base64.b64encode(audio_bytes).decode(),
            "angle": 45.5,
        }))
        # Or send raw binary (16-bit PCM)
        await websocket.send(audio_bytes)
        # Stop when done
        await websocket.send(json.dumps({"command": "stop"}))

# Replace the placeholder bytes with real 16-bit PCM audio data
asyncio.run(stream_audio(b"\x00\x00" * 1024))

# 4. Get angle data for a session
response = requests.get(
    f"http://localhost:8000/api/audio/angles/{session_id}",
    headers={"Authorization": f"Bearer {token}"},
)
angles = response.json()["angles"]
print(f"Recorded {len(angles)} angle measurements")

# 5. Download recorded audio
response = requests.get(
    f"http://localhost:8000/api/audio/download/{session_id}",
    headers={"Authorization": f"Bearer {token}"},
)
with open("downloaded_audio.wav", "wb") as f:
    f.write(response.content)

# 6. Send desired angle to backend
response = requests.post(
    f"http://localhost:8000/api/audio/set-angle/{session_id}",
    data={"angle": 90.0},
    headers={"Authorization": f"Bearer {token}"},
)
print(response.json())
```
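The `_metadata.txt` files use the same `timestamp,angle` schema shown in the Metadata Format section, so a client can parse them with the standard library. A minimal sketch; the `parse_angle_metadata` helper is illustrative and not part of the API:

```python
import csv
import io

def parse_angle_metadata(text: str) -> list:
    """Parse the timestamp,angle CSV written alongside each recording."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"timestamp": float(row["timestamp"]), "angle": float(row["angle"])}
        for row in reader
    ]

# Example using the metadata format shown above
sample = "timestamp,angle\n0.000,45.50\n0.064,46.20\n0.128,47.00\n"
angles = parse_angle_metadata(sample)
print(angles[0])  # {'timestamp': 0.0, 'angle': 45.5}
```

The resulting list of dicts matches the shape of the `angles` array returned by `GET /api/audio/angles/{session_id}`, so downstream code can treat both sources uniformly.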