Spaces:
Runtime error
Runtime error
| # Video Captioning Endpoint (v1) | |
| ## 1. Overview | |
| The `/v1/video/caption` endpoint is part of the Video API and is responsible for adding captions to a video file. It accepts a video URL, caption text, and various styling options for the captions. The endpoint utilizes the `process_captioning_v1` service to generate a captioned video file, which is then uploaded to cloud storage, and the cloud URL is returned in the response. | |
| ## 2. Endpoint | |
| **URL:** `/v1/video/caption` | |
| **Method:** `POST` | |
| ## 3. Request | |
| ### Headers | |
| - `x-api-key`: Required. The API key for authentication. | |
| ### Body Parameters | |
| The request body must be a JSON object with the following properties: | |
| - `video_url` (string, required): The URL of the video file to be captioned. | |
| - `captions` (string, optional): Can be one of the following: | |
| - Raw caption text to be added to the video | |
| - URL to an SRT subtitle file | |
| - URL to an ASS subtitle file | |
| - If not provided, the system will automatically generate captions by transcribing the audio from the video | |
| - `settings` (object, optional): An object containing various styling options for the captions. See the schema below for available options. | |
| - `replace` (array, optional): An array of objects with `find` and `replace` properties, specifying text replacements to be made in the captions. | |
| - `webhook_url` (string, optional): A URL to receive a webhook notification when the captioning process is complete. | |
| - `id` (string, optional): An identifier for the request. | |
| - `language` (string, optional): The language code for the captions (e.g., "en", "fr"). Defaults to "auto". | |
| - `exclude_time_ranges` (array, optional): List of time ranges to skip when adding captions. Each item must be an object with: | |
| - `start`: (string, required) The start time of the excluded range, as a string timecode in `hh:mm:ss.ms` format (e.g., `00:01:23.456`). | |
| - `end`: (string, required) The end time, as a string timecode in `hh:mm:ss.ms` format, which must be strictly greater than `start`. | |
| If either value is not a valid timecode string, or if `end` is not greater than `start`, the request will return an error. | |
| #### Settings Schema | |
| ```json | |
| { | |
| "type": "object", | |
| "properties": { | |
| "line_color": {"type": "string"}, | |
| "word_color": {"type": "string"}, | |
| "outline_color": {"type": "string"}, | |
| "all_caps": {"type": "boolean"}, | |
| "max_words_per_line": {"type": "integer"}, | |
| "x": {"type": "integer"}, | |
| "y": {"type": "integer"}, | |
| "position": { | |
| "type": "string", | |
| "enum": [ | |
| "bottom_left", "bottom_center", "bottom_right", | |
| "middle_left", "middle_center", "middle_right", | |
| "top_left", "top_center", "top_right" | |
| ] | |
| }, | |
| "alignment": { | |
| "type": "string", | |
| "enum": ["left", "center", "right"] | |
| }, | |
| "font_family": {"type": "string"}, | |
| "font_size": {"type": "integer"}, | |
| "bold": {"type": "boolean"}, | |
| "italic": {"type": "boolean"}, | |
| "underline": {"type": "boolean"}, | |
| "strikeout": {"type": "boolean"}, | |
| "style": { | |
| "type": "string", | |
| "enum": [ | |
| "classic", // Regular captioning with all text displayed at once | |
| "karaoke", // Highlights words sequentially in a karaoke style | |
| "highlight", // Shows full text but highlights the current word | |
| "underline", // Shows full text but underlines the current word | |
| "word_by_word" // Shows one word at a time | |
| ] | |
| }, | |
| "outline_width": {"type": "integer"}, | |
| "spacing": {"type": "integer"}, | |
| "angle": {"type": "integer"}, | |
| "shadow_offset": {"type": "integer"} | |
| }, | |
| "additionalProperties": false | |
| } | |
| ``` | |
| ### Example Requests | |
| #### Example 1: Basic Automatic Captioning | |
| ```json | |
| { | |
| "video_url": "https://example.com/video.mp4" | |
| } | |
| ``` | |
| This minimal request will automatically transcribe the video and add white captions at the bottom center. | |
| #### Example 2: Custom Text with Styling | |
| ```json | |
| { | |
| "video_url": "https://example.com/video.mp4", | |
| "captions": "This is a sample caption text.", | |
| "settings": { | |
| "style": "classic", | |
| "line_color": "#FFFFFF", | |
| "outline_color": "#000000", | |
| "position": "bottom_center", | |
| "alignment": "center", | |
| "font_family": "Arial", | |
| "font_size": 24, | |
| "bold": true | |
| } | |
| } | |
| ``` | |
| #### Example 3: Karaoke-Style Captions with Advanced Options | |
| ```json | |
| { | |
| "video_url": "https://example.com/video.mp4", | |
| "settings": { | |
| "line_color": "#FFFFFF", | |
| "word_color": "#FFFF00", | |
| "outline_color": "#000000", | |
| "all_caps": false, | |
| "max_words_per_line": 10, | |
| "position": "bottom_center", | |
| "alignment": "center", | |
| "font_family": "Arial", | |
| "font_size": 24, | |
| "bold": false, | |
| "italic": false, | |
| "style": "karaoke", | |
| "outline_width": 2, | |
| "shadow_offset": 2 | |
| }, | |
| "replace": [ | |
| { | |
| "find": "um", | |
| "replace": "" | |
| }, | |
| { | |
| "find": "like", | |
| "replace": "" | |
| } | |
| ], | |
| "webhook_url": "https://example.com/webhook", | |
| "id": "request-123", | |
| "language": "en" | |
| } | |
| ``` | |
| #### Example 4: Using an External Subtitle File | |
| ```json | |
| { | |
| "video_url": "https://example.com/video.mp4", | |
| "captions": "https://example.com/subtitles.srt", | |
| "settings": { | |
| "line_color": "#FFFFFF", | |
| "outline_color": "#000000", | |
| "position": "bottom_center", | |
| "font_family": "Arial", | |
| "font_size": 24 | |
| } | |
| } | |
| ``` | |
| #### Example 5: Excluding Time Ranges from Captioning | |
| ```json | |
| { | |
| "video_url": "https://example.com/video.mp4", | |
| "settings": { | |
| "style": "classic", | |
| "line_color": "#FFFFFF", | |
| "outline_color": "#000000", | |
| "position": "bottom_center", | |
| "font_family": "Arial", | |
| "font_size": 24 | |
| }, | |
| "exclude_time_ranges": [ | |
| { "start": "00:00:10.000", "end": "00:00:20.000" }, | |
| { "start": "00:00:30.000", "end": "00:00:40.000" } | |
| ] | |
| } | |
| ``` | |
| ```bash | |
| curl -X POST \ | |
| -H "x-api-key: YOUR_API_KEY" \ | |
| -H "Content-Type: application/json" \ | |
| -d '{ | |
| "video_url": "https://example.com/video.mp4", | |
| "settings": { | |
| "line_color": "#FFFFFF", | |
| "word_color": "#FFFF00", | |
| "outline_color": "#000000", | |
| "all_caps": false, | |
| "max_words_per_line": 10, | |
| "position": "bottom_center", | |
| "alignment": "center", | |
| "font_family": "Arial", | |
| "font_size": 24, | |
| "style": "karaoke", | |
| "outline_width": 2 | |
| }, | |
| "replace": [ | |
| { | |
| "find": "um", | |
| "replace": "" | |
| } | |
| ], | |
| "id": "custom-request-id" | |
| }' \ | |
| https://your-api-endpoint.com/v1/video/caption | |
| ``` | |
| ## 4. Response | |
| ### Success Response | |
| The response will be a JSON object with the following properties: | |
| - `code` (integer): The HTTP status code (200 for success). | |
| - `id` (string): The request identifier, if provided in the request. | |
| - `job_id` (string): A unique identifier for the job. | |
| - `response` (string): The cloud URL of the captioned video file. | |
| - `message` (string): A success message. | |
| - `pid` (integer): The process ID of the worker that processed the request. | |
| - `queue_id` (integer): The ID of the queue used for processing the request. | |
| - `run_time` (float): The time taken to process the request (in seconds). | |
| - `queue_time` (float): The time the request spent in the queue (in seconds). | |
| - `total_time` (float): The total time taken for the request (in seconds). | |
| - `queue_length` (integer): The current length of the processing queue. | |
| - `build_number` (string): The build number of the application. | |
| Example: | |
| ```json | |
| { | |
| "code": 200, | |
| "id": "request-123", | |
| "job_id": "d290f1ee-6c54-4b01-90e6-d701748f0851", | |
| "response": "https://cloud.example.com/captioned-video.mp4", | |
| "message": "success", | |
| "pid": 12345, | |
| "queue_id": 140682639937472, | |
| "run_time": 5.234, | |
| "queue_time": 0.012, | |
| "total_time": 5.246, | |
| "queue_length": 0, | |
| "build_number": "1.0.0" | |
| } | |
| ``` | |
| ### Error Responses | |
| #### Missing or Invalid Parameters | |
| **Status Code:** 400 Bad Request | |
| ```json | |
| { | |
| "code": 400, | |
| "id": "request-123", | |
| "job_id": "d290f1ee-6c54-4b01-90e6-d701748f0851", | |
| "message": "Missing or invalid parameters", | |
| "pid": 12345, | |
| "queue_id": 140682639937472, | |
| "queue_length": 0, | |
| "build_number": "1.0.0" | |
| } | |
| ``` | |
| #### Font Error | |
| **Status Code:** 400 Bad Request | |
| ```json | |
| { | |
| "code": 400, | |
| "error": "The requested font 'InvalidFont' is not available. Please choose from the available fonts.", | |
| "available_fonts": ["Arial", "Times New Roman", "Courier New", ...], | |
| "pid": 12345, | |
| "queue_id": 140682639937472, | |
| "queue_length": 0, | |
| "build_number": "1.0.0" | |
| } | |
| ``` | |
| #### Internal Server Error | |
| **Status Code:** 500 Internal Server Error | |
| ```json | |
| { | |
| "code": 500, | |
| "id": "request-123", | |
| "job_id": "d290f1ee-6c54-4b01-90e6-d701748f0851", | |
| "error": "An unexpected error occurred during the captioning process.", | |
| "pid": 12345, | |
| "queue_id": 140682639937472, | |
| "queue_length": 0, | |
| "build_number": "1.0.0" | |
| } | |
| ``` | |
| ## 5. Error Handling | |
| The endpoint handles the following common errors: | |
| - **Missing or Invalid Parameters**: If any required parameters are missing or invalid, a 400 Bad Request error is returned with a descriptive error message. | |
| - **Font Error**: If the requested font is not available, a 400 Bad Request error is returned with a list of available fonts. | |
| - **Internal Server Error**: If an unexpected error occurs during the captioning process, a 500 Internal Server Error is returned with an error message. | |
| Additionally, the main application context (`app.py`) includes error handling for queue overload. If the maximum queue length (`MAX_QUEUE_LENGTH`) is set and the queue size reaches that limit, a 429 Too Many Requests error is returned with a descriptive message. | |
| ## 6. Usage Notes | |
| - The `video_url` parameter must be a valid URL pointing to a video file (MP4, MOV, etc.). | |
| - The `captions` parameter is optional and can be used in multiple ways: | |
| - If not provided, the endpoint will automatically transcribe the audio and generate captions | |
| - If provided as plain text, the text will be used as captions for the entire video | |
| - If provided as a URL to an SRT or ASS subtitle file, the system will use that file for captioning | |
| - For SRT files, only 'classic' style is supported | |
| - For ASS files, the original styling will be preserved | |
| - The `settings` parameter allows for customization of the caption appearance and behavior: | |
| - `style` determines how captions are displayed, with options including: | |
| - `classic`: Regular captioning with all text displayed at once | |
| - `karaoke`: Highlights words sequentially in a karaoke style as they're spoken | |
| - `highlight`: Shows the full caption text but highlights each word as it's spoken | |
| - `underline`: Shows the full caption text but underlines each word as it's spoken | |
| - `word_by_word`: Shows only one word at a time | |
| - `position` can be used to place captions in one of nine positions on the screen | |
| - `alignment` determines text alignment within the position (left, center, right) | |
| - `font_family` can be any available system font | |
| - Color options can be set using hex codes (e.g., "#FFFFFF" for white) | |
| - The `replace` parameter can be used to perform text replacements in the captions (useful for correcting words or censoring content). | |
| - The `webhook_url` parameter is optional and can be used to receive a notification when the captioning process is complete. | |
| - The `id` parameter is optional and can be used to identify the request in webhook responses. | |
| - The `language` parameter is optional and can be used to specify the language of the captions for transcription. If not provided, the language will be automatically detected. | |
| - The `exclude_time_ranges` parameter can be used to specify time ranges to be excluded from captioning. | |
| ## 7. Common Issues | |
| - Providing an invalid or inaccessible `video_url`. | |
| - Requesting an unavailable font in the `settings` object. | |
| - Exceeding the maximum queue length, resulting in a 429 Too Many Requests error. | |
| ## 8. Best Practices | |
| - Validate the `video_url` parameter before sending the request to ensure it points to a valid and accessible video file. | |
| - Use the `webhook_url` parameter to receive notifications about the captioning process, rather than polling the API for updates. | |
| - Provide descriptive and meaningful `id` values to easily identify requests in logs and responses. | |
| - Use the `replace` parameter judiciously to avoid unintended text replacements in the captions. | |
| - Consider caching the captioned video files for frequently requested videos to improve performance and reduce processing time. |