	# Video Generation Endpoint API (Custom Handler)

	This repository is configured for deployment as a Hugging Face Inference Endpoint using a custom `handler.py`. The endpoint generates a short video from a text prompt and can return the result as:

	- GIF (preview-friendly)
	- WebM (higher quality, better compression)
	- ZIP of PNG frames (maximum control / post-processing)

	---

	## Endpoint URL

	After deployment, your endpoint URL will look like:

	```
	https://<your-endpoint>.<region>.aws.endpoints.huggingface.cloud
	```

	Example:

	```
	https://cyjm1rsdzy6la31w.us-east-1.aws.endpoints.huggingface.cloud
	```

	---

	## Authentication

	All requests require a Hugging Face token with permission to call the endpoint.

	Send it as a Bearer token:

	```
	Authorization: Bearer YOUR_HF_TOKEN
	```
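As a rough illustration of the header above, the following sketch builds (but does not send) an authenticated POST request using only the Python standard library. The helper name `build_request` is hypothetical and not part of this repository; the endpoint URL and token are placeholders you supply.

```python
import json
import urllib.request


def build_request(endpoint_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated JSON POST request without sending it.

    `build_request` is an illustrative helper, not part of handler.py.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            # The token is sent as a Bearer token, as described above.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# To send the request (generation can be slow, so use a generous timeout):
#   with urllib.request.urlopen(req, timeout=600) as resp:
#       response = json.loads(resp.read())
```

Reading the token from an environment variable rather than hard-coding it keeps it out of source control.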

	---

	## Request Format

	Hugging Face endpoint requests should be wrapped in a top-level `inputs` object:

	```json
	{
	  "inputs": {
	    "prompt": "cinematic sunset over mountains",
	    "outputs": ["gif"]
	  }
	}
	```

	### Core Fields

	| Field | Type | Default | Description |
	|-------|------|---------|-------------|
	| `prompt` | string | required | Text prompt describing the video. |
	| `negative_prompt` | string | `""` | Things you want to avoid. |
	| `num_frames` | int | `32` | Number of frames to generate. |
	| `fps` | int | `12` | Playback FPS for GIF/WebM (may be overridden per output). |
	| `height` | int | `512` | Frame height in pixels. |
	| `width` | int | `512` | Frame width in pixels. |
	| `seed` | int | `null` | Seed for reproducibility. |
	| `outputs` | array | `["gif"]` | Any subset of `["gif","webm","zip"]`. |
	| `return_base64` | bool | `true` | If true, returns file contents as base64 strings. |
	| `num_inference_steps` | int | `30` | More steps can improve quality but increase latency. |
	| `guidance_scale` | float | `7.5` | Prompt adherence strength (higher = more literal). |
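The fields above can be assembled client-side before sending. This is a minimal sketch, assuming you want to catch misspelled field names locally; whether the handler itself rejects unknown fields is not documented here. `build_payload` and its checks are hypothetical helpers, not part of the endpoint.

```python
# Field names taken from the Core Fields table; "gif" and "webm" are the
# per-output option objects that also live inside `inputs`.
KNOWN_FIELDS = {
    "negative_prompt", "num_frames", "fps", "height", "width", "seed",
    "outputs", "return_base64", "num_inference_steps", "guidance_scale",
    "gif", "webm",
}
ALLOWED_OUTPUTS = {"gif", "webm", "zip"}


def build_payload(prompt: str, **fields) -> dict:
    """Wrap the prompt and optional fields in the top-level `inputs` object."""
    if not prompt:
        raise ValueError("prompt is required")
    unknown = set(fields) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    bad_outputs = set(fields.get("outputs", [])) - ALLOWED_OUTPUTS
    if bad_outputs:
        raise ValueError(f"unsupported outputs: {sorted(bad_outputs)}")
    # Omitted fields are left out so the handler applies its own defaults.
    return {"inputs": {"prompt": prompt, **fields}}


payload = build_payload("cinematic sunset over mountains", num_frames=20, outputs=["gif"])
```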

	---

	## Output Configuration

	You can optionally include per-output options inside `inputs`.

	### GIF options

	```json
	"gif": { "fps": 10 }
	```

	### WebM options

	```json
	"webm": { "fps": 24, "quality": "good" }
	```

	Quality values:

	- `"fast"` — fastest encode
	- `"good"` — balanced (recommended)
	- `"best"` — higher quality, slower encode
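Because `fps` is a playback rate, the same set of frames plays for a different duration in each output when per-output `fps` values differ. The sketch below works through that arithmetic, assuming a per-output `fps` takes precedence over the top-level `fps` and that the documented defaults (`num_frames` 32, `fps` 12) apply when a field is omitted. The function names are illustrative only.

```python
DEFAULT_NUM_FRAMES = 32  # default from the Core Fields table
DEFAULT_FPS = 12         # default from the Core Fields table


def effective_fps(inputs: dict, output: str) -> int:
    """Per-output fps if given, else top-level fps, else the documented default."""
    per_output = inputs.get(output, {})
    return per_output.get("fps", inputs.get("fps", DEFAULT_FPS))


def playback_seconds(inputs: dict, output: str) -> float:
    """Clip duration in seconds: number of frames divided by playback rate."""
    return inputs.get("num_frames", DEFAULT_NUM_FRAMES) / effective_fps(inputs, output)


inputs = {
    "num_frames": 24,
    "fps": 12,
    "gif": {"fps": 10},
    "webm": {"fps": 24, "quality": "good"},
}
print(playback_seconds(inputs, "gif"))   # 24 frames at 10 fps -> 2.4
print(playback_seconds(inputs, "webm"))  # 24 frames at 24 fps -> 1.0
```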

	### ZIP output

	ZIP output contains PNG frames:

	```
	frame_000000.png
	frame_000001.png
	...
	```
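If you would rather process the frames in memory than unzip them to disk, something like the following should work with the standard library. It assumes the archive has already been base64-decoded to bytes and relies on the zero-padded names shown above, which sort correctly as plain strings. `read_frames` is a hypothetical helper.

```python
import io
import zipfile
from typing import List, Tuple


def read_frames(zip_bytes: bytes) -> List[Tuple[str, bytes]]:
    """Return (name, png_bytes) pairs for every PNG in the archive, in frame order."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Zero-padded names (frame_000000.png, ...) sort lexicographically.
        names = sorted(n for n in archive.namelist() if n.lower().endswith(".png"))
        return [(name, archive.read(name)) for name in names]
```

Each `png_bytes` value can then be handed to an image library of your choice for post-processing.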

	---

	## Response Format

	The handler returns JSON. On success:

	```json
	{
	  "ok": true,
	  "outputs": {
	    "gif_base64": "...",
	    "webm_base64": "...",
	    "zip_base64": "..."
	  },
	  "diagnostics": {
	    "timing_ms": { ... },
	    "generator": { ... }
	  }
	}
	```

	On error:

	```json
	{
	  "ok": false,
	  "error": "human readable error message",
	  "diagnostics": { ... }
	}
	```
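A client can branch on `ok` before touching any output. The sketch below assumes the response has already been parsed from JSON into a dict and that output keys follow the `<name>_base64` pattern shown above. `EndpointError` and `decode_outputs` are illustrative names, not part of the handler.

```python
import base64
from typing import Dict


class EndpointError(RuntimeError):
    """Raised when the handler reports `"ok": false`."""


def decode_outputs(response: dict) -> Dict[str, bytes]:
    """Map each `<name>_base64` output to its decoded bytes, keyed by `<name>`."""
    if not response.get("ok"):
        raise EndpointError(response.get("error", "unknown error"))
    suffix = "_base64"
    decoded = {}
    for key, value in response.get("outputs", {}).items():
        if key.endswith(suffix):
            decoded[key[: -len(suffix)]] = base64.b64decode(value)
    return decoded
```

Raising before decoding means an error response never gets written to disk as a media file.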

	---

	## Example curl Commands (Direct-to-file)

	These examples download only the file (decoded from base64 in the JSON response) without saving the JSON to disk.

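	> Important: We use `jq -er` so that `jq` exits with a non-zero status if the output key is missing or `null`, which happens when the API returns an error. The shell still creates the output file in that case, so run with `set -o pipefail`, check the exit status, and discard the file if the command fails.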

	Replace `YOUR_HF_TOKEN` and your endpoint URL as needed.

	---

	### 1) GIF → `output.gif`

	```bash
	curl -sS -X POST "https://cyjm1rsdzy6la31w.us-east-1.aws.endpoints.huggingface.cloud" \
	  -H "Authorization: Bearer YOUR_HF_TOKEN" \
	  -H "Content-Type: application/json" \
	  -d '{
	    "inputs": {
	      "prompt": "cinematic sunset over mountains, slow pan",
	      "num_frames": 20,
	      "fps": 10,
	      "outputs": ["gif"]
	    }
	  }' \
	  | jq -er '.outputs.gif_base64' \
	  | base64 --decode > output.gif
	```

	---

	### 2) WebM → `output.webm`

	```bash
	curl -sS -X POST "https://cyjm1rsdzy6la31w.us-east-1.aws.endpoints.huggingface.cloud" \
	  -H "Authorization: Bearer YOUR_HF_TOKEN" \
	  -H "Content-Type: application/json" \
	  -d '{
	    "inputs": {
	      "prompt": "a drone flying through clouds, volumetric lighting",
	      "num_frames": 32,
	      "fps": 24,
	      "outputs": ["webm"],
	      "webm": { "quality": "good" }
	    }
	  }' \
	  | jq -er '.outputs.webm_base64' \
	  | base64 --decode > output.webm
	```

	---

	### 3) ZIP (frames) → `frames.zip`

	```bash
	curl -sS -X POST "https://cyjm1rsdzy6la31w.us-east-1.aws.endpoints.huggingface.cloud" \
	  -H "Authorization: Bearer YOUR_HF_TOKEN" \
	  -H "Content-Type: application/json" \
	  -d '{
	    "inputs": {
	      "prompt": "ocean waves crashing in slow motion",
	      "num_frames": 16,
	      "outputs": ["zip"]
	    }
	  }' \
	  | jq -er '.outputs.zip_base64' \
	  | base64 --decode > frames.zip
	```

	Unzip frames:

	```bash
	unzip frames.zip
	```

	---

	### 4) Multi-output (GIF + WebM + ZIP)

	```bash
	curl -sS -X POST "https://cyjm1rsdzy6la31w.us-east-1.aws.endpoints.huggingface.cloud" \
	  -H "Authorization: Bearer YOUR_HF_TOKEN" \
	  -H "Content-Type: application/json" \
	  -d '{
	    "inputs": {
	      "prompt": "epic cinematic space nebula, slow parallax motion",
	      "num_frames": 24,
	      "fps": 12,
	      "outputs": ["gif", "webm", "zip"],
	      "gif": { "fps": 10 },
	      "webm": { "fps": 24, "quality": "good" }
	    }
	  }' \
	  -o response.json
	```

	Extract:

	```bash
	jq -er '.outputs.gif_base64' response.json | base64 --decode > output.gif
	jq -er '.outputs.webm_base64' response.json | base64 --decode > output.webm
	jq -er '.outputs.zip_base64' response.json | base64 --decode > frames.zip
	```

	---

	## Troubleshooting

	### “Corrupted” output files

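	A file that will not open usually means the response was an error payload rather than media. Save the raw response with `-o response.json` (as in the multi-output example) and inspect it first: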

	```bash
	jq . response.json
	```

	Ensure:

	```
	"ok": true
	```
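Another quick check is to look at the first bytes of a decoded file. The signatures below are the standard ones for each container format (GIF header, EBML header used by WebM/Matroska, and the ZIP local file header); the `detect_kind` helper itself is a hypothetical sketch, not something the handler provides.

```python
from typing import Optional

# Leading "magic bytes" for each output format.
SIGNATURES = {
    "gif": (b"GIF87a", b"GIF89a"),
    "webm": (b"\x1a\x45\xdf\xa3",),  # EBML header (WebM is a Matroska profile)
    "zip": (b"PK\x03\x04",),         # local file header of a non-empty archive
}


def detect_kind(data: bytes) -> Optional[str]:
    """Return "gif", "webm", or "zip" if the data starts with a known signature."""
    for kind, prefixes in SIGNATURES.items():
        if data.startswith(prefixes):
            return kind
    return None
```

If `detect_kind` returns `None` for a file you just wrote, the bytes are probably not media at all, for example a decoded error string.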

	### Large outputs

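	Every requested output is embedded in a single JSON response, so payloads grow quickly. To shrink them, reduce: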

	- `num_frames`
	- `height` / `width`

	Or modify the handler to upload to cloud storage and return a download URL.
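Part of the size comes from base64 itself: padded base64 encodes every 3 bytes as 4 characters, so each file grows by roughly one third inside the JSON response. A small worked calculation follows; `base64_length` is an illustrative helper name.

```python
import base64


def base64_length(num_bytes: int) -> int:
    """Length of the padded base64 encoding of `num_bytes` raw bytes: 4 * ceil(n / 3)."""
    return 4 * ((num_bytes + 2) // 3)


# Sanity check against the standard library for a 10-byte input.
assert base64_length(10) == len(base64.b64encode(bytes(10)))

# A 3,000,000-byte WebM becomes 4,000,000 base64 characters in the response.
print(base64_length(3_000_000))
```

Requesting all three outputs at once adds these sizes together, which is one reason a download URL can be preferable for long clips.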

	---

	## Repository Notes

	This repo is designed for Hugging Face Inference Endpoints with a custom handler.

	Key files:

	- `handler.py` — request parsing, model invocation, output encoding
	- `requirements.txt` — Python dependencies

	If your model lives in a subdirectory, set the environment variable:

	```
	HF_MODEL_SUBDIR
	```
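How `handler.py` consumes this variable is not shown in this README. As a loudly labeled assumption, one plausible reading is that the value is a path relative to the repository root that the handler joins onto the path it receives at startup. The sketch below illustrates that interpretation only; `resolve_model_dir` is hypothetical, and the real handler may behave differently.

```python
import os
from typing import Mapping


def resolve_model_dir(repo_path: str, environ: Mapping[str, str] = os.environ) -> str:
    """Join HF_MODEL_SUBDIR (if set) onto the repository path.

    Assumption: the variable holds a path relative to the repository root.
    """
    subdir = environ.get("HF_MODEL_SUBDIR", "").strip("/")
    return os.path.join(repo_path, subdir) if subdir else repo_path
```

Under that assumption, setting `HF_MODEL_SUBDIR=ltx` with a repository path of `/repo` would resolve to `/repo/ltx`, and leaving it unset keeps the repository root.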

	---

	## Security Notes

	- Do not commit secrets or tokens into this repository.
	- Use Endpoint Secrets / Environment Variables for credentials.

	---

	## License

	Specify your license here (e.g., MIT, Apache-2.0).