---
pipeline_tag: text-to-video
license: other
license_link: LICENSE
---

# TrackDiffusion Model Card

<!-- Provide a quick summary of what the model is/does. -->
TrackDiffusion is a diffusion model that takes tracklets as conditions and generates a video from them.

## Model Details

### Model Description

TrackDiffusion is a novel video generation framework that enables fine-grained control over complex dynamics by conditioning the generation process on object trajectories.
This makes it possible to precisely manipulate object trajectories and interactions while addressing the challenges of object appearance, disappearance, scale changes, and cross-frame consistency.
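
The exact conditioning format is defined by the TrackDiffusion codebase; as a rough, hypothetical sketch (the `Tracklet` class and its fields below are illustrative only, not the model's actual API), a tracklet can be viewed as one object identity plus a bounding box per frame:

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical sketch of a tracklet: one object's bounding box per frame.
# Field names are illustrative; consult the TrackDiffusion repo for the
# actual conditioning format expected by the released weights.
@dataclass
class Tracklet:
    track_id: int  # object identity, kept consistent across frames
    boxes: List[Tuple[int, float, float, float, float]]  # (frame_idx, x1, y1, x2, y2)

# Example: a car entering the scene and growing larger as it approaches.
car = Tracklet(
    track_id=0,
    boxes=[
        (0, 0.05, 0.40, 0.15, 0.50),
        (1, 0.20, 0.38, 0.35, 0.55),
        (2, 0.45, 0.35, 0.70, 0.62),
    ],
)
```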

## Uses

### Direct Use

We provide the weights for the entire UNet, so you can swap it into a Diffusers pipeline, for example:

```python
import torch
from diffusers import StableVideoDiffusionPipeline, UNetSpatioTemporalConditionModel

pretrained_model_path = "stabilityai/stable-video-diffusion-img2vid"

# Load the TrackDiffusion UNet weights and drop them into the SVD pipeline.
unet = UNetSpatioTemporalConditionModel.from_pretrained(
    "/path/to/unet",
    torch_dtype=torch.float16,
)
pipe = StableVideoDiffusionPipeline.from_pretrained(
    pretrained_model_path,
    unet=unet,
    torch_dtype=torch.float16,
    variant="fp16",
    low_cpu_mem_usage=True,
)
```
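
Once assembled, the pipeline can be driven through the standard Diffusers image-to-video interface. The sketch below is a minimal usage example under that assumption: the image path and output settings are placeholders, and feeding the tracklet conditions themselves is handled by the TrackDiffusion codebase rather than the plain `StableVideoDiffusionPipeline` call.

```python
from diffusers.utils import export_to_video, load_image

pipe.to("cuda")

# Standard SVD conditioning: a single first-frame image at 1024x576.
image = load_image("/path/to/first_frame.png").resize((1024, 576))

# Generate frames and write them out as a video file.
frames = pipe(image, decode_chunk_size=8).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```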