---
license: openrail++
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- stable-diffusion-xl
- controlnet
- temporal
- video
- diffusers
inference: true
---

# TemporalNet2 ControlNet for SDXL

This is a TemporalNet2 ControlNet model trained on SDXL (Stable Diffusion XL base 1.0).

## Model Description

TemporalNet2 is a ControlNet variant designed for temporal coherence in video generation. It takes two conditioning inputs:

- **Previous frame**: the previous frame in the video sequence (3 channels)
- **Optical flow**: the optical flow between the previous and current frame, rendered as an RGB visualization (3 channels)

Total conditioning channels: **6**

This model was trained to generate temporally coherent frames by learning from both the visual content of the previous frame and the motion information encoded in the optical flow.
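
The two conditioning images are combined channel-wise into a single 6-channel tensor. A minimal shape-only sketch with dummy data (the variable names are illustrative):

```python
import torch

# Dummy stand-ins for the two conditioning inputs, each an RGB image
# tensor in [0, 1] with layout (batch, channels, height, width)
prev_frame = torch.rand(1, 3, 64, 64)    # previous frame
optical_flow = torch.rand(1, 3, 64, 64)  # flow visualization

# Channel-wise concatenation yields the 6-channel conditioning input
cond = torch.cat([prev_frame, optical_flow], dim=1)
print(cond.shape)  # torch.Size([1, 6, 64, 64])
```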

## Usage

```python
import numpy as np
import torch
from diffusers import ControlNetModel, EulerDiscreteScheduler, StableDiffusionXLControlNetPipeline
from PIL import Image

# Load the ControlNet model
controlnet = ControlNetModel.from_pretrained(
    "YOUR_USERNAME/temporalnet2-sdxl-controlnet",
    torch_dtype=torch.float16,
)

# Create the pipeline
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Load the conditioning images
prev_frame = Image.open("previous_frame.jpg").convert("RGB")
optical_flow = Image.open("optical_flow.jpg").convert("RGB")

# The pipeline will not concatenate a list of images channel-wise, so build
# the 6-channel conditioning tensor (previous frame + flow) explicitly.
def to_tensor(img: Image.Image) -> torch.Tensor:
    arr = np.asarray(img).astype(np.float32) / 255.0
    return torch.from_numpy(arr).permute(2, 0, 1)  # (3, H, W), values in [0, 1]

cond = torch.cat([to_tensor(prev_frame), to_tensor(optical_flow)], dim=0)
cond = cond.unsqueeze(0).to("cuda", dtype=torch.float16)  # (1, 6, H, W)

prompt = "your prompt describing the scene"

# Generate the next frame
image = pipe(
    prompt=prompt,
    image=cond,
    num_inference_steps=20,
    guidance_scale=7.5,
).images[0]

image.save("output.jpg")
```

## Training Details

- **Base Model**: stabilityai/stable-diffusion-xl-base-1.0
- **Training Resolution**: Multi-resolution (512, 640, 768, 896, 1024 px)
- **Conditioning Channels**: 6 (3 for previous frame + 3 for optical flow)
- **Training Steps**: 25,000
- **Mixed Precision**: bfloat16

## Limitations

This model requires specific conditioning inputs:

1. The previous frame from your video sequence
2. The optical flow computed between frames

For best results, ensure your optical flow visualization uses a consistent color scheme and magnitude representation.
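
One common convention maps flow direction to hue and normalized magnitude to brightness. The pure-NumPy `flow_to_rgb` helper below is our own illustration of that convention, not necessarily the exact encoding used during training:

```python
import numpy as np

def flow_to_rgb(flow: np.ndarray) -> np.ndarray:
    """Visualize a dense optical-flow field (H, W, 2) as an RGB uint8 image:
    hue encodes direction, brightness encodes normalized magnitude."""
    fx, fy = flow[..., 0], flow[..., 1]
    mag = np.sqrt(fx ** 2 + fy ** 2)
    ang = np.arctan2(fy, fx)                 # direction in [-pi, pi]
    h = (ang + np.pi) / (2 * np.pi)          # hue in [0, 1]
    s = np.ones_like(h)                      # full saturation
    v = mag / (mag.max() + 1e-8)             # magnitude normalized to [0, 1]
    # Vectorized HSV -> RGB conversion
    i = np.floor(h * 6).astype(int) % 6
    f = h * 6 - np.floor(h * 6)
    p, q, t = v * (1 - s), v * (1 - f * s), v * (1 - (1 - f) * s)
    lut = np.stack([
        np.stack([v, t, p], axis=-1), np.stack([q, v, p], axis=-1),
        np.stack([p, v, t], axis=-1), np.stack([p, q, v], axis=-1),
        np.stack([t, p, v], axis=-1), np.stack([v, p, t], axis=-1),
    ], axis=0)                               # (6, H, W, 3)
    rows = np.arange(flow.shape[0])[:, None]
    cols = np.arange(flow.shape[1])[None, :]
    rgb = lut[i, rows, cols]                 # pick the hue sector per pixel
    return (rgb * 255).round().astype(np.uint8)

# Example: one rightward-moving pixel, everything else static
flow = np.zeros((2, 2, 2), dtype=np.float32)
flow[0, 0] = [1.0, 0.0]
img = flow_to_rgb(flow)  # moving pixel is saturated, static pixels are black
```

Applying the same normalization to every frame pair keeps the color scheme consistent across the sequence, which is what the model expects.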

## License

This model is released under the same license as SDXL (OpenRAIL++).