---
license: openrail++
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- stable-diffusion-xl
- controlnet
- temporal
- video
- diffusers
inference: true
---
# TemporalNet2 ControlNet for SDXL
This is a TemporalNet2 ControlNet model trained on SDXL (Stable Diffusion XL base 1.0).
## Model Description
TemporalNet2 is a ControlNet variant designed for temporal coherence in video generation. It takes two conditioning inputs:
- **Previous Frame**: The previous frame in the video sequence (3 channels)
- **Optical Flow**: The optical flow between the previous and current frame (3 channels)
Total conditioning channels: **6**
This model was trained to generate temporally coherent frames by learning from both the visual content of the previous frame and the motion information encoded in optical flow.
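The 6-channel layout described above can be illustrated with a small sketch: the two RGB inputs are stacked channel-wise into a single conditioning array. The array names and sizes below are illustrative placeholders, not part of this repository.

```python
import numpy as np

# Hypothetical stand-ins for the two conditioning images (H x W x 3, uint8)
prev_frame = np.zeros((64, 64, 3), dtype=np.uint8)
optical_flow = np.zeros((64, 64, 3), dtype=np.uint8)

def make_conditioning(prev_frame: np.ndarray, optical_flow: np.ndarray) -> np.ndarray:
    """Stack the two RGB images channel-wise into the 6-channel layout
    (channels 0-2: previous frame, channels 3-5: optical flow),
    returned as 1 x 6 x H x W float32 in [0, 1]."""
    stacked = np.concatenate([prev_frame, optical_flow], axis=-1)  # H x W x 6
    chw = stacked.transpose(2, 0, 1).astype(np.float32) / 255.0    # 6 x H x W
    return chw[None]                                               # 1 x 6 x H x W

cond = make_conditioning(prev_frame, optical_flow)
print(cond.shape)  # (1, 6, 64, 64)
```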
## Usage
```python
import numpy as np
import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, EulerDiscreteScheduler
from PIL import Image

# Load the ControlNet model (6 conditioning channels)
controlnet = ControlNetModel.from_pretrained(
    "YOUR_USERNAME/temporalnet2-sdxl-controlnet",
    torch_dtype=torch.float16,
)

# Create the pipeline
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Load the two 3-channel conditioning images
prev_frame = Image.open("previous_frame.jpg").convert("RGB")
optical_flow = Image.open("optical_flow.jpg").convert("RGB")

def to_tensor(img: Image.Image) -> torch.Tensor:
    # HWC uint8 -> 1x3xHxW float in [0, 1]
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0)

# The stock pipeline preprocesses 3-channel RGB images, so build the
# 6-channel conditioning tensor yourself before calling the pipeline.
# Depending on your diffusers version, you may need a custom pipeline
# that skips the built-in control-image preprocessing.
cond = torch.cat([to_tensor(prev_frame), to_tensor(optical_flow)], dim=1)
cond = cond.to(device="cuda", dtype=torch.float16)

prompt = "your prompt describing the scene"

# Generate
image = pipe(
    prompt=prompt,
    image=cond,
    num_inference_steps=20,
    guidance_scale=7.5,
).images[0]
image.save("output.jpg")
```
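For video-to-video work, the single-frame call above is typically wrapped in an autoregressive loop: each output frame becomes the conditioning frame for the next step, while the optical flow comes from the source video. This is a sketch only; `generate_frame` and `compute_flow` are hypothetical placeholders for the pipeline call above and a flow estimator, not functions shipped with this model.

```python
from typing import Any, Callable, List

def stylize_video(
    input_frames: List[Any],
    prompt: str,
    generate_frame: Callable[[str, Any, Any], Any],
    compute_flow: Callable[[Any, Any], Any],
) -> List[Any]:
    """Autoregressive rollout: frame i is conditioned on the previously
    *generated* frame plus the optical flow between source frames i-1 and i."""
    # Seed with the first source frame (or a separately stylized version of it)
    out = [input_frames[0]]
    for i in range(1, len(input_frames)):
        flow = compute_flow(input_frames[i - 1], input_frames[i])
        out.append(generate_frame(prompt, out[-1], flow))
    return out
```

Seeding the loop with a good first frame matters: any artifact in it tends to propagate through the rest of the rollout.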
## Training Details
- **Base Model**: stabilityai/stable-diffusion-xl-base-1.0
- **Training Resolution**: Multi-resolution (512, 640, 768, 896, 1024px)
- **Conditioning Channels**: 6 (3 for previous frame + 3 for optical flow)
- **Training Steps**: 25,000
- **Mixed Precision**: bfloat16
## Limitations
This model requires specific conditioning inputs:
1. The previous frame from your video sequence
2. The optical flow computed between frames
For best results, ensure your optical flow visualization uses a consistent color scheme and magnitude representation.
## License
This model is released under the same license as SDXL (OpenRAIL++).