---
license: openrail++
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- stable-diffusion-xl
- controlnet
- temporal
- video
- diffusers
inference: true
---

# TemporalNet2 ControlNet for SDXL

This is a TemporalNet2 ControlNet model trained for SDXL (Stable Diffusion XL base 1.0).

## Model Description

TemporalNet2 is a ControlNet variant designed for temporal coherence in video generation. It takes two conditioning inputs:

- **Previous Frame**: the previous frame in the video sequence (3 channels)
- **Optical Flow**: the optical flow between the previous and current frame, encoded as an RGB image (3 channels)

Total conditioning channels: **6**

The model was trained to generate temporally coherent frames by learning from both the visual content of the previous frame and the motion information encoded in the optical flow.
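
As a minimal sketch of the layout this implies (the channel ordering, previous frame first and optical flow second, is an assumption here and must match how the model was trained):

```python
import torch

# Dummy stand-ins for the two 3-channel conditioning images, values in [0, 1]
prev_frame = torch.rand(1, 3, 1024, 1024)    # previous RGB frame
optical_flow = torch.rand(1, 3, 1024, 1024)  # RGB-encoded optical flow

# Channels 0-2 carry the previous frame, channels 3-5 the optical flow
cond = torch.cat([prev_frame, optical_flow], dim=1)
assert cond.shape == (1, 6, 1024, 1024)
```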

## Usage

```python
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, EulerDiscreteScheduler
from diffusers.utils import load_image
from torchvision.transforms.functional import to_tensor
import torch

# Load the ControlNet model
controlnet = ControlNetModel.from_pretrained(
    "YOUR_USERNAME/temporalnet2-sdxl-controlnet",
    torch_dtype=torch.float16
)

# Create the pipeline
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16
)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Load the two conditioning images (they must share the same resolution)
prev_frame = load_image("previous_frame.jpg")
optical_flow = load_image("optical_flow.jpg")

# Prepare the 6-channel input yourself: with a single 6-channel ControlNet,
# the pipeline expects one control image, not a list, so stack the two
# 3-channel images into a (1, 6, H, W) tensor. (Assumes a diffusers version
# whose control image processor passes multi-channel tensors through.)
cond = torch.cat([to_tensor(prev_frame), to_tensor(optical_flow)], dim=0).unsqueeze(0)

prompt = "your prompt describing the scene"

# Generate the next frame
image = pipe(
    prompt=prompt,
    image=cond,
    num_inference_steps=20,
    guidance_scale=7.5
).images[0]

image.save("output.jpg")
```
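
In a video workflow the pipeline is typically called once per frame, feeding each generated frame back in as the next previous-frame condition. A rough sketch continuing from the setup above, assuming the per-frame flow images have been precomputed (the file names here are placeholders):

```python
# Continues from `pipe`, `prompt`, `load_image`, and `to_tensor` above.
frames = []
prev = load_image("frame_0000.jpg")  # the first frame comes from the source video
for i in range(1, 30):
    flow = load_image(f"flow_{i:04d}.jpg").resize(prev.size)  # flow between frames i-1 and i
    cond = torch.cat([to_tensor(prev), to_tensor(flow)], dim=0).unsqueeze(0)
    prev = pipe(
        prompt=prompt,
        image=cond,
        num_inference_steps=20,
        guidance_scale=7.5
    ).images[0]
    frames.append(prev)  # the generated frame conditions the next step
```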

## Training Details

- **Base Model**: stabilityai/stable-diffusion-xl-base-1.0
- **Training Resolution**: multi-resolution (512, 640, 768, 896, and 1024 px)
- **Conditioning Channels**: 6 (3 for the previous frame + 3 for the optical flow)
- **Training Steps**: 25,000
- **Mixed Precision**: bfloat16

## Limitations

This model requires two specific conditioning inputs:

1. The previous frame from your video sequence
2. The optical flow computed between the previous and current frame

For best results, ensure your optical flow visualization uses a consistent color scheme and magnitude scaling across frames.
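
The card does not pin down a particular flow encoding. One way to get a consistent encoding is torchvision's RAFT model together with `flow_to_image`, which maps flow vectors onto a fixed color wheel; a sketch, assuming two frames are already loaded as (1, 3, H, W) tensors in [0, 1] with H and W divisible by 8:

```python
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights
from torchvision.utils import flow_to_image

weights = Raft_Large_Weights.DEFAULT
raft = raft_large(weights=weights).eval().to("cuda")

# Placeholder frames; in practice, load consecutive video frames here
frame1 = torch.rand(1, 3, 512, 512, device="cuda")
frame2 = torch.rand(1, 3, 512, 512, device="cuda")

with torch.no_grad():
    img1, img2 = weights.transforms()(frame1, frame2)  # normalize as RAFT expects
    flow = raft(img1, img2)[-1]                        # (1, 2, H, W) displacement field

flow_rgb = flow_to_image(flow)  # (1, 3, H, W) uint8 color-wheel visualization
```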

## License

This model is released under the same license as SDXL (OpenRAIL++).