---
license: openrail++
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- stable-diffusion-xl
- controlnet
- temporal
- video
- diffusers
inference: true
---
# TemporalNet2 ControlNet for SDXL
This is a TemporalNet2 ControlNet trained for Stable Diffusion XL (SDXL base 1.0).
## Model Description
TemporalNet2 is a ControlNet variant designed for temporal coherence in video generation. It takes two conditioning inputs:
- **Previous Frame**: The previous frame in the video sequence (3 channels)
- **Optical Flow**: The optical flow between the previous and current frame (3 channels)
Total conditioning channels: **6**
This model was trained to generate temporally coherent frames by learning from both the visual content of the previous frame and the motion information encoded in optical flow.
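Conceptually, the two 3-channel inputs are stacked along the channel axis to form the 6-channel conditioning tensor. A minimal numpy sketch (shapes and values are illustrative, not the training pipeline):

```python
import numpy as np

# Hypothetical conditioning inputs for one 512x512 frame, values in [0, 1]
prev_frame = np.random.rand(512, 512, 3).astype(np.float32)    # previous RGB frame
optical_flow = np.random.rand(512, 512, 3).astype(np.float32)  # flow rendered as RGB

# Stack along the channel axis, then reorder to the channels-first
# (batch, 6, H, W) layout the ControlNet consumes
cond = np.concatenate([prev_frame, optical_flow], axis=-1)  # (512, 512, 6)
cond = cond.transpose(2, 0, 1)[None]                        # (1, 6, 512, 512)
```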
## Usage
```python
import numpy as np
import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, EulerDiscreteScheduler
from diffusers.utils import load_image

# Load the ControlNet model (6 conditioning channels: previous frame + optical flow)
controlnet = ControlNetModel.from_pretrained(
    "YOUR_USERNAME/temporalnet2-sdxl-controlnet",
    torch_dtype=torch.float16,
)

# Create the pipeline
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Load the conditioning images: the previous frame and the optical flow
# (between the previous and current frame) rendered as an RGB image
prev_frame = load_image("previous_frame.jpg")
optical_flow = load_image("optical_flow.jpg")

# The pipeline takes a single conditioning input, so prepare the 6-channel
# tensor yourself by concatenating the two images along the channel axis
def to_tensor(img):
    arr = np.array(img.convert("RGB"), dtype=np.float32) / 255.0
    return torch.from_numpy(arr).permute(2, 0, 1)  # (3, H, W)

cond = torch.cat([to_tensor(prev_frame), to_tensor(optical_flow)], dim=0)
cond = cond.unsqueeze(0).to("cuda", dtype=torch.float16)  # (1, 6, H, W)

prompt = "your prompt describing the scene"

# Generate the next frame
image = pipe(
    prompt=prompt,
    image=cond,
    num_inference_steps=20,
    guidance_scale=7.5,
).images[0]
image.save("output.jpg")
```
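For a full clip, generation is typically autoregressive: each new frame is conditioned on the previously *generated* frame plus the optical flow taken from the source video. A sketch of that rollout, where `generate_fn(prev_frame, flow)` is a hypothetical wrapper around the pipeline call above:

```python
def generate_video(first_frame, flows, generate_fn):
    """Autoregressive rollout: frame t is generated from generated frame t-1
    plus the source optical flow between frames t-1 and t.
    `generate_fn` is a placeholder for the pipeline call shown above."""
    frames = [first_frame]
    for flow in flows:
        frames.append(generate_fn(frames[-1], flow))
    return frames
```

Feeding back the generated frame (rather than the source frame) is what propagates the model's own style decisions through the sequence.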
## Training Details
- **Base Model**: stabilityai/stable-diffusion-xl-base-1.0
- **Training Resolution**: Multi-resolution (512, 640, 768, 896, 1024px)
- **Conditioning Channels**: 6 (3 for previous frame + 3 for optical flow)
- **Training Steps**: 25,000
- **Mixed Precision**: bfloat16
## Limitations
This model requires specific conditioning inputs:
1. The previous frame from your video sequence
2. The optical flow computed between frames
For best results, ensure your optical flow visualization uses a consistent color scheme and magnitude representation.
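The exact flow encoding used during training is not specified here; a common convention is the HSV color wheel, where hue encodes flow direction and value encodes normalized magnitude. A numpy-only sketch of that encoding (an assumption, not the model's canonical preprocessing):

```python
import numpy as np

def flow_to_rgb(flow):
    """Render a dense flow field of shape (H, W, 2) as an RGB uint8 image:
    hue encodes direction, value encodes magnitude normalized to [0, 1]."""
    dx, dy = flow[..., 0], flow[..., 1]
    mag = np.sqrt(dx**2 + dy**2)
    hue = (np.arctan2(dy, dx) + np.pi) / (2 * np.pi)  # direction -> 0..1
    val = mag / max(mag.max(), 1e-8)                  # magnitude -> 0..1
    sat = np.ones_like(hue)                           # full saturation

    # Vectorized HSV -> RGB conversion (standard sector formula)
    i = np.floor(hue * 6).astype(int) % 6
    f = hue * 6 - np.floor(hue * 6)
    p = val * (1 - sat)
    q = val * (1 - f * sat)
    t = val * (1 - (1 - f) * sat)
    i = i[..., None]  # broadcast sector index against the RGB axis
    rgb = np.select(
        [i == 0, i == 1, i == 2, i == 3, i == 4, i == 5],
        [np.stack([val, t, p], -1), np.stack([q, val, p], -1),
         np.stack([p, val, t], -1), np.stack([p, q, val], -1),
         np.stack([t, p, val], -1), np.stack([val, p, q], -1)],
    )
    return (rgb * 255).astype(np.uint8)
```

Whatever encoding you choose, apply it identically to every frame pair: the model conditions on the rendered colors, so an inconsistent scheme changes the meaning of the input mid-sequence.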
## License
This model is released under the same license as SDXL (OpenRAIL++).