---
license: openrail++
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
  - stable-diffusion-xl
  - controlnet
  - temporal
  - video
  - diffusers
inference: true
---

# TemporalNet2 ControlNet for SDXL

This is a TemporalNet2 ControlNet trained for Stable Diffusion XL base 1.0 (`stabilityai/stable-diffusion-xl-base-1.0`).

## Model Description

TemporalNet2 is a ControlNet variant designed for temporal coherence in video generation. It takes two conditioning inputs:
- **Previous Frame**: The previous frame in the video sequence (3 channels)
- **Optical Flow**: The optical flow between the previous and current frame (3 channels)

Total conditioning channels: **6**

This model was trained to generate temporally coherent frames by learning from both the visual content of the previous frame and the motion information encoded in optical flow.
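As a minimal sketch of how the two inputs combine (array shapes are illustrative), the 6-channel conditioning is just the two 3-channel images stacked along the channel axis:

```python
import numpy as np

# Illustrative H x W x 3 uint8 arrays standing in for the previous
# frame and the optical-flow visualization.
prev_frame = np.zeros((64, 64, 3), dtype=np.uint8)
flow_vis = np.zeros((64, 64, 3), dtype=np.uint8)

# Channels 0-2 hold the previous frame, channels 3-5 the flow.
conditioning = np.concatenate([prev_frame, flow_vis], axis=2)
print(conditioning.shape)  # -> (64, 64, 6)
```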

## Usage

```python
import numpy as np
import torch
from diffusers import ControlNetModel, EulerDiscreteScheduler, StableDiffusionXLControlNetPipeline
from PIL import Image

# Load the ControlNet model
controlnet = ControlNetModel.from_pretrained(
    "YOUR_USERNAME/temporalnet2-sdxl-controlnet",
    torch_dtype=torch.float16
)

# Create the pipeline
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16
)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Load the conditioning images
prev_frame = Image.open("previous_frame.jpg").convert("RGB")
optical_flow = Image.open("optical_flow.jpg").convert("RGB")

# Build the 6-channel conditioning input by stacking the previous frame
# and the optical-flow visualization along the channel dimension. The
# stock pipeline assumes 3-channel conditioning images, so depending on
# your diffusers version you may need a lightly modified pipeline that
# passes this tensor through unchanged.
cond = np.concatenate([np.array(prev_frame), np.array(optical_flow)], axis=2)
cond = torch.from_numpy(cond).permute(2, 0, 1).float().div(255.0)
cond = cond.unsqueeze(0).to("cuda", dtype=torch.float16)

prompt = "your prompt describing the scene"

# Generate the next frame
image = pipe(
    prompt=prompt,
    image=cond,  # (1, 6, H, W) conditioning tensor
    num_inference_steps=20,
    guidance_scale=7.5
).images[0]

image.save("output.jpg")
```
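In a full video pipeline, generation is typically autoregressive: each generated frame becomes the next call's previous-frame input. A minimal driver sketch follows; the `generate_frame` and `compute_flow` callables are hypothetical placeholders for the pipeline call above and an optical-flow estimator (e.g. RAFT or Farneback), not part of this repository:

```python
def run_sequence(first_frame, n_frames, generate_frame, compute_flow):
    """Autoregressive frame loop: frame t conditions frame t+1.

    `generate_frame(prev, flow)` wraps the ControlNet pipeline call;
    `compute_flow(a, b)` estimates flow between two frames. Both are
    placeholders supplied by the caller.
    """
    frames = [first_frame]
    for _ in range(n_frames - 1):
        prev = frames[-1]
        # Flow between the last two frames (or None at the start).
        flow = compute_flow(frames[-2], prev) if len(frames) > 1 else None
        frames.append(generate_frame(prev, flow))
    return frames
```

Because errors compound frame to frame, short sequences or periodic re-anchoring to a ground-truth keyframe tend to stay more stable.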

## Training Details

- **Base Model**: stabilityai/stable-diffusion-xl-base-1.0
- **Training Resolution**: Multi-resolution (512, 640, 768, 896, 1024px)
- **Conditioning Channels**: 6 (3 for previous frame + 3 for optical flow)
- **Training Steps**: 25,000
- **Mixed Precision**: bfloat16

## Limitations

This model requires specific conditioning inputs:
1. The previous frame from your video sequence
2. The optical flow computed between frames

For best results, ensure your optical flow visualization uses a consistent color scheme and magnitude representation.
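One way to keep the flow rendering consistent is to fix both the color convention and the magnitude normalization up front. The sketch below uses NumPy only; the HSV-wheel convention (hue = direction, value = magnitude) and the fixed `max_mag` cap are assumptions — match whatever scheme your flow estimator and training data used:

```python
import numpy as np

def flow_to_rgb(flow, max_mag=10.0):
    """Render an (H, W, 2) flow field on the HSV color wheel:
    hue encodes direction, value encodes magnitude, saturation is 1.
    A fixed max_mag keeps magnitude scaling identical across frames."""
    fx, fy = flow[..., 0], flow[..., 1]
    mag = np.hypot(fx, fy)
    hue = (np.arctan2(fy, fx) + np.pi) / (2 * np.pi)  # in [0, 1]
    val = np.clip(mag / max_mag, 0.0, 1.0)
    # Vectorized HSV -> RGB with saturation fixed at 1.
    h6 = hue * 6.0
    i = (np.floor(h6).astype(int) % 6)[..., None]
    f = (h6 - np.floor(h6))[..., None]
    v = val[..., None]
    p = np.zeros_like(v)
    q = v * (1.0 - f)
    t = v * f
    rgb = np.select(
        [i == 0, i == 1, i == 2, i == 3, i == 4, i == 5],
        [np.concatenate([v, t, p], -1), np.concatenate([q, v, p], -1),
         np.concatenate([p, v, t], -1), np.concatenate([p, q, v], -1),
         np.concatenate([t, p, v], -1), np.concatenate([v, p, q], -1)],
    )
    return (rgb * 255).astype(np.uint8)
```

Using the same `max_mag` for every frame prevents the brightness of the flow image from flickering when scene motion varies, which would otherwise feed spurious signal to the ControlNet.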

## License

This model is released under the same license as SDXL (OpenRAIL++).