livepeer-dev committed
Commit 852fdf8 · verified · 1 Parent(s): 2c69040

Upload folder using huggingface_hub
README.md ADDED
@@ -0,0 +1,86 @@
+ ---
+ license: openrail++
+ base_model: stabilityai/stable-diffusion-xl-base-1.0
+ tags:
+ - stable-diffusion-xl
+ - controlnet
+ - temporal
+ - video
+ - diffusers
+ inference: true
+ ---
+
+ # TemporalNet2 ControlNet for SDXL
+
+ This is a TemporalNet2 ControlNet model trained on SDXL (Stable Diffusion XL base 1.0).
+
+ ## Model Description
+
+ TemporalNet2 is a ControlNet variant designed for temporal coherence in video generation. It takes two conditioning inputs:
+ - **Previous frame**: the previous frame in the video sequence (3 channels)
+ - **Optical flow**: the optical flow between the previous and current frame, rendered as an RGB image (3 channels)
+
+ Total conditioning input: **6 channels**
+
+ The model learns to generate temporally coherent frames from both the visual content of the previous frame and the motion information encoded in the optical flow.
+
+ ## Usage
+
+ ```python
+ import numpy as np
+ import torch
+ from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, EulerDiscreteScheduler
+ from PIL import Image
+
+ # Load the ControlNet model
+ controlnet = ControlNetModel.from_pretrained(
+     "YOUR_USERNAME/temporalnet2-sdxl-controlnet",
+     torch_dtype=torch.float16
+ )
+
+ # Create the pipeline
+ pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
+     "stabilityai/stable-diffusion-xl-base-1.0",
+     controlnet=controlnet,
+     torch_dtype=torch.float16
+ )
+ pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
+ pipe.to("cuda")
+
+ # Load the two conditioning images (same resolution; resize to your target size)
+ prev_frame = Image.open("previous_frame.jpg").convert("RGB").resize((1024, 1024))
+ optical_flow = Image.open("optical_flow.jpg").convert("RGB").resize((1024, 1024))
+
+ # Build the 6-channel conditioning tensor: previous frame first, then optical flow.
+ # The stock pipeline treats a list of images as separate multi-ControlNet inputs,
+ # so the channel concatenation must be done manually before calling the pipeline.
+ def to_tensor(img: Image.Image) -> torch.Tensor:
+     arr = np.asarray(img).astype(np.float32) / 255.0  # (H, W, 3) in [0, 1]
+     return torch.from_numpy(arr).permute(2, 0, 1)     # (3, H, W)
+
+ control_image = torch.cat([to_tensor(prev_frame), to_tensor(optical_flow)], dim=0)
+ control_image = control_image.unsqueeze(0).to("cuda", dtype=torch.float16)  # (1, 6, H, W)
+
+ prompt = "your prompt describing the scene"
+
+ # Generate the next frame
+ image = pipe(
+     prompt=prompt,
+     image=control_image,
+     num_inference_steps=20,
+     guidance_scale=7.5
+ ).images[0]
+
+ image.save("output.jpg")
+ ```
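+
+ For full videos, the call above runs once per frame, feeding each generated frame back in as the next conditioning frame. A minimal sketch of that loop, assuming `frames` and `flow_images` (both hypothetical names) are lists of RGB `PIL.Image` objects holding the source frames and their flow visualizations (see Limitations below), prepared as above:
+
+ ```python
+ generated = [frames[0]]  # seed the loop with the first source frame
+ for i in range(1, len(frames)):
+     # condition on the last *generated* frame plus the flow into frame i
+     control = torch.cat([to_tensor(generated[-1]), to_tensor(flow_images[i])], dim=0)
+     control = control.unsqueeze(0).to("cuda", dtype=torch.float16)
+     frame = pipe(prompt=prompt, image=control, num_inference_steps=20).images[0]
+     generated.append(frame)
+ ```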
+
+ ## Training Details
+
+ - **Base Model**: stabilityai/stable-diffusion-xl-base-1.0
+ - **Training Resolution**: Multi-resolution (512, 640, 768, 896, 1024px)
+ - **Conditioning Channels**: 6 (3 for previous frame + 3 for optical flow)
+ - **Training Steps**: 25,000
+ - **Mixed Precision**: bfloat16
+
+ ## Limitations
+
+ This model requires specific conditioning inputs:
+ 1. The previous frame from your video sequence
+ 2. The optical flow computed between frames
+
+ For best results, ensure your optical flow visualization uses a consistent color scheme and magnitude representation across the whole sequence, as in the sketch below.
+
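+ One common encoding is OpenCV's HSV flow visualization (hue encodes direction, value encodes magnitude). The flow estimator and encoding used during training are not stated in this card, so treat the following Farneback-based sketch as one reasonable choice that simply must be kept consistent:
+
+ ```python
+ import cv2
+ import numpy as np
+
+ def flow_to_rgb(prev_path: str, next_path: str) -> np.ndarray:
+     prev_gray = cv2.cvtColor(cv2.imread(prev_path), cv2.COLOR_BGR2GRAY)
+     next_gray = cv2.cvtColor(cv2.imread(next_path), cv2.COLOR_BGR2GRAY)
+     # Dense Farneback optical flow: (H, W, 2) array of per-pixel (dx, dy)
+     flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
+                                         0.5, 3, 15, 3, 5, 1.2, 0)
+     mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
+     hsv = np.zeros((*prev_gray.shape, 3), dtype=np.uint8)
+     hsv[..., 0] = ang * 180 / np.pi / 2  # hue: flow direction
+     hsv[..., 1] = 255                    # full saturation
+     hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # value: flow magnitude
+     return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
+ ```
+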
+ ## License
+
+ This model is released under the same license as SDXL (OpenRAIL++).
config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "_class_name": "ControlNetModel",
+   "_diffusers_version": "0.35.2",
+   "act_fn": "silu",
+   "addition_embed_type": "text_time",
+   "addition_embed_type_num_heads": 64,
+   "addition_time_embed_dim": 256,
+   "attention_head_dim": [
+     5,
+     10,
+     20
+   ],
+   "block_out_channels": [
+     320,
+     640,
+     1280
+   ],
+   "class_embed_type": null,
+   "conditioning_channels": 6,
+   "conditioning_embedding_out_channels": [
+     16,
+     32,
+     96,
+     256
+   ],
+   "controlnet_conditioning_channel_order": "rgb",
+   "cross_attention_dim": 2048,
+   "down_block_types": [
+     "DownBlock2D",
+     "CrossAttnDownBlock2D",
+     "CrossAttnDownBlock2D"
+   ],
+   "downsample_padding": 1,
+   "encoder_hid_dim": null,
+   "encoder_hid_dim_type": null,
+   "flip_sin_to_cos": true,
+   "freq_shift": 0,
+   "global_pool_conditions": false,
+   "in_channels": 4,
+   "layers_per_block": 2,
+   "mid_block_scale_factor": 1,
+   "mid_block_type": "UNetMidBlock2DCrossAttn",
+   "norm_eps": 1e-05,
+   "norm_num_groups": 32,
+   "num_attention_heads": null,
+   "num_class_embeds": null,
+   "only_cross_attention": false,
+   "projection_class_embeddings_input_dim": 2816,
+   "resnet_time_scale_shift": "default",
+   "transformer_layers_per_block": [
+     1,
+     2,
+     10
+   ],
+   "upcast_attention": null,
+   "use_linear_projection": true
+ }
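
The config pins the SDXL-sized dimensions (cross_attention_dim of 2048, projection_class_embeddings_input_dim of 2816) alongside the 6-channel conditioning stem. As a quick sanity check before downloading the full weights, the config alone can be fetched (a sketch; the placeholder repo id from the README is reused):

```python
from diffusers import ControlNetModel

# load_config retrieves config.json without pulling the multi-GB weight shards
config = ControlNetModel.load_config("YOUR_USERNAME/temporalnet2-sdxl-controlnet")
assert config["conditioning_channels"] == 6   # previous frame (3) + optical flow (3)
assert config["cross_attention_dim"] == 2048  # SDXL cross-attention width
```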
diffusion_pytorch_model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4135665db5d337ae9d0d4bc7534c0ee848036c940e91eb324cec1afcd0e6a06c
+ size 4251097880
diffusion_pytorch_model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a80245fd51004b5fac04055100eb5ee80d51cdbf06a72367a37331b318c47524
+ size 753071536
diffusion_pytorch_model.safetensors.index.json ADDED
The diff for this file is too large to render.