---
license: openrail++
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- stable-diffusion-xl
- controlnet
- temporal
- video
- diffusers
inference: true
---

# TemporalNet2 ControlNet for SDXL

This is a TemporalNet2 ControlNet model trained against Stable Diffusion XL base 1.0.

## Model Description

TemporalNet2 is a ControlNet variant designed for temporal coherence in video generation. It takes two conditioning inputs:

- **Previous Frame**: the previous frame in the video sequence (3 channels)
- **Optical Flow**: the optical flow between the previous and current frame, rendered as an image (3 channels)

Total conditioning channels: **6**

The model learns to generate temporally coherent frames from both the visual content of the previous frame and the motion information encoded in the optical flow.

## Usage

Note that the stock `diffusers` ControlNet pipeline assumes a 3-channel control image, so the two conditioning images must be concatenated into a single 6-channel tensor before calling the pipeline. Depending on your `diffusers` version, the pipeline's control-image preprocessing may reject a 6-channel tensor, in which case a lightly customized pipeline is required.

```python
import numpy as np
import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, EulerDiscreteScheduler
from PIL import Image

# Load the 6-channel TemporalNet2 ControlNet
controlnet = ControlNetModel.from_pretrained(
    "YOUR_USERNAME/temporalnet2-sdxl-controlnet",
    torch_dtype=torch.float16,
)

# Build the pipeline on top of the SDXL base model
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Load the two conditioning images (both RGB, same resolution)
prev_frame = Image.open("previous_frame.jpg").convert("RGB")
optical_flow = Image.open("optical_flow.jpg").convert("RGB")

def to_tensor(img):
    # HWC uint8 -> 1CHW float in [0, 1]
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0)

# Concatenate along the channel axis to form the 6-channel conditioning input
control = torch.cat([to_tensor(prev_frame), to_tensor(optical_flow)], dim=1)  # (1, 6, H, W)

prompt = "your prompt describing the scene"

# Generate
image = pipe(
    prompt=prompt,
    image=control.to("cuda", torch.float16),
    num_inference_steps=20,
    guidance_scale=7.5,
).images[0]

image.save("output.jpg")
```

## Training Details

- **Base Model**: stabilityai/stable-diffusion-xl-base-1.0
- **Training Resolution**: multi-resolution (512, 640, 768, 896, 1024 px)
- **Conditioning Channels**: 6 (3 for the previous frame + 3 for optical flow)
- **Training Steps**: 25,000
- **Mixed Precision**: bfloat16

## Limitations

This model requires specific conditioning inputs:

1. The previous frame from your video sequence
2. The optical flow computed between that frame and the current frame

For best results, ensure your optical flow visualization uses a consistent color scheme and magnitude representation across all frames.

## License

This model is released under the same license as SDXL (OpenRAIL++).