---
license: openrail++
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- stable-diffusion-xl
- controlnet
- temporal
- video
- diffusers
inference: true
---

# TemporalNet2 ControlNet for SDXL

This is a TemporalNet2 ControlNet model trained for SDXL (Stable Diffusion XL base 1.0).

## Model Description

TemporalNet2 is a ControlNet variant designed for temporal coherence in video generation. It takes two conditioning inputs:

- **Previous Frame**: the previous frame in the video sequence (3 channels)
- **Optical Flow**: the optical flow between the previous and current frame, encoded as an RGB image (3 channels)

Total conditioning channels: **6**

The model was trained to generate temporally coherent frames by learning from both the visual content of the previous frame and the motion information encoded in the optical flow.
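
As a minimal sketch of the layout this implies (the channel ordering, previous frame first and optical flow second, is an assumption here and must match how the model was trained):

```python
import torch

# Dummy stand-ins for the two 3-channel conditioning images, values in [0, 1]
prev_frame = torch.rand(1, 3, 1024, 1024)    # previous RGB frame
optical_flow = torch.rand(1, 3, 1024, 1024)  # RGB-encoded optical flow

# Channels 0-2 carry the previous frame, channels 3-5 the optical flow
cond = torch.cat([prev_frame, optical_flow], dim=1)
assert cond.shape == (1, 6, 1024, 1024)
```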

## Usage

```python
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, EulerDiscreteScheduler
from diffusers.utils import load_image
from torchvision.transforms.functional import to_tensor
import torch

# Load the ControlNet model
controlnet = ControlNetModel.from_pretrained(
    "YOUR_USERNAME/temporalnet2-sdxl-controlnet",
    torch_dtype=torch.float16
)

# Create the pipeline
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16
)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Load the two conditioning images (they must share the same resolution)
prev_frame = load_image("previous_frame.jpg")
optical_flow = load_image("optical_flow.jpg")

# Prepare the 6-channel input yourself: with a single 6-channel ControlNet,
# the pipeline expects one control image, not a list, so stack the two
# 3-channel images into a (1, 6, H, W) tensor. (Assumes a diffusers version
# whose control image processor passes multi-channel tensors through.)
cond = torch.cat([to_tensor(prev_frame), to_tensor(optical_flow)], dim=0).unsqueeze(0)

prompt = "your prompt describing the scene"

# Generate the next frame
image = pipe(
    prompt=prompt,
    image=cond,
    num_inference_steps=20,
    guidance_scale=7.5
).images[0]

image.save("output.jpg")
```
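
In a video workflow the pipeline is typically called once per frame, feeding each generated frame back in as the next previous-frame condition. A rough sketch continuing from the setup above, assuming the per-frame flow images have been precomputed (the file names here are placeholders):

```python
# Continues from `pipe`, `prompt`, `load_image`, and `to_tensor` above.
frames = []
prev = load_image("frame_0000.jpg")  # the first frame comes from the source video
for i in range(1, 30):
    flow = load_image(f"flow_{i:04d}.jpg").resize(prev.size)  # flow between frames i-1 and i
    cond = torch.cat([to_tensor(prev), to_tensor(flow)], dim=0).unsqueeze(0)
    prev = pipe(
        prompt=prompt,
        image=cond,
        num_inference_steps=20,
        guidance_scale=7.5
    ).images[0]
    frames.append(prev)  # the generated frame conditions the next step
```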

## Training Details

- **Base Model**: stabilityai/stable-diffusion-xl-base-1.0
- **Training Resolution**: multi-resolution (512, 640, 768, 896, and 1024 px)
- **Conditioning Channels**: 6 (3 for the previous frame + 3 for the optical flow)
- **Training Steps**: 25,000
- **Mixed Precision**: bfloat16

## Limitations

This model requires two specific conditioning inputs:

1. The previous frame from your video sequence
2. The optical flow computed between the previous and current frame

For best results, ensure your optical flow visualization uses a consistent color scheme and magnitude scaling across frames.
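
The card does not pin down a particular flow encoding. One way to get a consistent encoding is torchvision's RAFT model together with `flow_to_image`, which maps flow vectors onto a fixed color wheel; a sketch, assuming two frames are already loaded as (1, 3, H, W) tensors in [0, 1] with H and W divisible by 8:

```python
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights
from torchvision.utils import flow_to_image

weights = Raft_Large_Weights.DEFAULT
raft = raft_large(weights=weights).eval().to("cuda")

# Placeholder frames; in practice, load consecutive video frames here
frame1 = torch.rand(1, 3, 512, 512, device="cuda")
frame2 = torch.rand(1, 3, 512, 512, device="cuda")

with torch.no_grad():
    img1, img2 = weights.transforms()(frame1, frame2)  # normalize as RAFT expects
    flow = raft(img1, img2)[-1]                        # (1, 2, H, W) displacement field

flow_rgb = flow_to_image(flow)  # (1, 3, H, W) uint8 color-wheel visualization
```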

## License

This model is released under the same license as SDXL (OpenRAIL++).