Temporal LLLite v3 (DreamLite-mobile)
This repository hosts the Temporal LLLite v3 adapter weights for DreamLite-mobile, trained for streaming video stylization at video-rate throughput on a single consumer GPU. The adapter is the artefact described in:
Video-Rate Streaming Stylization on a Vision-Aware MLLM-Conditioned Edit Diffusion: Asymmetric Batched Inference on a Distilled UNet + MLLM Text Encoder. Yoshiyuki Ootani, 2026 (preprint).
The accompanying inference code, evaluation harness, training scripts, and Zenodo-archived code release are at github.com/otanl/dreamlite-stream.
What this is
A ControlNet-LLLite-style attention adapter (kohya-ss/sd-scripts) attached to DreamLite-mobile's 0.39 B distilled edit U-Net. The conditioning input is the warped previous decoded frame (Farnebäck flow on the previous output), and the adapter learns a temporal-consistency residual that reduces inter-frame flicker on streaming video stylization.
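The conditioning step described above can be illustrated roughly as follows. This is a minimal sketch, not the repository's implementation: the function names `estimate_backward_flow` and `warp_previous_frame` are hypothetical, the Farnebäck parameters are the commonly used OpenCV tutorial values rather than the paper's, and nearest-neighbour sampling is an assumption made to keep the example short. It also assumes the flow is estimated from the current frame back to the previous one, so that each current pixel knows where to sample the previous output.

```python
import numpy as np


def estimate_backward_flow(cur_gray, prev_gray):
    """Dense Farneback flow from the current frame to the previous frame.

    Both inputs are single-channel uint8 images of the same size.
    OpenCV is imported lazily so the warp below stays usable without it.
    Parameter values are illustrative, not taken from the paper.
    """
    import cv2

    return cv2.calcOpticalFlowFarneback(
        cur_gray, prev_gray, None,
        0.5,   # pyr_scale
        3,     # levels
        15,    # winsize
        3,     # iterations
        5,     # poly_n
        1.2,   # poly_sigma
        0,     # flags
    )


def warp_previous_frame(prev_frame, flow):
    """Backward-warp prev_frame with a (H, W, 2) flow field of (dx, dy) offsets.

    Output pixel (y, x) is sampled from prev_frame at (y + dy, x + dx),
    rounded to the nearest pixel and clamped to the image border.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]), 0, w - 1).astype(np.intp)
    src_y = np.clip(np.rint(ys + flow[..., 1]), 0, h - 1).astype(np.intp)
    return prev_frame[src_y, src_x]
```

In a streaming loop, the warped frame would be what is fed to the adapter as its conditioning image for the next step; a bilinear sampler would likely give smoother results than the nearest-neighbour lookup used here.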
In the paper, this adapter pairs with three engineering mechanisms (asymmetric side-stream / main-stream CUDA pipelining, a compile-friendly LLLite reformulation, and a periodic conditioning-refresh schedule) to reach sustained video-rate streaming throughput:
| GPU | sustained fps (B=8) | e2e p50 latency |
|---|---|---|
| RTX 3090 Ti | 27.4 fps | 0.51 s |
| RTX 4090 | 54.9 fps | — (DAVIS-10 measurement) |
| RTX 5090 | 74.1 fps | — (DAVIS-10 measurement) |
All numbers are at 512×512 with the v3 adapter active. End-to-end p50 latency is reported only for the RTX 3090 Ti, where the 480-frame sustained test was run.
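To read the throughput figures as time budgets, the arithmetic can be sketched as below. The fps values are copied from the table; the helper names are illustrative, and the calculation assumes throughput is amortised evenly over batches of 8 frames, which is a simplification of whatever scheduling the pipeline actually uses.

```python
# Sustained throughput at B=8, copied from the table above.
SUSTAINED_FPS = {"RTX 3090 Ti": 27.4, "RTX 4090": 54.9, "RTX 5090": 74.1}
BATCH_SIZE = 8


def frame_budget_ms(fps):
    """Average time available per frame, in milliseconds."""
    return 1000.0 / fps


def batch_period_ms(fps, batch_size=BATCH_SIZE):
    """Average time in which one full batch must complete, in milliseconds."""
    return batch_size * frame_budget_ms(fps)


if __name__ == "__main__":
    for gpu, fps in SUSTAINED_FPS.items():
        print(
            f"{gpu}: {frame_budget_ms(fps):.1f} ms/frame, "
            f"{batch_period_ms(fps):.0f} ms per batch of {BATCH_SIZE}"
        )
```

On the RTX 3090 Ti this works out to roughly 36.5 ms per frame, or about 292 ms per batch of 8. The reported 0.51 s end-to-end p50 latency is larger than that batch period, which would be consistent with frames waiting for a batch to fill and passing through pipelined stages, though the breakdown is not given here.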
Files
- `temporal_lllite_step001440.safetensors` (51 MB)
  - Trained for 12 epochs on 10 DAVIS-2017 sequences × 50 frames, AdamW8bit, post-hoc α=0.85 blended teacher target.
  - SHA-256: `88082c6bf56770469ad4ecbbca467b315ffcf4b5287fd17733751e2952fee7fc`
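One way to check a downloaded copy against the published digest is sketched below, using only the Python standard library. The helper names `sha256_of_file` and `verify_checkpoint` are illustrative and not part of the repository; the expected digest is the one listed above.

```python
import hashlib

# Digest published for temporal_lllite_step001440.safetensors.
EXPECTED_SHA256 = "88082c6bf56770469ad4ecbbca467b315ffcf4b5287fd17733751e2952fee7fc"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_SHA256):
    """True if the file at `path` matches the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

Calling `verify_checkpoint("temporal_lllite_step001440.safetensors")` before loading the weights catches truncated or corrupted downloads.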
Usage (sketch)
from safetensors.torch import load_file
# 1. Load DreamLite-mobile via the upstream project
# (https://github.com/ByteVisionLab/DreamLite — access via their
# release-request process).
# 2. Apply the LLLite adapter port:
from dreamlite_lllite import apply_lllite
# `unet` is the DreamLite-mobile U-Net obtained in step 1.
apply_lllite(
    unet,
    state_dict=load_file("temporal_lllite_step001440.safetensors"),
    inference_mode=True,
    hooks="down_blocks",  # 38-hook subset used in the paper
)
# 3. Run the streaming inference pipeline:
# see `scripts/demo_camera.py` in github.com/otanl/dreamlite-stream
Full reproduction requires the upstream DreamLite-mobile checkpoint (currently gated by the upstream project's release-request process). The adapter alone is not useful without the base model.
License
CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0).
The adapter is an Adapted Material of DreamLite-mobile (CC BY-NC 4.0 §1(a)) and inherits its non-commercial weight license. The inference code on the linked GitHub repository is released under Apache-2.0 and is unaffected by this inheritance.
See ATTRIBUTION.md for the full attribution chain (DreamLite, Qwen3-VL, ControlNet-LLLite, kohya-ss/sd-scripts).
Citation
Until the peer-reviewed version is published, please cite the Zenodo-archived release of the inference repository:
@software{ootani2026dreamlite_stream,
author = {Ootani, Yoshiyuki},
title = {{dreamlite-stream}: Video-Rate Streaming Stylization on a
Vision-Aware MLLM-Conditioned Edit Diffusion},
year = {2026},
version = {v0.1.0-tcsvt-submission},
doi = {10.5281/zenodo.20389428},
url = {https://github.com/otanl/dreamlite-stream}
}
The arXiv preprint will be added here once endorsed and released.
Notes
- The adapter was trained on a single oil-painting prompt; for prompt-level generalisation use the v4 multi-prompt variant once released.
- The `down_blocks` hook subset (38 of 108 hooks) is the recommended inference configuration; see §III-D and Tables II / VI of the paper for the smoothing-artifact disclosure and cond-refresh sweep rationale.
- This is not a standalone model: it is a temporal-consistency side-network for DreamLite-mobile. Users must obtain DreamLite-mobile separately from the upstream project under their own licence.