Temporal LLLite v3 (DreamLite-mobile)

This repository hosts the Temporal LLLite v3 adapter weights for DreamLite-mobile, trained for streaming video stylization at video-rate throughput on a single consumer GPU. The adapter is the artefact described in:

Video-Rate Streaming Stylization on a Vision-Aware MLLM-Conditioned Edit Diffusion: Asymmetric Batched Inference on a Distilled UNet + MLLM Text Encoder. Yoshiyuki Ootani, 2026 (preprint).

The accompanying inference code, evaluation harness, training scripts, and Zenodo-archived code release are at github.com/otanl/dreamlite-stream.

What this is

A ControlNet-LLLite-style attention adapter (kohya-ss/sd-scripts) attached to DreamLite-mobile's 0.39 B distilled edit U-Net. The conditioning input is the previous decoded output frame, warped with Farnebäck optical flow, and the adapter learns a temporal-consistency residual that reduces inter-frame flicker in streaming video stylization.
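
A minimal sketch of how such a conditioning frame could be built, assuming a dense flow field is already available (in practice it might come from OpenCV's `cv2.calcOpticalFlowFarneback`). The helper name `warp_previous_frame`, the backward-flow convention, and the nearest-neighbour sampling are illustrative assumptions, not the repository's implementation.

```python
def warp_previous_frame(prev, flow):
    """Backward-warp `prev` with a dense flow field (nearest-neighbour sampling).

    prev: H x W grid of pixel values (the previous stylized output).
    flow: H x W grid of (dx, dy) offsets; ASSUMED to point from each pixel of
          the current frame to its source location in the previous frame.
    """
    height, width = len(prev), len(prev[0])
    warped = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = flow[y][x]
            # Clamp the sampling position so border pixels repeat at the edges.
            src_x = min(max(round(x + dx), 0), width - 1)
            src_y = min(max(round(y + dy), 0), height - 1)
            row.append(prev[src_y][src_x])
        warped.append(row)
    return warped


# Toy example: content moved one pixel to the right between frames, so each
# current pixel looks one pixel to the left in the previous frame.
prev_output = [[10, 20, 30],
               [40, 50, 60]]
flow_field = [[(-1.0, 0.0)] * 3 for _ in range(2)]
adapter_cond = warp_previous_frame(prev_output, flow_field)
```

A real pipeline would more likely use bilinear sampling on image tensors; the list-based version here only makes the indexing explicit.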

In the paper, this adapter pairs with three engineering mechanisms (asymmetric side-stream / main-stream CUDA pipelining, a compile-friendly LLLite reformulation, and a periodic conditioning-refresh schedule) to reach sustained video-rate streaming throughput:

GPU	sustained fps (B=8)	e2e p50 latency
RTX 3090 Ti	27.4 fps	0.51 s
RTX 4090	54.9 fps	— (DAVIS-10 measurement)
RTX 5090	74.1 fps	— (DAVIS-10 measurement)

All numbers are measured at 512×512 resolution with the v3 adapter active. End-to-end p50 latency is reported only for the RTX 3090 Ti, where the 480-frame sustained test was run.
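
To relate the throughput figures to per-frame and per-batch time budgets, the following sketch does the arithmetic. It assumes that B=8 denotes the inference batch size; the function names are illustrative and are not taken from the evaluation harness.

```python
from statistics import median


def sustained_fps(frame_count, wall_seconds):
    """Average frames per second over a whole run."""
    return frame_count / wall_seconds


def batch_period_seconds(fps, batch_size=8):
    """Time between finished batches if `fps` is sustained at this batch size."""
    return batch_size / fps


def p50_latency(latencies_seconds):
    """Median (p50) of per-frame end-to-end latencies."""
    return median(latencies_seconds)


# Throughput figures from the table above (assumed batch size B=8).
reported_fps = {"RTX 3090 Ti": 27.4, "RTX 4090": 54.9, "RTX 5090": 74.1}
for gpu, fps in reported_fps.items():
    per_frame_ms = 1000.0 / fps
    print(f"{gpu}: {per_frame_ms:.1f} ms/frame, "
          f"{batch_period_seconds(fps):.3f} s per batch of 8")
```

At 27.4 fps this works out to about 36.5 ms per frame, or roughly 0.29 s per batch of eight.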

Files

temporal_lllite_step001440.safetensors (51 MB)
- Trained for 12 epochs on 10 DAVIS-2017 sequences × 50 frames each, with the AdamW8bit optimizer and a post-hoc α=0.85 blended teacher target.
- SHA-256: 88082c6bf56770469ad4ecbbca467b315ffcf4b5287fd17733751e2952fee7fc
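
The checksum can be verified after download with the Python standard library. This is a minimal sketch; the helper names are illustrative, and the file is assumed to sit in the current working directory.

```python
import hashlib

EXPECTED_SHA256 = "88082c6bf56770469ad4ecbbca467b315ffcf4b5287fd17733751e2952fee7fc"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest matches the published one."""
    return sha256_of_file(path) == expected.lower()
```

Calling `verify_checkpoint("temporal_lllite_step001440.safetensors")` should return `True` for an intact download.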

Usage (sketch)

from safetensors.torch import load_file

# 1. Load DreamLite-mobile via the upstream project
#    (https://github.com/ByteVisionLab/DreamLite; access is via their
#    release-request process). `unet` below is its edit U-Net.
# 2. Apply the LLLite adapter port:
from dreamlite_lllite import apply_lllite
apply_lllite(
    unet,
    state_dict=load_file("temporal_lllite_step001440.safetensors"),
    inference_mode=True,
    hooks="down_blocks",   # 38-hook subset used in the paper
)
# 3. Run the streaming inference pipeline:
#    see `scripts/demo_camera.py` in github.com/otanl/dreamlite-stream

Full reproduction requires the upstream DreamLite-mobile checkpoint (currently gated by the upstream project's release-request process). The adapter alone is not useful without the base model.

License

CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0).

The adapter is an Adapted Material of DreamLite-mobile (CC BY-NC 4.0 §1(a)) and inherits its non-commercial weight license. The inference code on the linked GitHub repository is released under Apache-2.0 and is unaffected by this inheritance.

See ATTRIBUTION.md for the full attribution chain (DreamLite, Qwen3-VL, ControlNet-LLLite, kohya-ss/sd-scripts).

Citation

Until the peer-reviewed version is published, please cite the Zenodo-archived release of the inference repository:

@software{ootani2026dreamlite_stream,
  author  = {Ootani, Yoshiyuki},
  title   = {{dreamlite-stream}: Video-Rate Streaming Stylization on a
             Vision-Aware MLLM-Conditioned Edit Diffusion},
  year    = {2026},
  version = {v0.1.0-tcsvt-submission},
  doi     = {10.5281/zenodo.20389428},
  url     = {https://github.com/otanl/dreamlite-stream}
}

The arXiv preprint will be added here once endorsed and released.

Notes

- The adapter was trained on a single oil-painting prompt; for prompt-level generalisation use the v4 multi-prompt variant once released.
- The down_blocks hook subset (38 of 108 hooks) is the recommended inference configuration; see §III-D and Tables II / VI of the paper for the smoothing-artifact disclosure and cond-refresh sweep rationale.
- This is not a standalone model: it is a temporal-consistency side-network for DreamLite-mobile. Users must obtain DreamLite-mobile separately from the upstream project under their own licence.
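
As a rough illustration of what restricting the adapter to a hook subset means, the sketch below filters a state dict by a block-group name. The helper `select_hooks` and the key names are hypothetical; the real checkpoint's naming scheme is not documented here and may differ, and in normal use the `hooks="down_blocks"` argument of `apply_lllite` handles the selection.

```python
def select_hooks(state_dict, subset="down_blocks"):
    """Keep only adapter entries whose key mentions the chosen block group."""
    return {key: value for key, value in state_dict.items() if subset in key}


# Hypothetical key names, for illustration only.
toy_state = {
    "lllite.down_blocks.0.attn1.to_q.weight": 0,
    "lllite.mid_block.attn1.to_q.weight": 1,
    "lllite.up_blocks.1.attn2.to_k.weight": 2,
}
kept = select_hooks(toy_state)
print(f"kept {len(kept)} of {len(toy_state)} entries")
```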
