Papers
arxiv:2603.27520

TokenDial: Continuous Attribute Control in Text-to-Video via Spatiotemporal Token Offsets

Published on Mar 29 · Submitted by Zhixuan Liu on Apr 1
Abstract

TokenDial enables precise attribute control in text-to-video models by using additive offsets in spatiotemporal token space for coherent edits without retraining.

AI-generated summary

We present TokenDial, a framework for continuous, slider-style attribute control in pretrained text-to-video generation models. While modern generators produce strong holistic videos, they offer limited control over how much an attribute changes (e.g., effect intensity or motion magnitude) without drifting identity, background, or temporal coherence. TokenDial is built on the observation that additive offsets in the intermediate spatiotemporal visual patch-token space form a semantic control direction, where adjusting the offset magnitude yields coherent, predictable edits for both appearance and motion dynamics. We learn attribute-specific token offsets without retraining the backbone, using pretrained understanding signals: semantic direction matching for appearance and motion-magnitude scaling for motion. We demonstrate TokenDial's effectiveness on diverse attributes and prompts, achieving stronger controllability and higher-quality edits than state-of-the-art baselines, supported by extensive quantitative evaluation and human studies.
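The core mechanism the abstract describes, adding a learned attribute direction to intermediate patch tokens and turning the slider by scaling its magnitude, can be sketched as follows. This is a minimal illustration, not the paper's code: the function name, tensor shapes, and the per-dimension offset are assumptions made for the example.

```python
import numpy as np

def apply_token_offset(tokens: np.ndarray, offset: np.ndarray, alpha: float) -> np.ndarray:
    """Sketch of slider-style editing via an additive token offset.

    tokens: (frames, patches, dim) intermediate spatiotemporal visual tokens
    offset: (dim,) learned attribute direction (illustrative shape)
    alpha:  slider value; 0 leaves the tokens unchanged, larger values
            strengthen the attribute edit along the same direction
    """
    # Broadcast the offset over all frames and patches so the edit
    # is applied coherently across space and time.
    return tokens + alpha * offset

# Toy example: 4 frames, 16 patches, 8-dim tokens.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 16, 8))
offset = rng.standard_normal(8)

edited = apply_token_offset(tokens, offset, alpha=0.5)
```

Because the edit is linear in `alpha`, doubling the slider value doubles the displacement along the attribute direction, which is what makes the control continuous and predictable; in the paper the offsets themselves are learned per attribute while the backbone stays frozen.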

Community

Paper author Paper submitter

TokenDial turns pretrained text-to-video models into continuous video editors, enabling fine-grained slider-style control over both appearance and motion attributes. It supports precise masking and works with Wan 2.1 T2V 1.3B.



Get this paper in your agent:

hf papers read 2603.27520
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 0
Datasets citing this paper: 0
Spaces citing this paper: 0
Collections including this paper: 0

Cite arxiv.org/abs/2603.27520 in a model, dataset, or Space README.md, or add this paper to a collection, to link it from this page.