Steady-Forcing: Balancing Spatial Persistence and Motion Continuity in Long-Horizon Nature Video Diffusion

✍️ Authors

Matiur Rahman Minar¹, Seunghun Oh², Ganghyeon Jeong², Unsang Park¹,²
¹Department of Computer Science and Engineering, Sogang University ²Department of Artificial Intelligence, Sogang University

🚀 Progress

📝 Technical Report / Paper
🌐 Project Homepage
💻 Training & Inference Code
🤗 Pretrained Model: T2V-1.3B

🎯 Overview

Steady-Forcing generates long-horizon nature video rollouts from a fixed-camera viewpoint. A structural dual-memory protocol decouples spatial persistence from motion continuity, so the background stays stable while fluid motion continues throughout the rollout.

TL;DR: We propose a dual-memory framework that balances stability and motion to sustain high background persistence and continuous fluid dynamics over multi-minute horizons for fixed-camera nature video generation.

🎬 Demo

https://minar09.github.io/steadyforcing/

🔧 Requirements

Requirement	Specification
GPU	NVIDIA GPU with ≥ 24 GB VRAM (tested on A100 80 GB)
OS	Linux
Python	3.10
Training Setup	8 × A100 GPUs (for full training run)

Other hardware configurations may work but have not been tested.
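
The OS and Python rows of the table can be checked before installation. The following is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper that is not part of the repository, and it does not inspect the GPU or its memory.

```python
import platform
import sys


def check_environment(version_info=None, system=None):
    """Return a list of mismatches against the tested setup (Linux, Python 3.10).

    Hypothetical helper for illustration; it is not shipped with the repository.
    """
    # Fall back to the running interpreter and OS when no values are passed in.
    version_info = sys.version_info if version_info is None else version_info
    system = platform.system() if system is None else system

    problems = []
    if tuple(version_info[:2]) != (3, 10):
        problems.append(
            f"Python 3.10 expected, found {version_info[0]}.{version_info[1]}"
        )
    if system != "Linux":
        problems.append(f"Linux expected, found {system}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

An empty list means the interpreter and OS match the tested configuration; a non-empty list is only a warning, since other setups may still work.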

🛠️ Installation

Clone the repository, then run the setup script to prepare the environment:

git clone https://github.com/minar09/steady-forcing.git
cd steady-forcing
bash setup_env.sh

The setup_env.sh script creates a Python 3.10 environment, installs all dependencies from requirements.txt, and downloads the required base models.

Alternatively, a Dockerfile is provided for containerized setups:

docker build -t steady-forcing .

📦 Pretrained Checkpoints

Download

huggingface-cli download minar09/Steady-Forcing-T2V-1.3B --local-dir ./ckpt

Or using the Python API:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="minar09/Steady-Forcing-T2V-1.3B", local_dir="./ckpt")

Note: The training algorithm is data-free distillation — no video dataset is required.

File Structure

After downloading, organize your working directory as follows:

steady-forcing/
├── prompts/              # Example text prompts
├── configs/              # Model and training configs
├── pipeline/             # Inference pipeline
├── trainer/              # Training modules
├── demo_utils/           # Demo helper utilities
├── scripts/              # Utility scripts
├── templates/            # Prompt templates
├── ckpt/
│   └── steady-forcing-t2v.pt   # Main model checkpoint
├── inference.py
├── inference.sh
├── train.py
├── train.sh
├── demo.py
└── setup.py
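
Before running inference, it can help to confirm that the checkout matches the tree above. This is a minimal sketch, assuming the paths listed in the tree; `missing_paths` and `EXPECTED_PATHS` are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# Paths copied from the directory tree above; adjust if your checkout differs.
EXPECTED_PATHS = [
    "prompts",
    "configs",
    "pipeline",
    "trainer",
    "ckpt/steady-forcing-t2v.pt",
    "inference.py",
    "inference.sh",
    "train.py",
    "train.sh",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    for rel in missing_paths("."):
        print("missing:", rel)
```

Run it from the repository root; no output means every listed path was found.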

🚀 Inference

Quick Start

bash inference.sh

Custom Prompt Inference

from pipeline import SteadyForcingPipeline

pipe = SteadyForcingPipeline.from_pretrained("minar09/Steady-Forcing-T2V-1.3B")

prompt = """A serene woodland stream scene recorded by a completely fixed, static,
tripod mounted camera. A narrow stream of clear water flows continuously from the
upper part of the frame toward the lower edge, winding gently between moss-covered
rocks and grassy banks. [60s]"""

negative_prompt = """solid water, metallic water, water stagnation, color drift,
frozen motion, camera movement, zooming, panning, visual artifacts, unnatural water,
unnatural waves, unnatural flow, unnatural motion, human, animal"""

video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=480,   # 60s at 8fps
).frames

video[0].save("output.mp4")
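
The `num_frames` value in the example follows from duration times frame rate. A small sketch of that arithmetic is shown here; `frames_for_duration` is a hypothetical helper, and the default of 8 fps is taken from the comment in the example rather than from the model configuration.

```python
def frames_for_duration(seconds, fps=8):
    """Number of frames for a clip of `seconds` at `fps` frames per second.

    fps=8 mirrors the "60s at 8fps" comment in the example above; it is an
    assumption here, not a value read from the model config.
    """
    if seconds <= 0 or fps <= 0:
        raise ValueError("seconds and fps must be positive")
    return round(seconds * fps)


print(frames_for_duration(60))   # 480, the value used in the example
print(frames_for_duration(120))  # 960
```

The pipeline may impose its own constraints on valid frame counts, so treat the result as a starting point.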

Recommended Negative Prompt

For best results, always include this negative prompt to suppress common failure modes:

色调艳丽，过曝，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，
JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，
形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走,
solid water, metallic water, water stagnation, color drift, water flow drift,
water color drift, water surface drift, scene drift, background drift,
frozen motion, camera movement, zooming, panning, camera drift,
visual artifacts, camera, tripod, ground artifacts, anomalous textures,
unrealistic round shaped pattern, localized distortions, unnatural water,
unnatural waves, unnatural flow, unnatural motion, unnatural physics,
unnatural dynamics, unnatural fluidity, unnatural surface, unnatural reflections,
unnatural refractions, unnatural transparency, unnatural opacity,
unnatural viscosity, unnatural turbulence, unnatural splashes, unnatural ripples,
unnatural foam, unnatural spray, unnatural mist, unnatural droplets,
human, animal, repetitive round textures, pond effects, low dynamic degree,
unnatural color, unnatural lighting, unnatural shadows, unnatural highlights,
unnatural contrast, unnatural saturation, unnatural hue, unnatural brightness,
unnatural darkness, unnatural exposure, unnatural noise, unnatural grain,
unnatural blur, unnatural sharpness, unnatural clarity, unnatural detail,
unnatural texture, unnatural pattern, unnatural composition
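
Because the recommended prompt spans many lines and mixes full-width and ASCII commas, it can be convenient to inspect it as a list of terms. The sketch below is one possible way to do that; `normalize_prompt_terms` is a hypothetical helper, and rejoining with ASCII commas is an assumption, since the source does not say whether the text encoder treats the two comma styles identically.

```python
import re


def normalize_prompt_terms(text):
    """Split a prompt on ASCII and full-width commas, collapse whitespace
    (including line breaks) inside each term, drop empty terms and exact
    duplicates, and rejoin the rest with ", "."""
    terms = []
    seen = set()
    for raw in re.split(r"[,，]", text):
        term = " ".join(raw.split())  # also joins a term wrapped across lines
        if term and term not in seen:
            seen.add(term)
            terms.append(term)
    return ", ".join(terms)
```

If exact reproduction of the recommended prompt matters, pass the original text unchanged and use this helper only to check for duplicates or count terms.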

🏋️ Training

Self-Forcing Training with DMD

bash train.sh

Training was completed in under 67 hours on 8 × A100 GPUs. No video dataset is required — the method uses data-free ODE distillation.
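
For budgeting, the stated wall-clock time converts to an upper bound on GPU-hours. The following is a short worked example; `gpu_hours` is a hypothetical helper, and it assumes all 8 GPUs were in use for the whole run.

```python
def gpu_hours(wall_clock_hours, num_gpus):
    """Total GPU-hours for a run that keeps `num_gpus` busy for `wall_clock_hours`."""
    return wall_clock_hours * num_gpus


# 67 hours is the stated upper bound, so the result is an upper bound as well.
print(gpu_hours(67, 8))  # 536
```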

Running the Demo

python demo.py

Testing Model Loading

python test_loading.py

🔑 Trigger Words

Include these keywords in your prompts to activate model-specific conditioning:

Trigger Word	Purpose
`Steady-Forcing: Balancing`	Core method conditioning
`Spatial Persistence`	Static background anchoring
`Motion Continuity`	Sustained fluid dynamics
`Long-Horizon`	Extended duration generation
`Nature Video Diffusion`	Natural scene domain
`drift-stagnation`	Suppresses visual drift artifacts
`nature-flow`	Natural fluid flow conditioning
`fixed-camera`	Fixed tripod camera constraint
`static nature`	Static environmental anchoring
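
A prompt can be checked against the table before generation. This is a minimal sketch; `missing_trigger_words` is a hypothetical helper, and case-insensitive substring matching is an assumption, since the source does not say how the model matches these keywords.

```python
# Trigger words copied from the table above.
TRIGGER_WORDS = [
    "Steady-Forcing: Balancing",
    "Spatial Persistence",
    "Motion Continuity",
    "Long-Horizon",
    "Nature Video Diffusion",
    "drift-stagnation",
    "nature-flow",
    "fixed-camera",
    "static nature",
]


def missing_trigger_words(prompt, required=TRIGGER_WORDS):
    """Return the trigger words from `required` that do not occur in `prompt`.

    Matching is a case-insensitive substring search (an assumption).
    """
    lowered = prompt.lower()
    return [word for word in required if word.lower() not in lowered]
```

Passing a shorter `required` list lets you check only the keywords relevant to a given scene.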

📊 Results

Quantitative and qualitative results are available in the arXiv preprint. For visualizations and video comparisons, visit the project page.

📄 Citation

If you use this model or codebase in your research, please cite:

@article{minar2025steady,
  title={Steady-Forcing: Balancing Spatial Persistence and Motion Continuity
         in Long-Horizon Nature Video Diffusion},
  author={Minar, Matiur Rahman and Oh, Seunghun and Jeong, Ganghyeon and Park, Unsang},
  journal={arXiv preprint arXiv:2606.7661673},
  year={2026}
}

🤝 Acknowledgements

This project builds on the open-source Infinity-RoPE and Reward-Forcing implementations and draws on related work in long-horizon video diffusion, motion continuity, and spatial persistence. We thank the authors of these projects for their efforts.

Model tree for minar09/Steady-Forcing-T2V-1.3B

Base model: JaydenLu666/Reward-Forcing-T2V-1.3B
Finetuned: this model

Evaluation results

All scores are self-reported.

Metric	Score
Background Consistency Score (BCS)	98.060
Motion Smoothness Score (MSS)	99.000
Temporal Flicker Index (TFI)	98.530
Imaging Quality Score (IQS)	69.870