How to use from the Diffusers library
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("minar09/Steady-Forcing-T2V-1.3B", dtype=torch.bfloat16, device_map="cuda")

prompt = "A beautiful South Korean empty sea beach scene recorded by a completely fixed, static, tripod mounted camera. The camera is not seen, it does not move, tilt, pan, or zoom at any point. The sandy shoreline stretches across the frame, with gentle waves rolling in from the horizon and softly breaking against the beach in a continuous rhythm. The motion of the tides is natural and consistent, creating subtle ripples and foamy edges as the water meets the sand. The beach itself remains perfectly static, with smooth sand, scattered shells, and distant rocks unmoving in the background. The atmosphere is stunning, as the sun hangs low near the horizon, casting warm golden and orange tones across the sky and reflecting softly on the water's surface. The distant sea remains calm, with no flicker, jumps, resets, or artificial distortions, while the tide flows seamlessly in one constant direction. The scene unfolds in real time over a long duration, emphasizing the physical dynamics of the ocean waves and the tranquil beauty of the sunset. The video maintains temporal continuity across all frames, showing the uninterrupted progression of tides softly hitting the sandy beach without cinematic exaggeration or dramatic effects. [60s]"
# Text-to-video pipelines in Diffusers expose their output as .frames, not .images
video = pipe(prompt).frames[0]

Steady-Forcing: Balancing Spatial Persistence and Motion Continuity in Long-Horizon Nature Video Diffusion


✍️ Authors

Matiur Rahman Minar¹, Seunghun Oh², Ganghyeon Jeong², Unsang Park¹,²
¹Department of Computer Science and Engineering, Sogang University   ²Department of Artificial Intelligence, Sogang University

🚀 Progress

  • 📝 Technical Report / Paper
  • 🌐 Project Homepage
  • 💻 Training & Inference Code
  • 🤗 Pretrained Model: T2V-1.3B

🎯 Overview

Steady-Forcing produces long-horizon nature video rollouts from a fixed-camera view. It decouples spatial persistence from motion continuity via a structural dual-memory protocol. This enables stable backgrounds and sustained fluid motion.

TL;DR: We propose a dual-memory framework that balances stability and motion to sustain high background persistence and continuous fluid dynamics over multi-minute horizons for fixed-camera nature video generation.

🎬 Demo

https://minar09.github.io/steadyforcing/


🔧 Requirements

| Requirement | Specification |
| --- | --- |
| GPU | NVIDIA GPU with ≥ 24 GB VRAM (tested on A100 80 GB) |
| OS | Linux |
| Python | 3.10 |
| Training Setup | 8 × A100 GPUs (for full training run) |

Other hardware configurations may work but have not been tested.


🛠️ Installation

Clone the repository and set up the environment in one step:

git clone https://github.com/minar09/steady-forcing.git
cd steady-forcing
bash setup_env.sh

This script creates a Python 3.10 environment, installs all dependencies from requirements.txt, and downloads the required base models.

Alternatively, a Dockerfile is provided for containerized setups:

docker build -t steady-forcing .

📦 Pretrained Checkpoints

Download

huggingface-cli download minar09/Steady-Forcing-T2V-1.3B --local-dir ./ckpt

Or using the Python API:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="minar09/Steady-Forcing-T2V-1.3B", local_dir="./ckpt")

Note: The training algorithm is data-free distillation — no video dataset is required.

File Structure

After downloading, organize your working directory as follows:

steady-forcing/
├── prompts/              # Example text prompts
├── configs/              # Model and training configs
├── pipeline/             # Inference pipeline
├── trainer/              # Training modules
├── demo_utils/           # Demo helper utilities
├── scripts/              # Utility scripts
├── templates/            # Prompt templates
├── ckpt/
│   └── steady-forcing-t2v.pt   # Main model checkpoint
├── inference.py
├── inference.sh
├── train.py
├── train.sh
├── demo.py
└── setup.py
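
A quick way to confirm the layout before running inference is to check each entry of the tree above. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and the list of entries is copied from the tree, so adjust it if your checkout differs.

```python
from pathlib import Path

# Entries copied from the directory tree above (assumed to be complete).
EXPECTED_ENTRIES = [
    "prompts", "configs", "pipeline", "trainer",
    "demo_utils", "scripts", "templates",
    "ckpt/steady-forcing-t2v.pt",
    "inference.py", "inference.sh", "train.py", "train.sh",
    "demo.py", "setup.py",
]


def missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [entry for entry in expected if not (root / entry).exists()]


# Run from the repository root; an empty list means the layout matches.
missing = missing_entries(".")
print("missing:", missing if missing else "none")
```

If the checkpoint is the only missing entry, re-run the download command with `--local-dir ./ckpt`.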

🚀 Inference

Quick Start

bash inference.sh

Custom Prompt Inference

from pipeline import SteadyForcingPipeline

pipe = SteadyForcingPipeline.from_pretrained("minar09/Steady-Forcing-T2V-1.3B")

prompt = """A serene woodland stream scene recorded by a completely fixed, static,
tripod mounted camera. A narrow stream of clear water flows continuously from the
upper part of the frame toward the lower edge, winding gently between moss-covered
rocks and grassy banks. [60s]"""

negative_prompt = """solid water, metallic water, water stagnation, color drift,
frozen motion, camera movement, zooming, panning, visual artifacts, unnatural water,
unnatural waves, unnatural flow, unnatural motion, human, animal"""

video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=480,   # 60s at 8fps
).frames

video[0].save("output.mp4")
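
The `num_frames=480` value above follows from 60 seconds at 8 frames per second. The sketch below keeps `num_frames` consistent with the `[60s]` tag at the end of a prompt. Both helper names are hypothetical, and whether the pipeline itself reads the duration tag is not stated here, so treat this as bookkeeping on the caller's side only.

```python
import re

FPS = 8  # frame rate taken from the "60s at 8fps" comment above


def duration_from_prompt(prompt):
    """Return the trailing "[<N>s]" duration in seconds, or None if absent."""
    match = re.search(r"\[(\d+)s\]\s*$", prompt)
    return int(match.group(1)) if match else None


def frames_for(seconds, fps=FPS):
    """Number of frames needed for `seconds` of video at `fps`."""
    return seconds * fps


seconds = duration_from_prompt("A serene woodland stream scene. [60s]")
print(frames_for(seconds))  # 60 s * 8 fps = 480 frames
```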

Recommended Negative Prompt

For best results, always include this negative prompt to suppress common failure modes:

色调艳丽,过曝,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,
JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,
形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走,
solid water, metallic water, water stagnation, color drift, water flow drift,
water color drift, water surface drift, scene drift, background drift,
frozen motion, camera movement, zooming, panning, camera drift,
visual artifacts, camera, tripod, ground artifacts, anomalous textures,
unrealistic round shaped pattern, localized distortions, unnatural water,
unnatural waves, unnatural flow, unnatural motion, unnatural physics,
unnatural dynamics, unnatural fluidity, unnatural surface, unnatural reflections,
unnatural refractions, unnatural transparency, unnatural opacity,
unnatural viscosity, unnatural turbulence, unnatural splashes, unnatural ripples,
unnatural foam, unnatural spray, unnatural mist, unnatural droplets,
human, animal, repetitive round textures, pond effects, low dynamic degree,
unnatural color, unnatural lighting, unnatural shadows, unnatural highlights,
unnatural contrast, unnatural saturation, unnatural hue, unnatural brightness,
unnatural darkness, unnatural exposure, unnatural noise, unnatural grain,
unnatural blur, unnatural sharpness, unnatural clarity, unnatural detail,
unnatural texture, unnatural pattern, unnatural composition
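
The recommended negative prompt spans many lines, so it can help to collapse it into a single comma-separated string before passing it as `negative_prompt`. This is a minimal sketch with a hypothetical helper name. It also drops repeated terms and rejoins everything with ", ", which slightly changes the literal string. Whether the model is sensitive to that is an assumption to verify.

```python
import re


def normalize_negative_prompt(text):
    """Collapse a multi-line, comma-separated prompt into one line without duplicates."""
    seen = set()
    unique = []
    # Split on ASCII and full-width commas so mixed-language prompts work.
    for term in re.split(r"[,，]", text):
        term = " ".join(term.split())  # collapse newlines and repeated spaces
        if term and term not in seen:
            seen.add(term)
            unique.append(term)
    return ", ".join(unique)


raw = """solid water, metallic water,
frozen motion, camera movement, solid water"""
print(normalize_negative_prompt(raw))
```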

🏋️ Training

Self-Forcing Training with DMD

bash train.sh

Training was completed in under 67 hours on 8 × A100 GPUs. No video dataset is required — the method uses data-free ODE distillation.

Running the Demo

python demo.py

Testing Model Loading

python test_loading.py

🔑 Trigger Words

Include these keywords in your prompts to activate model-specific conditioning:

| Trigger Word | Purpose |
| --- | --- |
| Steady-Forcing: Balancing | Core method conditioning |
| Spatial Persistence | Static background anchoring |
| Motion Continuity | Sustained fluid dynamics |
| Long-Horizon | Extended duration generation |
| Nature Video Diffusion | Natural scene domain |
| drift-stagnation | Suppresses visual drift artifacts |
| nature-flow | Natural fluid flow conditioning |
| fixed-camera | Fixed tripod camera constraint |
| static nature | Static environmental anchoring |
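
How the trigger words should be placed within a prompt is not specified, so the following is only a sketch under one assumption: missing trigger words are prepended, which keeps a trailing duration tag such as `[60s]` at the end of the prompt. The helper name `with_trigger_words` is hypothetical.

```python
def with_trigger_words(prompt, triggers):
    """Prepend any trigger words not already present (case-insensitive match)."""
    lowered = prompt.lower()
    missing = [t for t in triggers if t.lower() not in lowered]
    if not missing:
        return prompt
    return ", ".join(missing) + ". " + prompt


prompt = "A woodland stream recorded by a fixed-camera setup. [60s]"
print(with_trigger_words(prompt, ["fixed-camera", "nature-flow", "static nature"]))
```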

📊 Results

Quantitative and qualitative results are available in the arXiv preprint. For visualizations and video comparisons, visit the project page at https://minar09.github.io/steadyforcing/.


📄 Citation

If you use this model or codebase in your research, please cite:

@article{minar2025steady,
  title={Steady-Forcing: Balancing Spatial Persistence and Motion Continuity
         in Long-Horizon Nature Video Diffusion},
  author={Minar, Matiur Rahman and Oh, Seunghun and Jeong, Ganghyeon and Park, Unsang},
  journal={arXiv preprint arXiv:2606.7661673},
  year={2026}
}

🤝 Acknowledgements

This project builds on the open-source Infinity-RoPE and Reward-Forcing implementations and on related work in long-horizon video diffusion, motion continuity, and spatial persistence. We thank the authors of these projects for their efforts.
