Steady-Forcing: Balancing Spatial Persistence and Motion Continuity in Long-Horizon Nature Video Diffusion


โœ๏ธ Authors

Matiur Rahman Minar¹, Seunghun Oh², Ganghyeon Jeong², Unsang Park¹,²
¹Department of Computer Science and Engineering, Sogang University   ²Department of Artificial Intelligence, Sogang University

Progress

  • Technical Report / Paper
  • Project Homepage
  • Training & Inference Code
  • Pretrained Model: T2V-1.3B

Overview

Steady-Forcing produces long-horizon nature video rollouts from a fixed-camera view. It decouples spatial persistence from motion continuity via a structural dual-memory protocol. This enables stable backgrounds and sustained fluid motion.

TL;DR: We propose a dual-memory framework that balances stability and motion to sustain high background persistence and continuous fluid dynamics over multi-minute horizons for fixed-camera nature video generation.

Demo

https://minar09.github.io/steadyforcing/


Requirements

Requirement      Specification
GPU              NVIDIA GPU with ≥ 24 GB VRAM (tested on A100 80 GB)
OS               Linux
Python           3.10
Training Setup   8 × A100 GPUs (for full training run)

Other hardware configurations may work but have not been tested.


๐Ÿ› ๏ธ Installation

Clone the repository and set up the environment in one step:

git clone https://github.com/minar09/steady-forcing.git
cd steady-forcing
bash setup_env.sh

This script creates a Python 3.10 environment, installs all dependencies from requirements.txt, and downloads the required base models.

Alternatively, a Dockerfile is provided for containerized setups:

docker build -t steady-forcing .

Pretrained Checkpoints

Download

huggingface-cli download minar09/Steady-Forcing-T2V-1.3B --local-dir ./ckpt

Or using the Python API:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="minar09/Steady-Forcing-T2V-1.3B", local_dir="./ckpt")

Note: Training uses data-free distillation, so no video dataset is required.

File Structure

After downloading, organize your working directory as follows:

steady-forcing/
├── prompts/              # Example text prompts
├── configs/              # Model and training configs
├── pipeline/             # Inference pipeline
├── trainer/              # Training modules
├── demo_utils/           # Demo helper utilities
├── scripts/              # Utility scripts
├── templates/            # Prompt templates
├── ckpt/
│   └── steady-forcing-t2v.pt   # Main model checkpoint
├── inference.py
├── inference.sh
├── train.py
├── train.sh
├── demo.py
└── setup.py
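
Before running inference, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: `missing_paths` and the `EXPECTED` list are hypothetical, and the list covers only a few of the entries shown in the tree.

```python
from pathlib import Path

# A few entries from the tree above; extend the list as needed.
# (Hypothetical helper, not shipped with the repository.)
EXPECTED = [
    "configs",
    "pipeline",
    "ckpt/steady-forcing-t2v.pt",
    "inference.py",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


missing = missing_paths(".")
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("Layout looks complete.")
```

Run it from the `steady-forcing/` directory; an empty result means every listed path was found.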

Inference

Quick Start

bash inference.sh

Custom Prompt Inference

from pipeline import SteadyForcingPipeline

pipe = SteadyForcingPipeline.from_pretrained("minar09/Steady-Forcing-T2V-1.3B")

prompt = """A serene woodland stream scene recorded by a completely fixed, static,
tripod mounted camera. A narrow stream of clear water flows continuously from the
upper part of the frame toward the lower edge, winding gently between moss-covered
rocks and grassy banks. [60s]"""

negative_prompt = """solid water, metallic water, water stagnation, color drift,
frozen motion, camera movement, zooming, panning, visual artifacts, unnatural water,
unnatural waves, unnatural flow, unnatural motion, human, animal"""

video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=480,   # 60s at 8fps
).frames

video[0].save("output.mp4")
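
The `num_frames=480` value in the example is the clip duration multiplied by the frame rate (60 s × 8 fps). A minimal sketch of that arithmetic follows. `frames_for_duration` is a hypothetical helper, not part of the repository. The 8 fps default is taken from the comment in the example, and the pipeline may place its own constraints on valid frame counts.

```python
def frames_for_duration(seconds: float, fps: int = 8) -> int:
    """Return the number of frames needed for a clip of the given length."""
    if seconds <= 0 or fps <= 0:
        raise ValueError("seconds and fps must be positive")
    return round(seconds * fps)


print(frames_for_duration(60))   # 480, the value used in the example
print(frames_for_duration(120))  # 960, a two-minute clip at the same rate
```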

Recommended Negative Prompt

For best results, always include this negative prompt to suppress common failure modes:

่‰ฒ่ฐƒ่‰ณไธฝ๏ผŒ่ฟ‡ๆ›๏ผŒ็ป†่Š‚ๆจก็ณŠไธๆธ…๏ผŒๅญ—ๅน•๏ผŒ้ฃŽๆ ผ๏ผŒไฝœๅ“๏ผŒ็”ปไฝœ๏ผŒ็”ป้ข๏ผŒ้™ๆญข๏ผŒๆ•ดไฝ“ๅ‘็ฐ๏ผŒๆœ€ๅทฎ่ดจ้‡๏ผŒไฝŽ่ดจ้‡๏ผŒ
JPEGๅŽ‹็ผฉๆฎ‹็•™๏ผŒไธ‘้™‹็š„๏ผŒๆฎ‹็ผบ็š„๏ผŒๅคšไฝ™็š„ๆ‰‹ๆŒ‡๏ผŒ็”ปๅพ—ไธๅฅฝ็š„ๆ‰‹้ƒจ๏ผŒ็”ปๅพ—ไธๅฅฝ็š„่„ธ้ƒจ๏ผŒ็•ธๅฝข็š„๏ผŒๆฏๅฎน็š„๏ผŒ
ๅฝขๆ€็•ธๅฝข็š„่‚ขไฝ“๏ผŒๆ‰‹ๆŒ‡่žๅˆ๏ผŒ้™ๆญขไธๅŠจ็š„็”ป้ข๏ผŒๆ‚ไนฑ็š„่ƒŒๆ™ฏ๏ผŒไธ‰ๆก่…ฟ๏ผŒ่ƒŒๆ™ฏไบบๅพˆๅคš๏ผŒๅ€’็€่ตฐ,
solid water, metallic water, water stagnation, color drift, water flow drift,
water color drift, water surface drift, scene drift, background drift,
frozen motion, camera movement, zooming, panning, camera drift,
visual artifacts, camera, tripod, ground artifacts, anomalous textures,
unrealistic round shaped pattern, localized distortions, unnatural water,
unnatural waves, unnatural flow, unnatural motion, unnatural physics,
unnatural dynamics, unnatural fluidity, unnatural surface, unnatural reflections,
unnatural refractions, unnatural transparency, unnatural opacity,
unnatural viscosity, unnatural turbulence, unnatural splashes, unnatural ripples,
unnatural foam, unnatural spray, unnatural mist, unnatural droplets,
human, animal, repetitive round textures, pond effects, low dynamic degree,
unnatural color, unnatural lighting, unnatural shadows, unnatural highlights,
unnatural contrast, unnatural saturation, unnatural hue, unnatural brightness,
unnatural darkness, unnatural exposure, unnatural noise, unnatural grain,
unnatural blur, unnatural sharpness, unnatural clarity, unnatural detail,
unnatural texture, unnatural pattern, unnatural composition
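
A prompt this long is easier to maintain as a few shorter groups that are merged at run time. The sketch below shows one way to do that while dropping repeated terms. `build_negative_prompt` is a hypothetical helper, not part of the repository, and it rejoins terms with ASCII commas, which is an assumption about what the text encoder tolerates.

```python
def build_negative_prompt(*term_groups: str) -> str:
    """Merge comma-separated term groups into one prompt, dropping repeats."""
    seen = set()
    merged = []
    for group in term_groups:
        # Accept both ASCII and full-width commas as separators.
        for term in group.replace("，", ",").split(","):
            term = term.strip()
            key = term.lower()
            if term and key not in seen:
                seen.add(key)
                merged.append(term)
    return ", ".join(merged)


water_terms = "solid water, metallic water, water stagnation, color drift"
camera_terms = "camera movement, zooming, panning, color drift"
print(build_negative_prompt(water_terms, camera_terms))
```

The second `color drift` is dropped, so each term reaches the model once, in first-seen order.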

๐Ÿ‹๏ธ Training

Self-Forcing Training with DMD

bash train.sh

Training completed in under 67 hours on 8 × A100 GPUs. No video dataset is required because the method uses data-free ODE distillation.

Running the Demo

python demo.py

Testing Model Loading

python test_loading.py

Trigger Words

Include these keywords in your prompts to activate model-specific conditioning:

Trigger Word                Purpose
Steady-Forcing: Balancing   Core method conditioning
Spatial Persistence         Static background anchoring
Motion Continuity           Sustained fluid dynamics
Long-Horizon                Extended duration generation
Nature Video Diffusion      Natural scene domain
drift-stagnation            Suppresses visual drift artifacts
nature-flow                 Natural fluid flow conditioning
fixed-camera                Fixed tripod camera constraint
static nature               Static environmental anchoring
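
The table does not say where in the prompt the trigger words should go. The sketch below assumes that prepending any missing ones is acceptable. `with_trigger_words` is a hypothetical helper, not part of the repository, and the trigger selection is only an example drawn from the table.

```python
def with_trigger_words(prompt: str, triggers: list[str]) -> str:
    """Prepend any trigger words that the prompt does not already contain."""
    lowered = prompt.lower()
    missing = [t for t in triggers if t.lower() not in lowered]
    if not missing:
        return prompt
    return f"{', '.join(missing)}. {prompt}"


# Example selection from the table; the best combination is not documented.
triggers = ["fixed-camera", "nature-flow"]
print(with_trigger_words("A fixed-camera view of a woodland stream.", triggers))
```

Only `nature-flow` is added here, because the prompt already contains `fixed-camera`.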

Results

Quantitative and qualitative results are available in the arXiv preprint. For visualizations and video comparisons, visit the project page.


Citation

If you use this model or codebase in your research, please cite:

@article{minar2025steady,
  title={Steady-Forcing: Balancing Spatial Persistence and Motion Continuity
         in Long-Horizon Nature Video Diffusion},
  author={Minar, Matiur Rahman and Oh, Seunghun and Jeong, Ganghyeon and Park, Unsang},
  journal={arXiv preprint arXiv:2606.7661673},
  year={2026}
}

๐Ÿค Acknowledgements

This project builds on the open-source Infinity-RoPE and Reward-Forcing implementations and draws on related work in long-horizon video diffusion, motion continuity, and spatial persistence. We thank the authors of these projects for their efforts.
