---
license: apache-2.0
pipeline_tag: image-to-video
tags:
- image-to-video
- robotics
- world-model
arxiv: 2603.17808
---

# EVA

This repository hosts the EVA checkpoint released with:

**EVA: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards**

- Project page: https://eva-project-page.github.io/
- arXiv: https://arxiv.org/abs/2603.17808

## Model Summary

This checkpoint is an EVA model for robotic video planning.

It is built on top of a Wan2.1 I2V 14B backbone and further adapted on **Robotwin** through:

- supervised fine-tuning (SFT)
- reinforcement-learning-based post-training (RL)

The released checkpoint corresponds to the merged model after both stages of post-training.

## Intended Use

This model is intended for research use in robot video prediction and visual planning.

Given an input image and a language instruction, the model generates future video rollouts that are better aligned with executable robot behavior.
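
The input contract described above (one conditioning image plus one language instruction) can be sketched as a small Python helper. This is only an illustrative sketch: `RolloutRequest`, `make_request`, and `download_checkpoint` are hypothetical names, not part of the EVA release, and the `num_frames` field is an assumed generation parameter. The only real library call is `huggingface_hub.snapshot_download`, and the repository id must be supplied by the caller because this card does not state one.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class RolloutRequest:
    """Inputs for one image-to-video rollout (hypothetical container)."""
    image_path: Path   # conditioning frame, e.g. the current camera view
    instruction: str   # natural-language task description
    num_frames: int    # assumed parameter: number of future frames to generate


def make_request(image_path, instruction: str, num_frames: int) -> RolloutRequest:
    """Validate and normalize the image + instruction pair."""
    instruction = instruction.strip()
    if not instruction:
        raise ValueError("instruction must be a non-empty string")
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    return RolloutRequest(Path(image_path), instruction, num_frames)


def download_checkpoint(repo_id: str, local_dir: Optional[str] = None) -> str:
    """Download every file of a Hub repository and return the local path.

    The import is deferred so the helpers above work without huggingface_hub
    installed. repo_id is whatever id this repository is published under.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

A request could then be built with `make_request("scene.png", "pick up the cup", 16)`. How the downloaded weights are loaded and run is not specified in this card, so the inference entry point should be taken from the project page.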