Instructions to use Skywork/SkyReels-A1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Diffusers
How to use Skywork/SkyReels-A1 with Diffusers:
```shell
pip install -U diffusers transformers accelerate
```
- Notebooks
- Google Colab
- Kaggle
```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = DiffusionPipeline.from_pretrained("Skywork/SkyReels-A1", torch_dtype=torch.bfloat16)
# switch to "mps" for Apple devices
pipe.to("cuda")

prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
```

SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers
GitHub · Playground · Discord
This repo contains Diffusers-style model weights for the SkyReels-A1 models. You can find the inference code in the SkyReels-A1 repository.
News
- Mar 4, 2025: We release the audio-driven portrait image animation pipeline of SkyReels-A1.
- Feb 18, 2025: We release the inference code and model weights of SkyReels-A1.
- Feb 18, 2025: We have open-sourced the I2V video generation model SkyReels-V1, the first and most advanced open-source human-centric video foundation model.
Overview of the SkyReels-A1 framework. Given an input video sequence and a reference portrait image, we extract facial expression-aware landmarks from the video, which serve as motion descriptors for transferring expressions onto the portrait. Using a conditional video generation framework based on DiT, our approach directly integrates these facial expression-aware landmarks into the input latent space. In alignment with prior research, we employ a pose guidance mechanism constructed within a VAE architecture. This component encodes the facial expression-aware landmarks as conditional input for the DiT framework, enabling the model to capture essential low-dimensional visual attributes while preserving the semantic integrity of facial features.
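The conditioning scheme described above can be sketched in terms of tensor shapes. This is a hypothetical illustration, not SkyReels-A1's actual implementation: the encoder, landmark count, and latent dimensions are all assumptions, and the landmark latent is injected here by channel-wise concatenation with the video latent before the DiT.

```python
import numpy as np

# Assumed VAE video latent layout: (frames, channels, height, width).
video_latent = np.random.randn(13, 16, 60, 90).astype(np.float32)

def encode_landmarks(landmark_frames: np.ndarray) -> np.ndarray:
    """Stand-in for the pose guidance encoder: maps per-frame facial
    landmarks to a latent with the same spatial layout as video_latent."""
    frames = landmark_frames.shape[0]
    # Placeholder output; a real encoder would be a learned network.
    return np.zeros((frames, 16, 60, 90), dtype=np.float32)

# Per-frame 2D facial landmarks, e.g. 68 points with (x, y) coordinates.
landmarks = np.random.randn(13, 68, 2).astype(np.float32)
pose_latent = encode_landmarks(landmarks)

# Expression-aware landmark latent joined with the video latent to form
# the conditional input to the DiT.
dit_input = np.concatenate([video_latent, pose_latent], axis=1)
print(dit_input.shape)  # (13, 32, 60, 90)
```

Channel-wise concatenation is one common way to feed dense spatial conditions into a diffusion transformer; the paper's pose guidance module may combine the signals differently.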
Some generated results:
Citation
If you find SkyReels-A1 useful for your research, please cite our work using the following BibTeX:
```bibtex
@article{qiu2025skyreels,
  title={SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers},
  author={Qiu, Di and Fei, Zhengcong and Wang, Rui and Bai, Jialin and Yu, Changqian and Fan, Mingyuan and Chen, Guibin and Wen, Xiang},
  journal={arXiv preprint arXiv:2502.10841},
  year={2025}
}
```
Model tree for Skywork/SkyReels-A1
Base model: zai-org/CogVideoX-5b-I2V