**WanGP** supports many different models, and some of them stand out from the rest. Below is a curated selection.

Click a model name below to load it in WanGP instantly.

| Role | Model | Why pick it |
| --- | --- | --- |
| Video editing, face injection, spatial & temporal outpainting | [Wan 2.1 VACE](modeltype:vace_14B) | Load a control video, use MatAnyone to mask the regions you want to affect, and let the model replace faces, patch damaged areas, or extend the scene beyond the original frame. For an accelerated version of VACE that works out of the box, try [Vace Fusionix](modeltype:vace_14B_fusionix). |
| Identity-preserving face replacement | [Lynx](modeltype:lynx), [Vace Lynx](modeltype:vace_lynx_14B) | Optimized to preserve facial identity even when the hairstyle, clothes, or lighting change. Feed it reference portraits, adjust the Lynx weight sliders, and combine it with a VACE pass when you need both background cleanup and face swaps. |
| Motion transfer & performer replacement | [Wan 2.2 Animate](modeltype:animate) | Built for body and facial motion transfer: capture a pose/control video, extract masks with MatAnyone, and the model will replace or animate the target performer. Includes relighting, a maskless mode (faster but less precise), and WanGP-exclusive outpainting so new characters stay anchored in the scene. |
| Multi-speaker lip-sync & dialogue | [MultiTalk](modeltype:multitalk), [InfiniteTalk](modeltype:infinitetalk) | Drop in one or two voice tracks, assign speakers, and the model handles lip-sync, gestures, and camera moves. InfiniteTalk chains sliding windows so that very long monologues and conversations remain coherent. |
| High-fidelity image editing & inpainting | [Qwen Image Edit](modeltype:qwen_image_edit_plus_20B) | Generates or edits high-resolution stills with multi-subject control and long text captions. It works best at 720p, supports brush-based inpainting/outpainting, and offers Lightning LoRAs (4-step/8-step) for faster iterations, making it ideal for preparing key frames before video runs. |
| Storyboarding with first / last frames | [Wan 2.2 i2v](modeltype:i2v_2_2) + [Qwen Image Edit](modeltype:qwen_image_edit_plus_20B) | Use Qwen to create "first frame" and "last frame" stills for every beat, then have Wan generate the motion bridge between them (similar to Nano Banana + Kling loops). Reuse each beat's last frame as the next beat's first frame to build entire scenes. As an alternative to i2v, use [InfiniteTalk](modeltype:infinitetalk) without a speaker, which offers fluid scene transitions. |
| Fast image generation & key art | [Z Image](modeltype:z_image) | Pick this when you want strong still images fast. It is a very efficient choice for posters, key art, concept frames, and reference images you may later reuse in image-to-video or editing workflows. |
| Cinematic video with native audio | [LTX-2.3 Distilled](modeltype:ltx2_22B_distilled) | A fast cinematic video generator that can create the visuals and soundtrack in one pass, including spoken scenes. It works especially well at 720p and 1080p, and also supports start/end frames, injected frames, control video, outpainting, and talking-head workflows with voice cloning. |
| Flexible speech generation | [Qwen3 TTS](modeltype:qwen3_tts_base) | Pick this when you want a general-purpose speech tool that can clone a voice, generate two-speaker dialogue, or serve as a clean starting point before switching to Custom Voice or Voice Design variants. It stays fast, works well on modest VRAM, and fits both narration and conversational workflows. |
| Emotional voice cloning & dialogue | [Index TTS 2](modeltype:index_tts2) | Pick this when you want expressive voice cloning instead of flat narration. It handles two-speaker cloned dialogue, automatic or tagged emotions, and very long conversations, while still remaining usable on lower-VRAM setups. |
| Lyrics-first song generation | [Ace Step 1.5 XL](modeltype:ace_step_v1_5_xl) | A high-quality generator for full songs with lyrics. It is a must-use when lyric adherence matters as much as audio quality: expect cleaner vocals and better control over the style and mood of the song. |