wan-vae-minecraft
A finetuned AutoencoderKLWan derived from Wan-AI/Wan2.2-TI2V-5B-Diffusers.
Usage
```python
import torch
from diffusers import AutoencoderKLWan

vae = AutoencoderKLWan.from_pretrained("aidanscannell/wan-vae-minecraft", torch_dtype=torch.float32)
vae.eval()
```
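Assuming AutoencoderKLWan follows the standard diffusers VAE interface (`encode(...).latent_dist` and `decode(...).sample` — worth verifying against your diffusers version), a reconstruction round-trip can be sketched as below. The `reconstruct` helper name is hypothetical, not from the training code:

```python
import torch

def reconstruct(vae, video):
    """Encode then decode a video tensor of shape (B, C, T, H, W) in [-1, 1].

    Assumes the diffusers VAE convention: encode(...) returns an object with a
    .latent_dist, and decode(...) returns an object with a .sample attribute.
    """
    with torch.no_grad():
        latents = vae.encode(video).latent_dist.sample()
        return vae.decode(latents).sample
```

Pass the finetuned VAE loaded above together with a clip at the training resolution, e.g. `reconstruct(vae, video)` with `video` of shape `(1, 3, T, 352, 640)`.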
Dataset
- Dataset: Minecraft (OpenAI VPT)
- Training resolution: 352x640 (H×W)
Frames are sampled from the Minecraft gameplay videos released by OpenAI's Video Pre-Training (VPT) project. Episodes are RGB video; the VAE is trained to reconstruct frames independently of the recorded actions.
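A minimal sketch of bringing raw frames to the 352x640 training resolution. The (T, H, W, C) uint8 input layout and the [-1, 1] scaling are assumptions about the data pipeline, not taken from the training script:

```python
import torch
import torch.nn.functional as F

def preprocess_frames(frames_uint8):
    """(T, H, W, C) uint8 RGB frames -> (C, T, 352, 640) float in [-1, 1].

    Resolution matches the card's training resolution (H=352, W=640); the
    normalization convention is an assumption.
    """
    x = frames_uint8.permute(0, 3, 1, 2).float() / 255.0            # (T, C, H, W) in [0, 1]
    x = F.interpolate(x, size=(352, 640), mode="bilinear", align_corners=False)
    x = x * 2.0 - 1.0                                               # scale to [-1, 1]
    return x.permute(1, 0, 2, 3)                                    # (C, T, H, W)
```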
Training
- Last saved step: 47500
Reproduction
- Training script: `scripts/finetune_wan_vae.py`
- Source commit: `2383febaae466e196f1db9204b54142eebb91dc2`
Best metrics
- Metric: `val/loss` (mode: min)
- Value: 1.1031190085411071
- Step: 47500
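The config's metrics block below requests psnr, ssim, and lpips with `max_val` 1.0. As a reference point, PSNR over [0, 1] pixels can be sketched as follows; the exact reduction used by the training script is an assumption:

```python
import torch

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB, averaged over all elements.

    max_val=1.0 matches the metrics block of the training config; per-frame
    vs. per-clip averaging is an assumption.
    """
    mse = (pred - target).pow(2).mean()
    return 10.0 * torch.log10(max_val ** 2 / mse)
```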
Training config
```json
{
  "data": {
    "dataset": "minerl",
    "seed": 123,
    "ctx_len_fr": 17,
    "pred_len_fr": 16,
    "eval_pred_len_fr": 60,
    "eval_data_stride": 1,
    "resize_resolution": [352, 640],
    "path": "1x-technologies/worldmodel_raw_data",
    "use_latents": false,
    "bfloat16_latents": true,
    "use_precomputed_index": true,
    "minerl_dir": "/mnt/minecraft",
    "tasks": ["all"],
    "minerl_split_seed": 42,
    "minerl_drop_last": true,
    "minerl_pad_to_len": false,
    "minerl_latents_dir": null,
    "minerl_latents_subdir": "latents",
    "minerl_window_stride_lat": 1,
    "minerl_total_val_clips": 512,
    "minerl_total_test_clips": 1024,
    "num_workers": 8,
    "persistent_workers": true,
    "prefetch_factor": 4,
    "pin_memory": true
  },
  "model": {
    "hf_id": "Wan-AI/Wan2.2-TI2V-5B-Diffusers",
    "hf_subfolder": "vae",
    "torch_dtype": "float32"
  },
  "loss": {
    "l1_w": 3.0,
    "kl_w": 3e-06,
    "lpips_w": 3.0,
    "temporal_w": 0.5,
    "use_lpips": true,
    "lpips_net": "vgg",
    "lpips_on_frames": true,
    "use_gan": false,
    "gan_w": 0.1,
    "disc_lr": 0.0002,
    "disc_steps_per_gen_step": 1,
    "disc_start_step": 0,
    "r1_gamma": 0.0
  },
  "train": {
    "finetune": "decoder",
    "seed": 123,
    "device": "cuda",
    "amp": true,
    "amp_dtype": "bfloat16",
    "batch_size": 1,
    "val_batch_size": 1,
    "grad_accum_steps": 2,
    "lr": 1e-05,
    "betas": [0.9, 0.999],
    "weight_decay": 0.0,
    "max_steps": 50000,
    "log_every": 50,
    "eval_every": 500,
    "save_every": 500,
    "video_every": 500,
    "video_fps": 8,
    "max_val_batches": 50,
    "best_metric_name": "val/loss",
    "best_metric_mode": "min",
    "resume_path": null
  },
  "compile": {
    "enabled": false,
    "backend": "inductor",
    "mode": "max-autotune",
    "fullgraph": false
  },
  "metrics": {
    "enabled": true,
    "metric_names": ["psnr", "ssim", "lpips"],
    "metrics_num_samples": 256,
    "psnr_log_stride": 1,
    "max_val": 1.0
  },
  "logger": {
    "use_wandb": true,
    "run_name": "res_[352, 640]-all_data-temporal_w_0.5-updated-tasks_['all']-finetune_decoder-bsize_1-accum_2-lr_1e-05-pred_len_16",
    "project": "wan-vae-finetune"
  }
}
```
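The loss block of the config weights L1, KL, LPIPS, and a temporal term (`l1_w=3.0`, `kl_w=3e-06`, `lpips_w=3.0`, `temporal_w=0.5`). How the script combines them is not shown here; one plausible reading, with a frame-difference temporal term, is the following sketch — the weighted sum and the form of the temporal term are assumptions:

```python
import torch

# Loss weights taken from the training config above.
L1_W, KL_W, LPIPS_W, TEMPORAL_W = 3.0, 3e-06, 3.0, 0.5

def vae_loss(recon, target, kl, lpips):
    """Composite reconstruction loss for (B, C, T, H, W) video tensors.

    kl and lpips are assumed to be precomputed scalar tensors; the temporal
    term penalizes mismatched frame-to-frame differences (one common choice,
    not confirmed by the source).
    """
    l1 = (recon - target).abs().mean()
    temporal = ((recon[:, :, 1:] - recon[:, :, :-1])
                - (target[:, :, 1:] - target[:, :, :-1])).abs().mean()
    return L1_W * l1 + KL_W * kl + LPIPS_W * lpips + TEMPORAL_W * temporal
```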
License
Inherits the license of the base model (Wan-AI/Wan2.2-TI2V-5B-Diffusers); verify terms before redistribution.