Succulent-VAE-128

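An unconditional VAE trained with the AutoencoderKL architecture from the diffusers library, built specifically for image reconstruction and latent-space feature blending of succulent plants.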

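🌟 Highlights

Complete pipeline: on-site photography, SAM segmentation, DINO + UMAP clustering, data augmentation, and VAE training.
High-quality reconstruction: at 128x128 resolution, training with MSE + KL + VGG perceptual losses yields reconstructions that retain fine texture detail.
Fun blending: supports interpolation in the latent space to generate an "in-between" state between two succulents.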

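📊 Training Details

Hardware: NVIDIA RTX 5090 (200 epochs in about 90 minutes)
Dataset: Succulent-Vision-Dataset (2,000 augmented succulent images)
Loss functions:
- MSE Loss (pixel-level reconstruction)
- VGG Perceptual Loss (preserves the texture of succulent leaves)
- KL Divergence (regularizes the latent space, which makes feature blending easier)
Hyperparameters:
- Resolution: 128x128
- Batch Size: 128
- Latent Channels: 4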
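
The three loss terms listed above can be combined roughly as in the sketch below. This is an illustration only, not the training code used for this model: the function name `vae_loss`, the `feature_fn` argument (standing in for a frozen VGG feature extractor), and both weight values are assumptions, since the card does not state how the terms were weighted.

```python
import torch
import torch.nn.functional as F


def vae_loss(x, recon, mean, logvar, feature_fn,
             kl_weight=1e-6, perceptual_weight=0.1):
    """Combine MSE, perceptual, and KL terms into one scalar loss.

    feature_fn is a hypothetical callable that maps an image batch to
    feature maps (for example, a frozen VGG trunk). The default weights
    are placeholders, not values taken from this model's training run.
    """
    # Pixel-level reconstruction error
    mse = F.mse_loss(recon, x)

    # Perceptual term: distance between feature maps of the two images
    perceptual = F.mse_loss(feature_fn(recon), feature_fn(x))

    # KL(N(mean, exp(logvar)) || N(0, I)), summed over latent dimensions
    # and averaged over the batch
    kl = 0.5 * torch.sum(
        mean.pow(2) + logvar.exp() - 1.0 - logvar, dim=[1, 2, 3]
    ).mean()

    return mse + perceptual_weight * perceptual + kl_weight * kl
```

With diffusers, `mean` and `logvar` are available as attributes of `vae.encode(x).latent_dist`.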

🚀 Quick Start

Use the following code to load and run the model:

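from diffusers import AutoencoderKL
import torch

# Load the model
model_id = "HaiPenglai/Succulent-VAE-128"
vae = AutoencoderKL.from_pretrained(model_id)
vae.eval()

# Placeholder input: replace with a real 128x128 succulent image tensor of
# shape (batch, 3, 128, 128). The RGB layout and [-1, 1] value range are
# assumptions; match whatever preprocessing was used in training.
x = torch.rand(1, 3, 128, 128) * 2 - 1

with torch.no_grad():
    # Encode into the latent space
    posterior = vae.encode(x).latent_dist
    z = posterior.sample()
    # Decode from the latent space
    reconstruction = vae.decode(z).sample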
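
The latent-space blending mentioned in the highlights could look something like the sketch below. It assumes a loaded `AutoencoderKL` instance and two preprocessed image tensors of the same shape; the helper name `interpolate_latents` and the choice of plain linear interpolation are illustrative assumptions, not necessarily what the author used.

```python
import torch


@torch.no_grad()
def interpolate_latents(vae, x_a, x_b, steps=5):
    """Decode evenly spaced linear blends between the latents of two images.

    vae is expected to behave like diffusers' AutoencoderKL; x_a and x_b
    are image batches of identical shape. Hypothetical helper.
    """
    # Use the posterior mode (the mean) instead of a random sample so the
    # result is deterministic
    z_a = vae.encode(x_a).latent_dist.mode()
    z_b = vae.encode(x_b).latent_dist.mode()

    frames = []
    for t in torch.linspace(0.0, 1.0, steps):
        w = float(t)
        # w = 0 reproduces image A's latent, w = 1 image B's latent
        z = (1.0 - w) * z_a + w * z_b
        frames.append(vae.decode(z).sample)

    # Stack all decoded frames along the batch dimension
    return torch.cat(frames, dim=0)
```

Calling `interpolate_latents(vae, x_a, x_b, steps=7)` returns the decoded frames concatenated along the batch dimension, starting at the reconstruction of `x_a` and ending at the reconstruction of `x_b`.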
