# 3D Chibi Text-to-Image (14B) Generation
This repository contains the steps and scripts needed to generate **3D chibi-style images** with the **Wan2.1-T2V-14B** model, run in text-to-image mode (`--task t2i-14B`), and LoRA (Low-Rank Adaptation) weights. The LoRA turns text prompts into 3D chibi-style illustrations with vibrant colors, expressive characters, and dynamic scenes.
> 🚀 This README uses **text-to-image (t2i)** generation for faster testing; the same weights remain compatible with **text-to-video (t2v)** workflows.
---
## Prerequisites
Before proceeding, ensure that you have the following installed on your system:
- **Ubuntu** (or a compatible Linux distribution)
- **Python 3.x**
- **pip** (Python package manager)
- **Git**
- **Git LFS** (Git Large File Storage)
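To confirm that the tools above are available, a small Python check like the following can help. It is a sketch: it only looks up executables on your `PATH`, and on some systems pip is installed as `pip3`, so adjust the names as needed.

```python
import shutil
import sys

# Executables from the Prerequisites list; "git-lfs" is the binary
# that provides the `git lfs` subcommand.
REQUIRED_TOOLS = ["git", "git-lfs", "pip"]


def check_prerequisites(tools=REQUIRED_TOOLS):
    """Return ({tool: path or None}, running_under_python3)."""
    found = {tool: shutil.which(tool) for tool in tools}
    return found, sys.version_info.major == 3


if __name__ == "__main__":
    found, is_py3 = check_prerequisites()
    print(f"Python 3: {'ok' if is_py3 else 'MISSING'} ({sys.version.split()[0]})")
    for tool, path in found.items():
        print(f"{tool:8s} {path or 'NOT FOUND'}")
```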
---
## Installation
1. **Update and Install Dependencies**
```bash
sudo apt-get update && sudo apt-get install build-essential git-lfs
```
2. **Clone the Repository**
> ⚠️ Note: You can use any existing Wan2.1-compatible repo structure or clone directly from Hugging Face.
```bash
git clone https://huggingface.co/svjack/3D_Chibi_wan_2_1_14_B_text2video_lora
cd 3D_Chibi_wan_2_1_14_B_text2video_lora
```
3. **Install Python Dependencies**
```bash
pip install torch torchvision
pip install -r requirements.txt
pip install ascii-magic matplotlib tensorboard huggingface_hub datasets
pip install sageattention==1.0.6
```
4. **Download Model Weights**
> 📌 **Note**: You can view previous results in the respective repositories:
- [Xiang_Handsome LoRA](https://huggingface.co/svjack/Xiang_Handsome_wan_2_1_14_B_text2video_lora)
- [Taiga_Aisaka LoRA](https://huggingface.co/svjack/Taiga_Aisaka_wan_2_1_14_B_text2video_lora)
- [Sebastian_Michaelis LoRA](https://huggingface.co/svjack/Sebastian_Michaelis_wan_2_1_14_B_text2video_lora)
- [3D_Chibi LoRA 14B](https://huggingface.co/svjack/3D_Chibi_wan_2_1_14_B_text2video_lora)
```bash
# Base Models
wget https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/diffusion_models/wan2.1_t2v_14B_bf16.safetensors
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/Wan2.1_VAE.pth
# LoRA Weights
wget https://huggingface.co/svjack/Xiang_Handsome_wan_2_1_14_B_text2video_lora/resolve/main/Xiang_Handsome_outputs/Xiang_Handsome_w14_lora-000067.safetensors
wget https://huggingface.co/svjack/Taiga_Aisaka_wan_2_1_14_B_text2video_lora/resolve/main/Taiga_Aisaka_w14_outputs/Taiga_Aisaka_w14_lora-000010.safetensors
wget https://huggingface.co/svjack/Sebastian_Michaelis_wan_2_1_14_B_text2video_lora/resolve/main/Sebastian_Michaelis_w14_outputs/Sebastian_Michaelis_w14_lora-000007.safetensors
wget https://huggingface.co/svjack/3D_Chibi_wan_2_1_14_B_text2video_lora/resolve/main/3D_Chibi_w14_outputs/3D_Chibi_w14_lora-000024.safetensors
```
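As an alternative to `wget`, the same files can be fetched with `huggingface_hub.hf_hub_download` (installed in step 3). This is a sketch: files land in the local Hugging Face cache rather than the working directory, so pass the printed paths to `--dit`, `--vae`, `--t5`, and `--lora_weight` instead of the bare filenames. The `WEIGHTS` list simply mirrors the URLs above.

```python
# (repo_id, path inside the repo) for each file downloaded with wget above
WEIGHTS = [
    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
     "split_files/diffusion_models/wan2.1_t2v_14B_bf16.safetensors"),
    ("Wan-AI/Wan2.1-T2V-14B", "models_t5_umt5-xxl-enc-bf16.pth"),
    ("Wan-AI/Wan2.1-T2V-14B", "Wan2.1_VAE.pth"),
    ("svjack/Xiang_Handsome_wan_2_1_14_B_text2video_lora",
     "Xiang_Handsome_outputs/Xiang_Handsome_w14_lora-000067.safetensors"),
    ("svjack/Taiga_Aisaka_wan_2_1_14_B_text2video_lora",
     "Taiga_Aisaka_w14_outputs/Taiga_Aisaka_w14_lora-000010.safetensors"),
    ("svjack/Sebastian_Michaelis_wan_2_1_14_B_text2video_lora",
     "Sebastian_Michaelis_w14_outputs/Sebastian_Michaelis_w14_lora-000007.safetensors"),
    ("svjack/3D_Chibi_wan_2_1_14_B_text2video_lora",
     "3D_Chibi_w14_outputs/3D_Chibi_w14_lora-000024.safetensors"),
]


def download_all():
    """Download every file and return {basename: local cached path}."""
    from huggingface_hub import hf_hub_download  # imported lazily; installed in step 3

    paths = {}
    for repo_id, filename in WEIGHTS:
        basename = filename.rsplit("/", 1)[-1]
        paths[basename] = hf_hub_download(repo_id=repo_id, filename=filename)
    return paths


if __name__ == "__main__":
    for name, path in download_all().items():
        print(f"{name}: {path}")
```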
---
## Usage
To generate an image, use the `wan_generate_video.py` script with the `--task t2i-14B` parameter.
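Examples 1 to 3 differ only in the character LoRA, so the invocation can be scripted. The sketch below rebuilds the same argument list and runs it with `subprocess`; it assumes the weights were downloaded into the working directory with the `wget` commands above, and that `--prompt` (listed under Key Parameters) can replace `--interactive`. `build_command` is a hypothetical helper, not part of the repository.

```python
import shlex
import subprocess

# Arguments shared by the t2i examples (copied from the commands in this README).
BASE_ARGS = [
    "python", "wan_generate_video.py", "--fp8", "--task", "t2i-14B",
    "--video_size", "480", "832", "--infer_steps", "20",
    "--save_path", "save", "--output_type", "both",
    "--dit", "wan2.1_t2v_14B_bf16.safetensors", "--vae", "Wan2.1_VAE.pth",
    "--t5", "models_t5_umt5-xxl-enc-bf16.pth", "--attn_mode", "torch",
]
# The 3D Chibi LoRA is stacked on top of each character LoRA.
CHIBI_LORA = "3D_Chibi_w14_lora-000024.safetensors"


def build_command(character_lora, prompt=None, multiplier=1.0):
    """Return the argv list for one run; without a prompt, fall back to --interactive."""
    cmd = BASE_ARGS + [
        "--lora_weight", character_lora, CHIBI_LORA,
        "--lora_multiplier", str(multiplier),
    ]
    if prompt is None:
        cmd.append("--interactive")
    else:
        cmd += ["--prompt", prompt]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "Taiga_Aisaka_w14_lora-000010.safetensors",
        prompt="3D Chibi Style, a blonde girl in a red high-school uniform eating a hamburger.",
    )
    print(shlex.join(cmd))           # the equivalent shell command
    subprocess.run(cmd, check=True)  # needs the weights and wan_generate_video.py in the cwd
```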
### Example 1: Xiang InfiniteYou Handsome Style
```bash
python wan_generate_video.py --fp8 --task t2i-14B --video_size 480 832 --infer_steps 20 \
--save_path save --output_type both \
--dit wan2.1_t2v_14B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Xiang_Handsome_w14_lora-000067.safetensors 3D_Chibi_w14_lora-000024.safetensors \
--lora_multiplier 1.0 \
--interactive
```
#### Prompt
```text
"3D Chibi Style ,In the style of Xiang InfiniteYou Handsome, Xiang, a young person with short, black hair and glasses, stands in a quiet office space. The soft glow of a desk lamp casts a warm light across his thoughtful expression, while the hum of distant keyboards and the faint scent of coffee linger in the air. Outside the window, the city lights twinkle like distant stars, blending with the muted glow of computer screens as the workday stretches on around him."
```
Without the 3D_Chibi LoRA (text-to-video output):
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/HWXUohCTVCkhqz69TrKoH.mp4"></video>
With the 3D_Chibi LoRA (text-to-image output):
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/DXbsg9uHablodhRxzdNdK.mp4"></video>
With the 3D_Chibi LoRA (text-to-video output):
<!--
python wan_generate_video.py --fp8 --task t2v-14B --video_size 480 832 --infer_steps 10 --video_length 45 \
--save_path save --output_type both \
--dit wan2.1_t2v_14B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Sebastian_Michaelis_w14_lora-000007.safetensors 3D_Chibi_w14_outputs/3D_Chibi_w14_lora-000024.safetensors Wan21_CausVid_14B_T2V_lora_rank32.safetensors \
--lora_multiplier 1.0 \
--interactive
-->
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/S84IeKdXqQlEbvghV9nvH.mp4"></video>
---
### Example 2: Taiga Aisaka Style
```bash
python wan_generate_video.py --fp8 --task t2i-14B --video_size 480 832 --infer_steps 20 \
--save_path save --output_type both \
--dit wan2.1_t2v_14B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Taiga_Aisaka_w14_lora-000010.safetensors 3D_Chibi_w14_lora-000024.safetensors \
--lora_multiplier 1.0 \
--interactive
```
#### Prompt
```text
"3D Chibi Style, 一个身穿红色高中校服的金发女孩,正在吃汉堡。"
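```
English translation: "3D Chibi Style, a blonde girl in a red high-school uniform, eating a hamburger."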
Without the 3D_Chibi LoRA (text-to-video output):
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/fS9xq_yaZjOE_jAt5i8QY.mp4"></video>
With the 3D_Chibi LoRA (text-to-image output):
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/6REyOMrULVJPgyqVDHqcA.mp4"></video>
---
### Example 3: Sebastian Michaelis (Black Butler) Style
```bash
python wan_generate_video.py --fp8 --task t2i-14B --video_size 480 832 --infer_steps 20 \
--save_path save --output_type both \
--dit wan2.1_t2v_14B_bf16.safetensors --vae Wan2.1_VAE.pth \
--t5 models_t5_umt5-xxl-enc-bf16.pth \
--attn_mode torch \
--lora_weight Sebastian_Michaelis_w14_lora-000007.safetensors 3D_Chibi_w14_lora-000024.safetensors \
--lora_multiplier 1.0 \
--interactive
```
#### Prompt
```text
"3D Chibi Style, In the style of Black Butler , The video opens with a close-up of a character dressed in a black suit, white shirt, and black tie. stands in a quiet office space. The soft glow of a desk lamp casts a warm light across his thoughtful expression, while the hum of distant keyboards and the faint scent of coffee linger in the air. Outside the window, the city lights twinkle like distant stars, blending with the muted glow of computer screens as the workday stretches on around him."
```
Without the 3D_Chibi LoRA (text-to-video output):
<video controls autoplay src="https://huggingface.co/svjack/Sebastian_Michaelis_wan_2_1_14_B_text2video_lora/resolve/main/20250429-053408_329322199_.mp4"></video>
With the 3D_Chibi LoRA (text-to-image output):
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/A79XIiwAOuPZhhPTx6A9y.mp4"></video>
---
### Example 4: Turkish Islamic Grand Mufti (Müftü) WanFusionX
#### Prompt
```text
3D Chibi Style, a miniature CGI rendering of a venerable Turkish Islamic Grand Mufti (Müftü), blending traditional Ottoman religious attire with mystical symbolism. His chibi-form features an oversized head with large, soulful eyes radiating wisdom beneath a meticulously folded white turban (sarık), a voluminous dark green robe (cübbe) embroidered with gold-thread Quranic calligraphy (Ayat al-Kursi on the collar), and a soft white undershirt. A flowing white beard frames his serene, neutral expression, enhanced by subtle glowing highlights around his eyes to signify inner illumination (kashf) .He stands contemplatively before a minimalist Anatolian prayer niche (mihrab) carved with Seljuk-style geometric patterns, holding an open antique Quran in one hand and a string of amber prayer beads (tesbih) in the other. The background features warm, diffused lighting filtering through a stained-glass window (resembling Istanbul’s Şehzade Mosque designs), casting kaleidoscopic patterns on the beige stone floor. A faint, ethereal mist swirls at his feet, evoking Sufi transcendental practices (dhikr), while a semi-transparent holographic "Seal of Solomon" (Hatem-i Süleyman) symbol floats near his heart, symbolizing divine knowledge (ma'rifa)
```
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/NYTvtU6laz_Efahy6nNvc.mp4"></video>
---
## Key Parameters
| Parameter | Description |
|----------|-------------|
| `--fp8` | Load the DiT in FP8 precision to reduce VRAM usage |
| `--task` | `t2i-14B` for image generation (`t2v-14B` for video) |
| `--video_size` | Output resolution as height and width (e.g., `480 832`) |
| `--infer_steps` | Number of denoising steps; fewer is faster, more improves quality (`20` works for quick tests) |
| `--lora_weight` | Paths to one or more LoRA weight files |
| `--lora_multiplier` | Strength of the LoRA effect (default: `1.0`) |
| `--prompt` | Text prompt; include `"3D Chibi Style"` for best results |
| `--interactive` | Enter prompts at the console instead of passing `--prompt` |
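`--lora_multiplier` scales how strongly a LoRA modifies the base weights. In the standard LoRA formulation, each adapted weight becomes `W' = W + m * (alpha / r) * B @ A` (rank `r`, scaling `alpha`, multiplier `m`), and stacked LoRAs, such as a character LoRA plus the 3D Chibi LoRA, add their updates. The toy sketch below shows that arithmetic in pure Python; it assumes `wan_generate_video.py` merges LoRAs in this standard way and is not taken from its source.

```python
def matmul(x, y):
    """Plain-Python matrix product of x (n x k) and y (k x m)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def apply_loras(w, loras):
    """Return W + sum(m * alpha / r * B @ A) over (B, A, alpha, m) tuples.

    B has shape (d_out x r) and A has shape (r x d_in); W is not modified.
    """
    out = [row[:] for row in w]
    for b, a, alpha, multiplier in loras:
        rank = len(a)                      # r = number of rows of A
        scale = multiplier * alpha / rank
        delta = matmul(b, a)               # low-rank update, same shape as W
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                out[i][j] += scale * value
    return out


if __name__ == "__main__":
    W = [[1, 0], [0, 1]]
    character = ([[1], [0]], [[0, 1]], 1.0, 1.0)  # multiplier 1.0
    chibi = ([[0], [2]], [[1, 0]], 1.0, 0.5)      # multiplier 0.5 halves its effect
    print(apply_loras(W, [character, chibi]))
```

Setting a multiplier to `0` removes that LoRA's contribution entirely, which is a quick way to compare outputs with and without the 3D Chibi style.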
---
## Style Characteristics
For optimal results, prompts should emphasize:
- **Chibi-style characters** with exaggerated heads and facial expressions
- **Vibrant colors** and dynamic lighting effects
- **Fantasy or magical settings** (e.g., gardens, castles, floating islands)
- **Neon or glowing elements**, especially in futuristic or energetic scenes
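These guidelines, plus the `"3D Chibi Style"` prefix used in every example prompt, can be assembled programmatically. `build_prompt` below is a hypothetical helper (not part of the repository) that produces prompts in the same comma-separated shape as the examples.

```python
TRIGGER = "3D Chibi Style"  # prefix used by every example prompt in this README


def build_prompt(subject, *details, style_of=None):
    """Join the trigger phrase, an optional style reference, the subject and details."""
    parts = [TRIGGER]
    if style_of:
        parts.append(f"In the style of {style_of}")
    parts.append(subject)
    parts.extend(details)
    # Drop empty fragments so stray whitespace does not produce ", ," in the prompt.
    return ", ".join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    print(build_prompt(
        "a chibi knight in a floating castle garden",
        "vibrant colors",
        "glowing neon runes on the armor",
        style_of="Black Butler",
    ))
```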
---
## Output
Generated images will be saved in the specified `--save_path` directory with:
- PNG image file
- (Optional) MP4 video (if `--output_type both` is used)
---
## Troubleshooting
- Ensure all model weights are fully downloaded and that the paths passed to `--dit`, `--vae`, `--t5`, and `--lora_weight` point to them.
- Check GPU memory availability; at least **20GB VRAM** is recommended for 14B models.
- Verify no conflicts exist between Python packages using `pip check`.
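The first two checks can be automated with a short pre-flight script. This is a sketch assuming the bare filenames produced by the `wget` step and a CUDA build of PyTorch; the 20 GB threshold is this README's guideline, not a hard limit, and `pip check` should still be run separately.

```python
import os

# Files passed to --dit, --t5, --vae and --lora_weight (character LoRAs omitted).
REQUIRED_FILES = [
    "wan2.1_t2v_14B_bf16.safetensors",
    "models_t5_umt5-xxl-enc-bf16.pth",
    "Wan2.1_VAE.pth",
    "3D_Chibi_w14_lora-000024.safetensors",
]
MIN_VRAM_GB = 20  # guideline from the Troubleshooting list


def missing_files(paths, root="."):
    """Return the entries of `paths` that are not regular files under `root`."""
    return [p for p in paths if not os.path.isfile(os.path.join(root, p))]


def gpu_vram_gb(device=0):
    """Total memory of a CUDA device in GiB, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device).total_memory / 1024**3


if __name__ == "__main__":
    missing = missing_files(REQUIRED_FILES)
    print("missing weights:", ", ".join(missing) if missing else "none")
    vram = gpu_vram_gb()
    if vram is None:
        print("no CUDA device detected")
    else:
        status = "ok" if vram >= MIN_VRAM_GB else "below recommendation"
        print(f"GPU memory: {vram:.1f} GiB ({status})")
```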
---
## License
This project is licensed under the MIT License.
---
## Acknowledgments
- **Hugging Face** – For hosting the model and dataset repositories
- **Wan-AI** – For providing base diffusion models
- **svjack** – For adapting and sharing LoRA weights for various styles
For support or feedback, please open an issue in this repository.